2025-09-16 12:37:56,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.075-delay_15
2025-09-16 12:37:56,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.075-delay_15
2025-09-16 12:37:56,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x14f511b047d0>}
2025-09-16 12:37:56,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:37:56,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:37:56,927 baseline-bpql-noisepromille75-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=631, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:37:56,928 baseline-bpql-noisepromille75-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:37:58,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:37:58,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:39:47,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:39:47,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 182.53177 ± 25.046
2025-09-16 12:39:47,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [163.83522, 234.61516, 214.83055, 148.76213, 168.60313, 180.39333, 156.16122, 182.58603, 194.01749, 181.5134]
2025-09-16 12:39:47,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 45.0, 44.0, 31.0, 34.0, 39.0, 32.0, 38.0, 41.0, 39.0]
2025-09-16 12:39:47,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (182.53) for latency 15
2025-09-16 12:39:47,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 17 seconds)
2025-09-16 12:41:45,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:41:46,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 288.07831 ± 64.739
2025-09-16 12:41:46,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [316.15796, 217.05649, 251.49463, 341.06296, 327.90927, 146.68666, 315.59024, 364.42352, 341.88913, 258.51205]
2025-09-16 12:41:46,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 42.0, 47.0, 66.0, 63.0, 28.0, 60.0, 76.0, 67.0, 50.0]
2025-09-16 12:41:46,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (288.08) for latency 15
2025-09-16 12:41:46,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 53 seconds)
2025-09-16 12:43:43,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:43:44,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 236.54903 ± 107.673
2025-09-16 12:43:44,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [360.79245, 447.25015, 130.24838, 129.48125, 149.85518, 258.50098, 235.68085, 140.72987, 344.12872, 168.82256]
2025-09-16 12:43:44,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 85.0, 25.0, 25.0, 29.0, 49.0, 45.0, 27.0, 65.0, 32.0]
2025-09-16 12:43:44,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 6 minutes, 24 seconds)
2025-09-16 12:45:43,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:45:44,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 402.42117 ± 118.232
2025-09-16 12:45:44,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [501.50632, 411.0784, 539.84845, 294.44604, 414.72083, 555.3707, 132.85988, 376.20718, 364.21005, 433.96387]
2025-09-16 12:45:44,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 78.0, 101.0, 57.0, 88.0, 107.0, 26.0, 76.0, 68.0, 82.0]
2025-09-16 12:45:44,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (402.42) for latency 15
2025-09-16 12:45:44,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 6 minutes, 15 seconds)
2025-09-16 12:47:42,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:47:43,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 390.16046 ± 123.828
2025-09-16 12:47:43,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [380.6989, 271.18088, 662.95557, 367.94937, 391.98972, 159.36153, 454.4727, 344.4481, 454.1377, 414.4101]
2025-09-16 12:47:43,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 53.0, 136.0, 75.0, 75.0, 31.0, 96.0, 73.0, 91.0, 78.0]
2025-09-16 12:47:43,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 5 minutes, 14 seconds)
2025-09-16 12:49:41,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:49:43,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 401.42523 ± 51.962
2025-09-16 12:49:43,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [350.58594, 391.23807, 379.95905, 425.03168, 453.82874, 356.2577, 357.13577, 476.1371, 485.31735, 338.76114]
2025-09-16 12:49:43,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 72.0, 71.0, 87.0, 83.0, 73.0, 69.0, 89.0, 92.0, 66.0]
2025-09-16 12:49:43,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 6 minutes, 30 seconds)
2025-09-16 12:51:41,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:51:42,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 375.53214 ± 81.798
2025-09-16 12:51:42,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [339.45523, 333.292, 337.1323, 408.81238, 362.7446, 365.59274, 313.75726, 581.12335, 440.9249, 272.48645]
2025-09-16 12:51:42,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 64.0, 64.0, 77.0, 80.0, 67.0, 58.0, 113.0, 82.0, 59.0]
2025-09-16 12:51:42,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 4 minutes, 51 seconds)
2025-09-16 12:53:41,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:53:42,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 395.03922 ± 102.215
2025-09-16 12:53:42,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [392.17822, 435.72504, 329.91373, 388.6955, 415.3327, 150.98514, 388.19116, 581.60675, 452.4354, 415.32858]
2025-09-16 12:53:42,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 89.0, 62.0, 79.0, 76.0, 29.0, 71.0, 110.0, 84.0, 90.0]
2025-09-16 12:53:42,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 3 minutes, 26 seconds)
2025-09-16 12:55:41,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:55:42,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 392.21930 ± 153.197
2025-09-16 12:55:42,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [351.57074, 366.68814, 356.25262, 150.76758, 436.68588, 489.93274, 427.8294, 595.70807, 130.59775, 616.1602]
2025-09-16 12:55:42,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 70.0, 67.0, 29.0, 83.0, 93.0, 82.0, 130.0, 25.0, 121.0]
2025-09-16 12:55:42,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 1 minute, 27 seconds)
2025-09-16 12:57:41,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:57:42,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 418.77686 ± 62.530
2025-09-16 12:57:42,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [399.32144, 392.80173, 435.14413, 407.18762, 360.8401, 479.36435, 495.41223, 427.92697, 503.32443, 286.4457]
2025-09-16 12:57:42,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 77.0, 82.0, 78.0, 69.0, 93.0, 103.0, 79.0, 92.0, 55.0]
2025-09-16 12:57:42,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (418.78) for latency 15
2025-09-16 12:57:42,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 59 minutes, 37 seconds)
2025-09-16 12:59:40,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:59:41,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 419.59491 ± 98.549
2025-09-16 12:59:41,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [535.122, 312.13593, 311.38846, 372.71387, 316.85135, 368.2805, 467.639, 623.30347, 451.57727, 436.9374]
2025-09-16 12:59:41,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 58.0, 69.0, 71.0, 61.0, 67.0, 88.0, 117.0, 82.0, 95.0]
2025-09-16 12:59:41,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (419.59) for latency 15
2025-09-16 12:59:41,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 57 minutes, 40 seconds)
2025-09-16 13:01:39,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:01:40,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 412.21289 ± 51.001
2025-09-16 13:01:40,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [386.42584, 382.98486, 439.20132, 370.75104, 386.7737, 528.8935, 331.86923, 422.1979, 431.6664, 441.36484]
2025-09-16 13:01:40,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 72.0, 81.0, 68.0, 82.0, 107.0, 61.0, 79.0, 86.0, 83.0]
2025-09-16 13:01:40,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 55 minutes, 31 seconds)
2025-09-16 13:03:39,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:03:40,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 427.38541 ± 103.117
2025-09-16 13:03:40,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [303.01205, 381.47244, 513.9505, 385.39218, 362.126, 438.0195, 440.40887, 688.87085, 347.52753, 413.07382]
2025-09-16 13:03:40,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 69.0, 94.0, 70.0, 81.0, 95.0, 80.0, 129.0, 64.0, 77.0]
2025-09-16 13:03:40,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (427.39) for latency 15
2025-09-16 13:03:40,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 24 seconds)
2025-09-16 13:05:38,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:05:39,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 446.67377 ± 80.867
2025-09-16 13:05:39,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [501.49713, 425.37714, 505.00635, 652.24896, 393.70416, 422.52365, 368.73465, 399.0991, 404.31308, 394.23355]
2025-09-16 13:05:39,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 79.0, 93.0, 122.0, 72.0, 79.0, 81.0, 88.0, 77.0, 72.0]
2025-09-16 13:05:39,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (446.67) for latency 15
2025-09-16 13:05:39,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 51 minutes, 9 seconds)
2025-09-16 13:07:37,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:07:38,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 463.43539 ± 110.789
2025-09-16 13:07:38,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [546.93823, 538.89264, 398.99902, 292.2876, 671.459, 405.2679, 386.06464, 385.89398, 589.832, 418.7194]
2025-09-16 13:07:38,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 99.0, 73.0, 55.0, 131.0, 74.0, 71.0, 78.0, 112.0, 76.0]
2025-09-16 13:07:38,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (463.44) for latency 15
2025-09-16 13:07:38,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 48 minutes, 56 seconds)
2025-09-16 13:09:36,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:09:37,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 431.90283 ± 133.872
2025-09-16 13:09:37,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [428.04468, 329.47742, 507.7973, 605.3272, 623.4073, 149.29912, 479.54596, 316.19086, 423.4331, 456.50552]
2025-09-16 13:09:37,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 63.0, 107.0, 113.0, 121.0, 29.0, 93.0, 60.0, 78.0, 84.0]
2025-09-16 13:09:37,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 46 minutes, 41 seconds)
2025-09-16 13:11:36,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:11:37,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 363.79449 ± 125.033
2025-09-16 13:11:37,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [414.67746, 475.13846, 170.4106, 129.44092, 321.3151, 354.8626, 542.7108, 394.78745, 482.28125, 352.3204]
2025-09-16 13:11:37,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 91.0, 33.0, 25.0, 61.0, 64.0, 112.0, 74.0, 92.0, 66.0]
2025-09-16 13:11:37,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 56 seconds)
2025-09-16 13:13:35,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:13:36,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 491.12030 ± 123.507
2025-09-16 13:13:36,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [531.0444, 441.06073, 450.5045, 832.1616, 490.03668, 498.94388, 347.76987, 472.95352, 436.71982, 410.00818]
2025-09-16 13:13:36,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 83.0, 83.0, 170.0, 92.0, 93.0, 65.0, 87.0, 89.0, 76.0]
2025-09-16 13:13:36,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (491.12) for latency 15
2025-09-16 13:13:36,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 42 minutes, 55 seconds)
2025-09-16 13:15:34,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:15:35,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 527.37915 ± 124.904
2025-09-16 13:15:35,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [779.2933, 589.6548, 510.95724, 681.6064, 353.57928, 567.91064, 408.84747, 513.4712, 458.44525, 410.0258]
2025-09-16 13:15:35,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 110.0, 98.0, 143.0, 67.0, 122.0, 73.0, 101.0, 84.0, 85.0]
2025-09-16 13:15:35,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (527.38) for latency 15
2025-09-16 13:15:35,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 58 seconds)
2025-09-16 13:17:33,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:17:34,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 490.12378 ± 142.752
2025-09-16 13:17:34,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [742.65546, 460.87152, 478.57336, 491.5338, 559.7422, 519.0106, 623.4788, 160.14749, 427.44766, 437.77682]
2025-09-16 13:17:34,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 87.0, 90.0, 91.0, 103.0, 96.0, 131.0, 31.0, 80.0, 82.0]
2025-09-16 13:17:34,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes, 59 seconds)
2025-09-16 13:19:32,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:19:33,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 422.04608 ± 63.740
2025-09-16 13:19:33,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [427.2111, 522.0397, 388.19873, 486.88428, 406.3281, 433.6587, 381.83658, 505.6808, 355.10312, 313.52023]
2025-09-16 13:19:33,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 113.0, 83.0, 91.0, 75.0, 78.0, 71.0, 95.0, 66.0, 59.0]
2025-09-16 13:19:33,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 1 second)
2025-09-16 13:21:32,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:21:33,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 497.89078 ± 125.865
2025-09-16 13:21:33,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [343.52676, 356.4025, 514.50323, 777.22284, 587.3853, 365.11133, 491.3449, 438.36957, 575.71735, 529.32355]
2025-09-16 13:21:33,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 70.0, 97.0, 153.0, 126.0, 69.0, 92.0, 81.0, 122.0, 111.0]
2025-09-16 13:21:33,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 10 seconds)
2025-09-16 13:23:32,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:23:33,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 404.65372 ± 116.830
2025-09-16 13:23:33,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [426.35095, 428.55762, 360.81497, 632.2136, 125.38232, 450.9946, 428.13654, 383.13272, 388.32935, 422.6244]
2025-09-16 13:23:33,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 80.0, 69.0, 120.0, 24.0, 83.0, 78.0, 70.0, 73.0, 79.0]
2025-09-16 13:23:33,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 15 seconds)
2025-09-16 13:25:31,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:25:32,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 455.49136 ± 126.193
2025-09-16 13:25:32,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [529.50037, 419.53265, 412.75986, 514.6167, 139.85487, 601.15234, 551.4667, 562.55835, 381.66956, 441.80243]
2025-09-16 13:25:32,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 77.0, 75.0, 96.0, 27.0, 109.0, 101.0, 105.0, 70.0, 95.0]
2025-09-16 13:25:32,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 31 minutes, 7 seconds)
2025-09-16 13:27:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:27:31,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 466.76587 ± 89.864
2025-09-16 13:27:31,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [447.29028, 666.86127, 540.7821, 412.81876, 306.04358, 431.46866, 487.5643, 407.00687, 472.07114, 495.7516]
2025-09-16 13:27:31,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 120.0, 99.0, 76.0, 57.0, 92.0, 88.0, 81.0, 88.0, 109.0]
2025-09-16 13:27:31,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 13 seconds)
2025-09-16 13:29:30,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:29:31,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 494.61475 ± 109.170
2025-09-16 13:29:31,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [378.2416, 555.19995, 462.74683, 322.7843, 498.68268, 610.6596, 711.9529, 404.88876, 464.52155, 536.4691]
2025-09-16 13:29:31,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 109.0, 90.0, 62.0, 90.0, 115.0, 140.0, 76.0, 83.0, 99.0]
2025-09-16 13:29:31,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 30 seconds)
2025-09-16 13:31:30,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:31:31,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 505.32095 ± 117.895
2025-09-16 13:31:31,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [718.43896, 540.044, 553.91565, 333.17993, 511.58453, 404.0037, 673.86914, 374.10986, 505.40726, 438.65616]
2025-09-16 13:31:31,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 100.0, 105.0, 62.0, 94.0, 88.0, 128.0, 70.0, 98.0, 81.0]
2025-09-16 13:31:31,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 25 seconds)
2025-09-16 13:33:29,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:33:31,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 455.70776 ± 103.906
2025-09-16 13:33:31,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [480.56845, 555.2151, 350.9431, 322.6259, 437.85233, 392.27567, 513.4943, 692.29376, 414.33444, 397.4745]
2025-09-16 13:33:31,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 109.0, 68.0, 59.0, 80.0, 86.0, 110.0, 137.0, 85.0, 75.0]
2025-09-16 13:33:31,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes, 22 seconds)
2025-09-16 13:35:30,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:35:31,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 500.66217 ± 207.298
2025-09-16 13:35:31,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [404.68112, 657.08514, 771.9132, 129.32309, 374.06747, 508.61737, 327.1649, 857.83105, 560.40234, 415.53598]
2025-09-16 13:35:31,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 130.0, 144.0, 25.0, 73.0, 99.0, 64.0, 162.0, 102.0, 86.0]
2025-09-16 13:35:31,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 21 minutes, 48 seconds)
2025-09-16 13:37:29,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:37:30,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 481.42368 ± 104.100
2025-09-16 13:37:30,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [404.01785, 498.42255, 626.83923, 463.93936, 446.07327, 522.8828, 535.03467, 227.52757, 510.75864, 578.7408]
2025-09-16 13:37:30,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 102.0, 116.0, 84.0, 94.0, 95.0, 95.0, 46.0, 98.0, 109.0]
2025-09-16 13:37:30,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 48 seconds)
2025-09-16 13:39:28,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:39:29,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 483.39435 ± 84.356
2025-09-16 13:39:29,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [373.56796, 447.876, 366.26144, 497.91013, 570.9203, 560.7388, 482.53113, 413.6943, 642.67834, 477.76523]
2025-09-16 13:39:29,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 84.0, 69.0, 91.0, 108.0, 106.0, 89.0, 76.0, 118.0, 89.0]
2025-09-16 13:39:30,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 17 minutes, 37 seconds)
2025-09-16 13:41:28,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:41:30,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 439.17007 ± 145.333
2025-09-16 13:41:30,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [429.49496, 328.23056, 476.73615, 597.50726, 514.2788, 518.04755, 369.09036, 119.40748, 661.6185, 377.2892]
2025-09-16 13:41:30,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 63.0, 90.0, 126.0, 95.0, 109.0, 80.0, 23.0, 121.0, 70.0]
2025-09-16 13:41:30,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 40 seconds)
2025-09-16 13:43:27,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:43:28,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 500.61066 ± 138.185
2025-09-16 13:43:28,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [572.8247, 744.3922, 480.65204, 448.8536, 262.01157, 426.03455, 393.30548, 419.8948, 559.40924, 698.72815]
2025-09-16 13:43:28,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 154.0, 91.0, 83.0, 49.0, 78.0, 73.0, 79.0, 103.0, 146.0]
2025-09-16 13:43:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 23 seconds)
2025-09-16 13:45:27,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:45:29,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 470.61700 ± 216.092
2025-09-16 13:45:29,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [716.3923, 641.5146, 390.99268, 407.12888, 555.2654, 138.93752, 124.70644, 449.60336, 451.3824, 830.2465]
2025-09-16 13:45:29,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 117.0, 73.0, 85.0, 117.0, 27.0, 24.0, 85.0, 88.0, 153.0]
2025-09-16 13:45:29,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 32 seconds)
2025-09-16 13:47:26,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:47:28,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 543.38977 ± 125.256
2025-09-16 13:47:28,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [505.5156, 359.65155, 555.7715, 769.35596, 591.0422, 747.21783, 473.70764, 426.0739, 454.0457, 551.516]
2025-09-16 13:47:28,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 69.0, 101.0, 145.0, 126.0, 136.0, 99.0, 77.0, 83.0, 101.0]
2025-09-16 13:47:28,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (543.39) for latency 15
2025-09-16 13:47:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes, 28 seconds)
2025-09-16 13:49:27,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:49:28,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 484.79483 ± 136.565
2025-09-16 13:49:28,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [144.76138, 454.28168, 495.21017, 658.581, 578.0858, 496.2387, 603.5138, 576.244, 426.91513, 414.1165]
2025-09-16 13:49:28,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 83.0, 105.0, 126.0, 107.0, 90.0, 108.0, 107.0, 79.0, 85.0]
2025-09-16 13:49:28,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 7 minutes, 39 seconds)
2025-09-16 13:51:27,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:51:28,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 525.85413 ± 201.190
2025-09-16 13:51:28,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [450.2173, 513.2273, 444.00006, 881.0356, 145.3532, 849.96796, 516.23, 409.50897, 488.59454, 560.4068]
2025-09-16 13:51:28,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 93.0, 80.0, 173.0, 28.0, 158.0, 92.0, 75.0, 99.0, 103.0]
2025-09-16 13:51:28,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 5 minutes, 42 seconds)
2025-09-16 13:53:28,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:53:29,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 491.27328 ± 140.233
2025-09-16 13:53:29,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [440.4241, 621.7481, 544.1789, 170.14566, 370.43817, 657.82587, 639.09674, 524.2596, 522.9187, 421.69687]
2025-09-16 13:53:29,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 113.0, 100.0, 33.0, 70.0, 117.0, 120.0, 95.0, 97.0, 78.0]
2025-09-16 13:53:29,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 13 seconds)
2025-09-16 13:55:27,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:55:28,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 512.31476 ± 221.681
2025-09-16 13:55:28,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [770.0131, 349.30002, 369.37476, 437.31912, 469.65735, 622.89453, 682.5205, 922.087, 140.83876, 359.14255]
2025-09-16 13:55:28,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 72.0, 71.0, 91.0, 90.0, 118.0, 125.0, 185.0, 27.0, 67.0]
2025-09-16 13:55:28,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 53 seconds)
2025-09-16 13:57:26,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:57:28,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 539.03064 ± 158.914
2025-09-16 13:57:28,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [476.84146, 952.58344, 369.8902, 405.32272, 640.95264, 545.33527, 497.35458, 584.9382, 502.87204, 414.216]
2025-09-16 13:57:28,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 187.0, 70.0, 75.0, 133.0, 98.0, 100.0, 108.0, 101.0, 77.0]
2025-09-16 13:57:28,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 58 seconds)
2025-09-16 13:59:26,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:59:27,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 627.82776 ± 122.378
2025-09-16 13:59:27,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [506.15756, 892.64404, 615.3199, 621.42957, 796.45953, 531.72626, 471.0658, 585.9245, 610.99786, 646.5519]
2025-09-16 13:59:27,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 164.0, 114.0, 111.0, 150.0, 95.0, 86.0, 124.0, 109.0, 136.0]
2025-09-16 13:59:27,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (627.83) for latency 15
2025-09-16 13:59:27,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 53 seconds)
2025-09-16 14:01:27,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:01:28,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 548.54407 ± 145.874
2025-09-16 14:01:28,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [771.19666, 387.6026, 510.40012, 844.81305, 467.18094, 515.10614, 643.79846, 435.85013, 476.84985, 432.64294]
2025-09-16 14:01:28,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 82.0, 106.0, 159.0, 87.0, 109.0, 123.0, 88.0, 85.0, 80.0]
2025-09-16 14:01:29,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 4 seconds)
2025-09-16 14:03:27,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:03:29,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 610.59247 ± 214.435
2025-09-16 14:03:29,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [848.8502, 457.14636, 123.96087, 841.74615, 432.65518, 652.49023, 613.9138, 582.2687, 772.9641, 779.9298]
2025-09-16 14:03:29,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 83.0, 24.0, 167.0, 80.0, 124.0, 111.0, 106.0, 144.0, 148.0]
2025-09-16 14:03:29,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 58 seconds)
2025-09-16 14:05:28,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:05:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 543.41174 ± 77.438
2025-09-16 14:05:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [607.10254, 564.06696, 666.6165, 590.6051, 444.2575, 467.95477, 422.3655, 622.8628, 503.604, 544.6816]
2025-09-16 14:05:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 119.0, 125.0, 128.0, 79.0, 87.0, 77.0, 121.0, 96.0, 106.0]
2025-09-16 14:05:29,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 8 seconds)
2025-09-16 14:07:28,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:07:29,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 591.52832 ± 179.149
2025-09-16 14:07:29,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [565.8882, 755.2411, 405.94797, 658.17395, 634.44946, 520.3727, 365.90485, 685.62103, 363.47076, 960.21313]
2025-09-16 14:07:29,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 162.0, 76.0, 120.0, 117.0, 101.0, 70.0, 124.0, 78.0, 184.0]
2025-09-16 14:07:29,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 16 seconds)
2025-09-16 14:09:28,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:09:29,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 533.30383 ± 244.426
2025-09-16 14:09:29,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [898.778, 458.59384, 965.2803, 394.8559, 579.87756, 525.6753, 665.5485, 352.32153, 378.3791, 113.729095]
2025-09-16 14:09:29,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 85.0, 186.0, 88.0, 106.0, 99.0, 140.0, 76.0, 69.0, 22.0]
2025-09-16 14:09:29,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 48 minutes, 19 seconds)
2025-09-16 14:11:29,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:11:31,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 726.21649 ± 166.018
2025-09-16 14:11:31,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [919.9583, 734.14954, 814.5489, 478.1535, 585.3129, 800.34674, 718.7951, 676.768, 504.86615, 1029.2654]
2025-09-16 14:11:31,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 134.0, 155.0, 87.0, 126.0, 173.0, 140.0, 123.0, 106.0, 200.0]
2025-09-16 14:11:31,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (726.22) for latency 15
2025-09-16 14:11:31,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 27 seconds)
2025-09-16 14:13:31,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:13:33,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 684.39764 ± 177.656
2025-09-16 14:13:33,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [397.64056, 496.42865, 803.23584, 609.3204, 671.2802, 505.98145, 687.85187, 900.8733, 981.5667, 789.7975]
2025-09-16 14:13:33,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 91.0, 147.0, 111.0, 125.0, 106.0, 127.0, 171.0, 208.0, 146.0]
2025-09-16 14:13:33,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 40 seconds)
2025-09-16 14:15:30,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:15:32,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 630.27789 ± 237.750
2025-09-16 14:15:32,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [730.0792, 648.4692, 759.06256, 473.9596, 420.82455, 710.8205, 951.3221, 529.9449, 947.7026, 130.59395]
2025-09-16 14:15:32,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 124.0, 155.0, 87.0, 78.0, 128.0, 183.0, 95.0, 195.0, 25.0]
2025-09-16 14:15:32,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 42 minutes, 26 seconds)
2025-09-16 14:17:31,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:17:32,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 598.01416 ± 109.227
2025-09-16 14:17:32,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [519.5522, 652.0554, 648.63324, 509.9044, 449.23306, 701.8652, 576.9968, 455.4187, 802.64276, 663.83984]
2025-09-16 14:17:32,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 130.0, 113.0, 95.0, 99.0, 131.0, 114.0, 99.0, 151.0, 124.0]
2025-09-16 14:17:32,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes, 32 seconds)
2025-09-16 14:19:31,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:19:32,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 566.36267 ± 198.466
2025-09-16 14:19:32,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [359.49283, 537.5436, 362.84036, 586.57776, 348.13574, 488.01373, 1050.9072, 632.1468, 679.8752, 618.09424]
2025-09-16 14:19:32,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 96.0, 68.0, 109.0, 65.0, 88.0, 200.0, 123.0, 136.0, 116.0]
2025-09-16 14:19:32,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 30 seconds)
2025-09-16 14:21:32,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:21:34,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 568.16125 ± 125.461
2025-09-16 14:21:34,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [558.20465, 532.8061, 390.0258, 727.5408, 635.02625, 478.9294, 645.9126, 752.66656, 355.8649, 604.63556]
2025-09-16 14:21:34,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 111.0, 73.0, 134.0, 118.0, 95.0, 136.0, 143.0, 68.0, 123.0]
2025-09-16 14:21:34,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 26 seconds)
2025-09-16 14:23:33,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:23:35,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 687.71979 ± 204.056
2025-09-16 14:23:35,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [961.8552, 705.299, 417.52057, 733.0604, 755.7743, 756.3898, 523.42755, 331.03766, 1007.5605, 685.2723]
2025-09-16 14:23:35,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 129.0, 90.0, 136.0, 148.0, 137.0, 94.0, 64.0, 195.0, 125.0]
2025-09-16 14:23:35,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 21 seconds)
2025-09-16 14:25:34,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:25:36,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 788.34973 ± 210.683
2025-09-16 14:25:36,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1008.85944, 603.42566, 824.6876, 794.93774, 608.51337, 961.49725, 615.1737, 1207.8187, 486.42038, 772.1631]
2025-09-16 14:25:36,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 127.0, 151.0, 151.0, 112.0, 183.0, 115.0, 242.0, 86.0, 159.0]
2025-09-16 14:25:36,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (788.35) for latency 15
2025-09-16 14:25:36,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 40 seconds)
2025-09-16 14:27:34,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:27:36,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 648.26471 ± 153.784
2025-09-16 14:27:36,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [643.29175, 760.1466, 503.20566, 552.0532, 527.0021, 719.8484, 379.3983, 784.8358, 678.274, 934.5914]
2025-09-16 14:27:36,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 153.0, 101.0, 101.0, 98.0, 150.0, 71.0, 158.0, 131.0, 176.0]
2025-09-16 14:27:36,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 33 seconds)
2025-09-16 14:29:35,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:29:36,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 663.99329 ± 294.463
2025-09-16 14:29:36,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [434.78317, 560.73224, 108.28755, 580.1924, 1216.2817, 893.90765, 780.87897, 481.5319, 632.58466, 950.75244]
2025-09-16 14:29:36,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 117.0, 21.0, 107.0, 225.0, 189.0, 146.0, 108.0, 117.0, 182.0]
2025-09-16 14:29:36,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 36 seconds)
2025-09-16 14:31:36,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:31:38,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 656.71454 ± 212.494
2025-09-16 14:31:38,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [409.68146, 619.6864, 547.24945, 714.6234, 583.8447, 1212.482, 697.68823, 424.54446, 717.86237, 639.4831]
2025-09-16 14:31:38,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 130.0, 103.0, 134.0, 125.0, 237.0, 123.0, 78.0, 141.0, 120.0]
2025-09-16 14:31:38,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 35 seconds)
2025-09-16 14:33:36,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:33:38,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 648.15399 ± 252.436
2025-09-16 14:33:38,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [419.43817, 633.5285, 897.47107, 781.2503, 134.89108, 637.6714, 941.1785, 696.1887, 945.79877, 394.12375]
2025-09-16 14:33:38,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 115.0, 166.0, 157.0, 26.0, 119.0, 169.0, 126.0, 179.0, 72.0]
2025-09-16 14:33:38,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 23 seconds)
2025-09-16 14:35:37,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:35:39,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 751.35767 ± 228.831
2025-09-16 14:35:39,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1233.1941, 728.55865, 919.81537, 830.90173, 717.6361, 421.42047, 443.74927, 554.72156, 828.45264, 835.1268]
2025-09-16 14:35:39,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 136.0, 176.0, 176.0, 137.0, 76.0, 78.0, 114.0, 157.0, 154.0]
2025-09-16 14:35:39,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 23 seconds)
2025-09-16 14:37:39,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:37:41,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 704.59821 ± 256.442
2025-09-16 14:37:41,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [150.19246, 735.39246, 867.42053, 921.54376, 481.3109, 573.2323, 1106.9114, 887.7479, 592.4081, 729.8216]
2025-09-16 14:37:41,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 133.0, 157.0, 179.0, 90.0, 118.0, 207.0, 162.0, 114.0, 142.0]
2025-09-16 14:37:41,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 34 seconds)
2025-09-16 14:39:38,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:39:40,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 733.78613 ± 241.140
2025-09-16 14:39:40,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [727.22253, 324.6942, 510.4207, 593.2469, 889.5251, 856.78674, 843.8037, 492.02203, 918.85956, 1181.2803]
2025-09-16 14:39:40,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 64.0, 92.0, 108.0, 186.0, 158.0, 180.0, 97.0, 166.0, 219.0]
2025-09-16 14:39:41,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 31 seconds)
2025-09-16 14:41:40,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:41:42,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 607.28217 ± 206.387
2025-09-16 14:41:42,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [360.52792, 619.0114, 746.6298, 666.27844, 654.1678, 678.22986, 795.1223, 164.57608, 483.8559, 904.4218]
2025-09-16 14:41:42,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 121.0, 142.0, 128.0, 121.0, 129.0, 144.0, 32.0, 91.0, 173.0]
2025-09-16 14:41:42,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 29 seconds)
2025-09-16 14:43:41,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:43:43,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 703.28729 ± 307.130
2025-09-16 14:43:43,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [772.8692, 807.1668, 288.69724, 1278.2333, 903.1544, 806.408, 603.7862, 587.97217, 130.08434, 854.5012]
2025-09-16 14:43:43,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 146.0, 57.0, 236.0, 157.0, 147.0, 123.0, 108.0, 25.0, 158.0]
2025-09-16 14:43:43,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 36 seconds)
2025-09-16 14:45:45,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:45:48,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 857.79980 ± 268.515
2025-09-16 14:45:48,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [805.4288, 774.0029, 594.4492, 958.34186, 712.2152, 758.4133, 1357.7623, 1349.4181, 700.5641, 567.4028]
2025-09-16 14:45:48,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 142.0, 112.0, 176.0, 127.0, 139.0, 265.0, 289.0, 143.0, 110.0]
2025-09-16 14:45:48,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (857.80) for latency 15
2025-09-16 14:45:48,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 13 minutes, 2 seconds)
2025-09-16 14:47:45,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:47:47,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 705.29871 ± 262.047
2025-09-16 14:47:47,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [745.0814, 663.3816, 397.91248, 903.8445, 846.2951, 681.9772, 945.2268, 123.704445, 1069.1229, 676.4401]
2025-09-16 14:47:47,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 122.0, 76.0, 166.0, 154.0, 126.0, 184.0, 24.0, 198.0, 137.0]
2025-09-16 14:47:47,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 43 seconds)
2025-09-16 14:49:44,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:49:47,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 760.23303 ± 357.573
2025-09-16 14:49:47,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [505.24265, 1284.1952, 134.75694, 1471.8378, 765.0368, 771.5206, 702.048, 713.2844, 639.2929, 615.1154]
2025-09-16 14:49:47,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 270.0, 26.0, 275.0, 136.0, 142.0, 145.0, 141.0, 118.0, 134.0]
2025-09-16 14:49:47,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 41 seconds)
2025-09-16 14:51:47,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:51:50,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 781.43530 ± 328.617
2025-09-16 14:51:50,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1111.1292, 1117.9646, 468.77362, 1245.1276, 452.43793, 625.52783, 374.0859, 405.78387, 1051.6819, 961.84015]
2025-09-16 14:51:50,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 199.0, 87.0, 231.0, 96.0, 123.0, 72.0, 87.0, 197.0, 184.0]
2025-09-16 14:51:50,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 50 seconds)
2025-09-16 14:53:49,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:53:51,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 866.57068 ± 351.175
2025-09-16 14:53:51,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [801.04047, 815.3894, 907.8514, 1443.0422, 364.34503, 479.52682, 768.4875, 819.8663, 1543.2108, 722.9472]
2025-09-16 14:53:51,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 143.0, 168.0, 284.0, 67.0, 87.0, 138.0, 150.0, 289.0, 146.0]
2025-09-16 14:53:51,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (866.57) for latency 15
2025-09-16 14:53:51,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 53 seconds)
2025-09-16 14:55:50,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:55:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 844.07684 ± 330.143
2025-09-16 14:55:53,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1445.058, 916.0322, 822.48267, 971.0315, 654.0977, 786.0767, 916.1246, 300.76486, 1241.9994, 387.10068]
2025-09-16 14:55:53,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [269.0, 177.0, 171.0, 181.0, 124.0, 158.0, 171.0, 58.0, 235.0, 70.0]
2025-09-16 14:55:53,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 31 seconds)
2025-09-16 14:57:52,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:57:53,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 686.01129 ± 294.498
2025-09-16 14:57:53,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1405.9865, 801.0537, 385.88742, 472.64993, 354.21286, 488.7966, 640.2737, 798.5613, 641.32623, 871.36456]
2025-09-16 14:57:53,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [263.0, 146.0, 72.0, 101.0, 65.0, 105.0, 120.0, 142.0, 114.0, 162.0]
2025-09-16 14:57:53,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 39 seconds)
2025-09-16 14:59:53,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:59:55,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 842.35583 ± 264.635
2025-09-16 14:59:55,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [961.353, 961.6265, 736.6358, 606.46545, 1431.9226, 656.4381, 926.84534, 423.50937, 718.3077, 1000.4548]
2025-09-16 14:59:55,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 189.0, 139.0, 116.0, 270.0, 124.0, 169.0, 79.0, 147.0, 180.0]
2025-09-16 14:59:55,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 50 seconds)
2025-09-16 15:01:56,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:01:58,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 777.54285 ± 213.995
2025-09-16 15:01:58,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1014.6189, 969.05457, 534.9553, 1006.4096, 717.7761, 949.00824, 615.4557, 361.12994, 903.0996, 703.92004]
2025-09-16 15:01:58,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 170.0, 98.0, 179.0, 130.0, 177.0, 116.0, 66.0, 179.0, 129.0]
2025-09-16 15:01:58,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 48 seconds)
2025-09-16 15:03:59,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:04:01,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 906.74365 ± 388.637
2025-09-16 15:04:01,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1169.3256, 1488.9393, 1333.0801, 431.21225, 286.94763, 1092.9943, 1240.6523, 717.2519, 646.3163, 660.7172]
2025-09-16 15:04:01,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [242.0, 284.0, 258.0, 80.0, 54.0, 211.0, 227.0, 134.0, 116.0, 121.0]
2025-09-16 15:04:01,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (906.74) for latency 15
2025-09-16 15:04:01,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 53 seconds)
2025-09-16 15:05:59,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:06:02,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1000.45734 ± 272.162
2025-09-16 15:06:02,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1125.8302, 797.9795, 1694.5516, 1174.6588, 926.2121, 804.9309, 1076.781, 830.95874, 755.5214, 817.1494]
2025-09-16 15:06:02,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [204.0, 146.0, 326.0, 218.0, 182.0, 168.0, 198.0, 169.0, 146.0, 146.0]
2025-09-16 15:06:02,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1000.46) for latency 15
2025-09-16 15:06:02,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 47 seconds)
2025-09-16 15:08:02,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:08:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 959.75458 ± 360.501
2025-09-16 15:08:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1522.6488, 1010.41046, 987.198, 941.38904, 130.32413, 625.4754, 877.07245, 1015.7065, 1179.5093, 1307.8114]
2025-09-16 15:08:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [290.0, 185.0, 181.0, 174.0, 25.0, 113.0, 162.0, 223.0, 222.0, 252.0]
2025-09-16 15:08:04,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 53 seconds)
2025-09-16 15:10:04,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:10:07,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 863.18829 ± 275.881
2025-09-16 15:10:07,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [868.95667, 1036.5444, 481.5202, 902.707, 1525.8806, 515.1224, 842.6942, 761.3472, 926.824, 770.286]
2025-09-16 15:10:07,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 198.0, 100.0, 183.0, 286.0, 95.0, 157.0, 155.0, 178.0, 156.0]
2025-09-16 15:10:07,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 54 seconds)
2025-09-16 15:12:05,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:12:08,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 896.66418 ± 363.953
2025-09-16 15:12:08,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1229.611, 832.7189, 1404.829, 771.8949, 889.49054, 1081.3478, 938.8875, 146.2948, 429.90863, 1241.6586]
2025-09-16 15:12:08,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 162.0, 252.0, 140.0, 157.0, 195.0, 175.0, 28.0, 75.0, 232.0]
2025-09-16 15:12:08,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 43 seconds)
2025-09-16 15:14:08,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:14:10,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 946.24152 ± 309.020
2025-09-16 15:14:10,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [665.78394, 762.67926, 1009.6134, 566.5285, 1050.7466, 1041.9358, 1090.6052, 537.2641, 1622.9705, 1114.2885]
2025-09-16 15:14:10,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 143.0, 189.0, 122.0, 200.0, 205.0, 197.0, 118.0, 290.0, 209.0]
2025-09-16 15:14:10,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 40 seconds)
2025-09-16 15:16:13,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:16:16,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 934.49628 ± 338.805
2025-09-16 15:16:16,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [805.7279, 1022.9817, 525.4347, 785.5325, 1828.0924, 940.0892, 865.4598, 1125.04, 773.94916, 672.65533]
2025-09-16 15:16:16,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 190.0, 96.0, 156.0, 339.0, 180.0, 162.0, 199.0, 157.0, 136.0]
2025-09-16 15:16:16,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 57 seconds)
2025-09-16 15:18:12,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:18:15,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 946.89490 ± 316.276
2025-09-16 15:18:15,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1762.1799, 1249.022, 588.0235, 802.41565, 714.17725, 834.9301, 962.87415, 832.17285, 863.1147, 860.03876]
2025-09-16 15:18:15,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [348.0, 225.0, 106.0, 148.0, 124.0, 181.0, 172.0, 177.0, 173.0, 163.0]
2025-09-16 15:18:15,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 41 seconds)
2025-09-16 15:20:17,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:20:19,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 831.63135 ± 158.251
2025-09-16 15:20:19,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1110.2751, 726.9848, 742.3455, 649.252, 1113.6642, 656.36017, 912.4382, 815.1253, 815.5195, 774.3485]
2025-09-16 15:20:19,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 155.0, 152.0, 134.0, 238.0, 131.0, 175.0, 162.0, 158.0, 151.0]
2025-09-16 15:20:19,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 47 seconds)
2025-09-16 15:22:18,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:22:21,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1094.24048 ± 472.667
2025-09-16 15:22:21,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [912.8914, 880.2409, 2139.9404, 1144.6825, 508.66418, 1155.8616, 652.74194, 1753.1488, 842.6657, 951.56775]
2025-09-16 15:22:21,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 178.0, 415.0, 214.0, 91.0, 226.0, 143.0, 360.0, 172.0, 189.0]
2025-09-16 15:22:21,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1094.24) for latency 15
2025-09-16 15:22:21,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 47 seconds)
2025-09-16 15:24:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:24:24,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1323.18140 ± 517.388
2025-09-16 15:24:24,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2245.8381, 974.5172, 1794.3319, 681.15643, 1445.6289, 1594.464, 1303.7748, 1773.5955, 599.2315, 819.2757]
2025-09-16 15:24:24,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [422.0, 178.0, 344.0, 124.0, 272.0, 300.0, 255.0, 327.0, 107.0, 165.0]
2025-09-16 15:24:24,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1323.18) for latency 15
2025-09-16 15:24:24,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 47 seconds)
2025-09-16 15:26:24,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:26:27,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1294.30688 ± 509.363
2025-09-16 15:26:27,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [802.5303, 740.7487, 1468.1735, 1028.5072, 2337.8818, 1926.8478, 987.85583, 1095.2039, 1678.9645, 876.35583]
2025-09-16 15:26:27,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 153.0, 284.0, 204.0, 435.0, 343.0, 213.0, 195.0, 318.0, 190.0]
2025-09-16 15:26:27,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 36 seconds)
2025-09-16 15:28:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:28:33,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1005.05359 ± 326.282
2025-09-16 15:28:33,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1126.9622, 1238.9012, 1083.8364, 713.2349, 609.3751, 1345.9608, 1498.7391, 1206.2655, 755.5319, 471.7289]
2025-09-16 15:28:33,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 238.0, 212.0, 134.0, 126.0, 258.0, 281.0, 225.0, 143.0, 88.0]
2025-09-16 15:28:33,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 54 seconds)
2025-09-16 15:30:30,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:30:33,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1193.53687 ± 393.141
2025-09-16 15:30:33,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [881.0237, 927.6974, 842.9251, 1828.8162, 1814.4266, 1318.3309, 1353.9468, 1049.4396, 1342.4761, 576.28656]
2025-09-16 15:30:33,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 172.0, 156.0, 335.0, 372.0, 240.0, 256.0, 187.0, 269.0, 103.0]
2025-09-16 15:30:33,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 39 seconds)
2025-09-16 15:32:34,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:32:37,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1069.37622 ± 410.646
2025-09-16 15:32:37,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [964.7914, 576.4484, 669.68585, 1085.1885, 525.35156, 1238.0051, 1464.938, 1766.1802, 1569.7057, 833.46783]
2025-09-16 15:32:37,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 110.0, 120.0, 206.0, 94.0, 225.0, 278.0, 336.0, 306.0, 156.0]
2025-09-16 15:32:37,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 41 seconds)
2025-09-16 15:34:35,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:34:37,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 817.81635 ± 481.732
2025-09-16 15:34:37,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [971.308, 865.25836, 810.07, 693.9544, 139.78801, 652.73175, 1160.7787, 1916.8098, 832.8991, 134.56543]
2025-09-16 15:34:37,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 167.0, 174.0, 143.0, 27.0, 137.0, 206.0, 362.0, 180.0, 26.0]
2025-09-16 15:34:37,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 30 seconds)
2025-09-16 15:36:39,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:36:42,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1280.37561 ± 682.049
2025-09-16 15:36:42,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2937.7856, 1426.5376, 1966.5172, 1312.207, 996.67847, 608.42303, 653.4485, 739.77795, 841.29944, 1321.0807]
2025-09-16 15:36:42,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [549.0, 275.0, 374.0, 260.0, 185.0, 108.0, 143.0, 144.0, 160.0, 231.0]
2025-09-16 15:36:43,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 33 seconds)
2025-09-16 15:38:40,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:38:43,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1069.67700 ± 490.717
2025-09-16 15:38:43,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [611.6756, 1312.5116, 1058.9005, 476.75528, 575.362, 1548.8645, 838.7287, 889.6221, 1216.3088, 2168.0408]
2025-09-16 15:38:43,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 263.0, 199.0, 90.0, 110.0, 296.0, 156.0, 170.0, 249.0, 385.0]
2025-09-16 15:38:43,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 21 seconds)
2025-09-16 15:40:43,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:40:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1039.09595 ± 574.091
2025-09-16 15:40:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [150.89464, 537.5715, 1087.697, 1985.2358, 1077.8185, 1332.687, 1598.0864, 148.35211, 1444.2312, 1028.385]
2025-09-16 15:40:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 97.0, 224.0, 375.0, 190.0, 266.0, 304.0, 28.0, 257.0, 199.0]
2025-09-16 15:40:46,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 23 seconds)
2025-09-16 15:42:45,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:42:48,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1234.25232 ± 520.600
2025-09-16 15:42:48,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [574.9285, 1729.3956, 1639.7219, 675.1729, 1855.6423, 1444.3136, 961.0653, 1220.2295, 405.94925, 1836.1049]
2025-09-16 15:42:48,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 308.0, 297.0, 118.0, 349.0, 262.0, 179.0, 224.0, 72.0, 343.0]
2025-09-16 15:42:48,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 17 seconds)
2025-09-16 15:44:48,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:44:50,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 977.49249 ± 549.872
2025-09-16 15:44:50,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [869.26276, 998.51294, 704.2603, 2322.6204, 392.51382, 747.9988, 1088.5165, 721.6905, 392.7352, 1536.8135]
2025-09-16 15:44:50,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 173.0, 154.0, 429.0, 72.0, 137.0, 198.0, 139.0, 70.0, 290.0]
2025-09-16 15:44:50,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 18 seconds)
2025-09-16 15:46:53,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:46:56,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1101.60913 ± 238.859
2025-09-16 15:46:56,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1445.2279, 1411.9971, 1323.0667, 1219.1743, 1211.8884, 838.2981, 1030.7448, 895.09235, 874.2586, 766.3432]
2025-09-16 15:46:56,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [272.0, 263.0, 258.0, 220.0, 221.0, 160.0, 188.0, 164.0, 172.0, 138.0]
2025-09-16 15:46:56,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 15 seconds)
2025-09-16 15:48:56,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:49:00,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1223.19519 ± 500.745
2025-09-16 15:49:00,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [350.25723, 1485.8751, 1811.5586, 461.8323, 1056.8687, 1478.9812, 1696.1824, 1786.7692, 882.37585, 1221.2513]
2025-09-16 15:49:00,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 282.0, 331.0, 104.0, 194.0, 281.0, 351.0, 320.0, 163.0, 221.0]
2025-09-16 15:49:00,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 16 seconds)
2025-09-16 15:50:54,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:50:58,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1337.42871 ± 637.639
2025-09-16 15:50:58,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1425.8682, 1552.7312, 2504.2493, 539.08575, 836.1884, 857.8043, 2161.69, 851.3114, 1877.3903, 767.9674]
2025-09-16 15:50:58,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [277.0, 285.0, 479.0, 95.0, 173.0, 166.0, 405.0, 178.0, 325.0, 136.0]
2025-09-16 15:50:58,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1337.43) for latency 15
2025-09-16 15:50:58,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 9 seconds)
2025-09-16 15:52:56,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:53:00,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1364.47290 ± 391.718
2025-09-16 15:53:00,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1133.6289, 2260.2178, 1713.8722, 1050.432, 1212.3683, 1374.2389, 1215.2538, 1009.6822, 1715.1754, 959.8593]
2025-09-16 15:53:00,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 414.0, 316.0, 207.0, 233.0, 281.0, 224.0, 184.0, 297.0, 175.0]
2025-09-16 15:53:00,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1364.47) for latency 15
2025-09-16 15:53:00,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 7 seconds)
2025-09-16 15:54:58,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:55:01,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1151.12476 ± 530.002
2025-09-16 15:55:01,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1326.5577, 1120.1389, 2332.982, 140.35689, 665.50684, 1239.5062, 918.4007, 1198.6365, 1392.2202, 1176.9412]
2025-09-16 15:55:01,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [264.0, 213.0, 460.0, 27.0, 147.0, 218.0, 177.0, 230.0, 277.0, 219.0]
2025-09-16 15:55:01,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 4 seconds)
2025-09-16 15:57:02,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:57:06,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1253.25537 ± 456.115
2025-09-16 15:57:06,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [643.5611, 1514.4504, 1413.3204, 1194.823, 714.4206, 1441.3999, 1309.0925, 2288.8608, 796.39856, 1216.2257]
2025-09-16 15:57:06,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 277.0, 254.0, 241.0, 136.0, 284.0, 232.0, 427.0, 138.0, 252.0]
2025-09-16 15:57:06,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 1 second)
2025-09-16 15:59:03,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:59:06,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1249.13269 ± 673.330
2025-09-16 15:59:06,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [917.1842, 1421.0105, 626.3476, 133.89786, 1798.6404, 756.54297, 1163.3861, 2431.979, 1104.2113, 2138.1267]
2025-09-16 15:59:06,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 280.0, 131.0, 26.0, 341.0, 141.0, 232.0, 443.0, 198.0, 384.0]
2025-09-16 15:59:06,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1251 [DEBUG]: Training session finished
