2025-09-16 13:37:00,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.075-delay_18
2025-09-16 13:37:00,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.075-delay_18
2025-09-16 13:37:00,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x14ea64e9ca10>}
2025-09-16 13:37:00,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 13:37:00,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 13:37:00,758 baseline-bpql-noisepromille75-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=682, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 13:37:00,758 baseline-bpql-noisepromille75-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 13:37:02,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 13:37:02,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 13:38:49,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:38:50,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 334.11539 ± 33.541
2025-09-16 13:38:50,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [278.0266, 348.33542, 329.5128, 307.7554, 346.19412, 372.1934, 340.84796, 386.81085, 348.0818, 283.39523]
2025-09-16 13:38:50,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 65.0, 64.0, 66.0, 67.0, 72.0, 66.0, 77.0, 66.0, 55.0]
2025-09-16 13:38:50,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (334.12) for latency 18
2025-09-16 13:38:50,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 58 minutes)
2025-09-16 13:40:46,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:40:47,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 341.69263 ± 100.952
2025-09-16 13:40:47,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [326.16605, 230.72203, 367.56284, 300.53763, 422.59232, 509.12152, 330.04037, 437.19284, 134.61081, 358.37988]
2025-09-16 13:40:47,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 45.0, 71.0, 57.0, 79.0, 109.0, 61.0, 84.0, 26.0, 71.0]
2025-09-16 13:40:47,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (341.69) for latency 18
2025-09-16 13:40:47,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 3 minutes, 24 seconds)
2025-09-16 13:42:43,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:42:44,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 332.13766 ± 148.708
2025-09-16 13:42:44,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [538.774, 588.20087, 322.56577, 411.5336, 435.6268, 221.69473, 180.58673, 298.72427, 134.63084, 189.0391]
2025-09-16 13:42:44,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 120.0, 67.0, 79.0, 88.0, 44.0, 35.0, 65.0, 26.0, 37.0]
2025-09-16 13:42:44,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 3 minutes, 59 seconds)
2025-09-16 13:44:40,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:44:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 390.19012 ± 169.790
2025-09-16 13:44:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [473.2551, 584.39685, 422.42496, 407.33456, 688.4301, 335.46786, 113.87341, 394.36475, 119.48481, 362.86868]
2025-09-16 13:44:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 111.0, 83.0, 75.0, 133.0, 66.0, 22.0, 79.0, 23.0, 68.0]
2025-09-16 13:44:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (390.19) for latency 18
2025-09-16 13:44:41,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 28 seconds)
2025-09-16 13:46:38,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:46:40,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 351.02155 ± 78.367
2025-09-16 13:46:40,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [456.2142, 369.67206, 162.0741, 350.97424, 304.63184, 340.13263, 411.402, 318.1576, 361.07092, 435.88602]
2025-09-16 13:46:40,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 75.0, 31.0, 74.0, 63.0, 72.0, 85.0, 65.0, 77.0, 84.0]
2025-09-16 13:46:40,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 45 seconds)
2025-09-16 13:48:36,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:48:37,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 295.14713 ± 95.839
2025-09-16 13:48:37,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [309.21283, 338.3139, 381.2032, 124.73578, 145.59227, 239.62247, 431.1964, 391.5878, 288.50696, 301.49954]
2025-09-16 13:48:37,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 63.0, 74.0, 24.0, 28.0, 47.0, 80.0, 73.0, 54.0, 60.0]
2025-09-16 13:48:37,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 55 seconds)
2025-09-16 13:50:35,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:50:36,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 332.58234 ± 30.182
2025-09-16 13:50:36,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [314.87396, 303.71173, 313.99768, 389.8648, 335.967, 310.51422, 347.58594, 337.34735, 293.75174, 378.2088]
2025-09-16 13:50:36,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 58.0, 60.0, 77.0, 63.0, 58.0, 65.0, 65.0, 56.0, 74.0]
2025-09-16 13:50:36,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 2 minutes, 28 seconds)
2025-09-16 13:52:33,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:52:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 443.13037 ± 72.267
2025-09-16 13:52:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [424.01508, 543.9282, 352.4538, 389.14624, 455.5374, 501.49338, 391.37955, 423.77475, 370.03766, 579.5378]
2025-09-16 13:52:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 111.0, 66.0, 75.0, 99.0, 95.0, 74.0, 86.0, 81.0, 116.0]
2025-09-16 13:52:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (443.13) for latency 18
2025-09-16 13:52:34,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 4 seconds)
2025-09-16 13:54:31,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:54:32,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 393.56366 ± 168.174
2025-09-16 13:54:32,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [518.464, 551.17096, 164.73892, 623.64014, 540.3597, 199.38512, 384.49686, 387.098, 119.85627, 446.427]
2025-09-16 13:54:32,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 122.0, 32.0, 128.0, 102.0, 38.0, 75.0, 81.0, 23.0, 88.0]
2025-09-16 13:54:32,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 59 minutes, 19 seconds)
2025-09-16 13:56:30,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:56:31,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 328.31320 ± 137.472
2025-09-16 13:56:31,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [318.92752, 388.8057, 339.18646, 431.25378, 150.946, 511.70285, 394.44504, 118.23255, 488.82764, 140.80449]
2025-09-16 13:56:31,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 72.0, 65.0, 80.0, 29.0, 95.0, 73.0, 23.0, 90.0, 27.0]
2025-09-16 13:56:31,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 57 minutes, 25 seconds)
2025-09-16 13:58:28,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:58:29,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 365.41553 ± 165.716
2025-09-16 13:58:29,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [138.49626, 351.12903, 178.15839, 364.27307, 412.77518, 513.53156, 128.28725, 393.8649, 521.26184, 652.378]
2025-09-16 13:58:29,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 65.0, 34.0, 72.0, 80.0, 94.0, 25.0, 74.0, 98.0, 120.0]
2025-09-16 13:58:29,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 32 seconds)
2025-09-16 14:00:26,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:00:27,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 396.26639 ± 132.699
2025-09-16 14:00:27,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [305.9667, 124.634384, 333.14935, 578.0703, 416.2522, 548.0326, 354.31985, 551.8282, 433.67914, 316.73105]
2025-09-16 14:00:27,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 24.0, 71.0, 124.0, 90.0, 103.0, 77.0, 105.0, 82.0, 61.0]
2025-09-16 14:00:27,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 53 minutes, 29 seconds)
2025-09-16 14:02:25,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:02:26,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 419.34906 ± 155.113
2025-09-16 14:02:26,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [433.14822, 527.2483, 436.442, 422.2324, 574.5839, 561.5787, 458.68408, 124.678314, 526.03076, 128.86433]
2025-09-16 14:02:26,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 98.0, 82.0, 86.0, 123.0, 107.0, 87.0, 24.0, 111.0, 25.0]
2025-09-16 14:02:26,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 33 seconds)
2025-09-16 14:04:23,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:04:24,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 354.66022 ± 108.922
2025-09-16 14:04:24,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [439.9886, 340.3991, 128.84422, 464.26837, 366.12338, 430.8208, 173.03448, 374.07397, 385.49875, 443.55045]
2025-09-16 14:04:24,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 62.0, 25.0, 86.0, 73.0, 85.0, 33.0, 71.0, 73.0, 93.0]
2025-09-16 14:04:24,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 29 seconds)
2025-09-16 14:06:21,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:06:22,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 433.77838 ± 122.467
2025-09-16 14:06:22,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [484.05118, 383.15628, 359.96292, 141.14525, 645.48114, 463.98376, 409.68753, 475.56415, 485.47955, 489.27188]
2025-09-16 14:06:22,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 83.0, 66.0, 27.0, 122.0, 89.0, 75.0, 86.0, 90.0, 92.0]
2025-09-16 14:06:22,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 47 minutes, 25 seconds)
2025-09-16 14:08:20,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:08:21,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 382.75665 ± 156.859
2025-09-16 14:08:21,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [306.13675, 429.9929, 474.62213, 344.07147, 614.62054, 237.44496, 533.7961, 562.9319, 148.40166, 175.54808]
2025-09-16 14:08:21,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 81.0, 89.0, 66.0, 113.0, 45.0, 100.0, 120.0, 29.0, 34.0]
2025-09-16 14:08:21,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 45 minutes, 44 seconds)
2025-09-16 14:10:18,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:10:19,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 434.58658 ± 120.631
2025-09-16 14:10:19,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [459.55, 480.02597, 404.4287, 363.54306, 518.04895, 623.0621, 134.8346, 415.75534, 497.59628, 449.0209]
2025-09-16 14:10:19,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 95.0, 76.0, 68.0, 108.0, 119.0, 26.0, 77.0, 91.0, 82.0]
2025-09-16 14:10:19,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 54 seconds)
2025-09-16 14:12:16,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:12:17,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 390.61044 ± 157.942
2025-09-16 14:12:17,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [576.0834, 431.20074, 139.8901, 162.73108, 384.76407, 677.4931, 378.60355, 483.87735, 309.10455, 362.35678]
2025-09-16 14:12:17,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 79.0, 27.0, 31.0, 74.0, 141.0, 71.0, 90.0, 58.0, 69.0]
2025-09-16 14:12:17,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 41 minutes, 39 seconds)
2025-09-16 14:14:15,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:14:16,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 436.78287 ± 166.328
2025-09-16 14:14:16,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [447.06714, 628.9776, 477.69714, 113.72033, 458.2619, 470.81998, 562.81696, 129.43874, 555.16675, 523.862]
2025-09-16 14:14:16,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 130.0, 91.0, 22.0, 84.0, 87.0, 107.0, 25.0, 105.0, 107.0]
2025-09-16 14:14:16,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes)
2025-09-16 14:16:14,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:16:15,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 342.43185 ± 136.572
2025-09-16 14:16:15,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [383.13278, 145.83266, 422.7994, 372.1681, 114.45596, 459.40833, 470.1092, 470.0446, 426.9263, 159.44124]
2025-09-16 14:16:15,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 28.0, 81.0, 69.0, 22.0, 84.0, 83.0, 103.0, 79.0, 31.0]
2025-09-16 14:16:15,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes, 7 seconds)
2025-09-16 14:18:13,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:18:14,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 328.89108 ± 171.318
2025-09-16 14:18:14,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [453.82773, 124.31161, 146.82144, 314.88535, 155.42978, 425.2492, 441.33078, 580.606, 532.08954, 114.35946]
2025-09-16 14:18:14,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 24.0, 28.0, 61.0, 30.0, 78.0, 80.0, 111.0, 105.0, 22.0]
2025-09-16 14:18:14,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 5 seconds)
2025-09-16 14:20:10,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:20:11,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 462.61182 ± 121.348
2025-09-16 14:20:11,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [520.3455, 548.1069, 507.5435, 440.34763, 596.89734, 531.445, 532.7818, 416.05307, 382.82346, 149.77403]
2025-09-16 14:20:11,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 103.0, 111.0, 81.0, 115.0, 110.0, 98.0, 76.0, 74.0, 29.0]
2025-09-16 14:20:11,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (462.61) for latency 18
2025-09-16 14:20:11,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 33 minutes, 53 seconds)
2025-09-16 14:22:09,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:22:10,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 444.50195 ± 140.529
2025-09-16 14:22:10,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [696.9013, 440.9398, 339.0185, 124.52156, 534.98157, 472.19574, 398.8592, 496.07703, 419.89032, 521.6346]
2025-09-16 14:22:10,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 83.0, 72.0, 24.0, 103.0, 91.0, 75.0, 92.0, 81.0, 99.0]
2025-09-16 14:22:10,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 9 seconds)
2025-09-16 14:24:08,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:24:09,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 397.46844 ± 140.868
2025-09-16 14:24:09,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [410.33676, 549.627, 456.84714, 494.06345, 107.93025, 397.37802, 169.50117, 550.7, 387.19827, 451.1026]
2025-09-16 14:24:09,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 103.0, 85.0, 92.0, 21.0, 75.0, 33.0, 121.0, 82.0, 83.0]
2025-09-16 14:24:09,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 12 seconds)
2025-09-16 14:26:07,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:26:08,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 365.28564 ± 127.039
2025-09-16 14:26:08,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [523.6963, 369.54428, 399.09836, 428.19, 113.92918, 140.04488, 411.84006, 427.18466, 471.382, 367.9466]
2025-09-16 14:26:08,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 70.0, 73.0, 80.0, 22.0, 27.0, 80.0, 83.0, 100.0, 69.0]
2025-09-16 14:26:08,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 15 seconds)
2025-09-16 14:28:05,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:28:06,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 512.03412 ± 154.120
2025-09-16 14:28:06,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [402.0086, 523.1816, 506.4777, 475.74377, 145.48279, 658.345, 451.0405, 653.418, 601.7489, 702.8944]
2025-09-16 14:28:06,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 107.0, 102.0, 91.0, 28.0, 123.0, 86.0, 127.0, 119.0, 132.0]
2025-09-16 14:28:06,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (512.03) for latency 18
2025-09-16 14:28:06,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 9 seconds)
2025-09-16 14:30:03,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:30:04,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 342.82343 ± 131.712
2025-09-16 14:30:04,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [432.4547, 161.38788, 357.26456, 515.9496, 395.33, 185.14781, 431.35086, 113.09619, 380.87918, 455.37387]
2025-09-16 14:30:04,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 31.0, 77.0, 97.0, 72.0, 36.0, 79.0, 22.0, 71.0, 83.0]
2025-09-16 14:30:04,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 15 seconds)
2025-09-16 14:32:02,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:32:04,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 528.02960 ± 104.022
2025-09-16 14:32:04,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [683.51794, 405.36542, 600.2984, 678.77124, 456.1751, 627.9505, 396.253, 459.0523, 474.00375, 498.90808]
2025-09-16 14:32:04,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 78.0, 124.0, 127.0, 83.0, 115.0, 77.0, 97.0, 88.0, 95.0]
2025-09-16 14:32:04,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (528.03) for latency 18
2025-09-16 14:32:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 27 seconds)
2025-09-16 14:34:01,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:34:03,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 493.92920 ± 158.368
2025-09-16 14:34:03,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [640.47955, 162.12427, 408.9892, 659.5424, 512.7402, 527.53033, 363.4893, 539.7943, 397.01498, 727.5872]
2025-09-16 14:34:03,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 31.0, 75.0, 123.0, 111.0, 98.0, 70.0, 101.0, 74.0, 138.0]
2025-09-16 14:34:03,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 30 seconds)
2025-09-16 14:36:01,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:36:02,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 414.34000 ± 197.623
2025-09-16 14:36:02,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [516.29443, 119.717354, 523.6667, 124.74298, 625.0783, 480.54706, 114.584015, 551.4083, 601.30194, 486.05872]
2025-09-16 14:36:02,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 23.0, 95.0, 24.0, 120.0, 87.0, 22.0, 104.0, 113.0, 94.0]
2025-09-16 14:36:02,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 18 minutes, 35 seconds)
2025-09-16 14:37:59,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:38:00,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 402.02835 ± 150.727
2025-09-16 14:38:00,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [493.94772, 459.50104, 522.3809, 342.34775, 449.0599, 124.851364, 113.81913, 553.7614, 481.87726, 478.73715]
2025-09-16 14:38:00,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 86.0, 97.0, 63.0, 85.0, 24.0, 22.0, 105.0, 87.0, 87.0]
2025-09-16 14:38:00,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 33 seconds)
2025-09-16 14:39:58,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:39:59,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 517.61560 ± 204.058
2025-09-16 14:39:59,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [527.979, 733.0926, 950.0215, 549.56647, 518.205, 404.95236, 517.5784, 440.50482, 404.49127, 129.76445]
2025-09-16 14:39:59,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 145.0, 175.0, 100.0, 94.0, 74.0, 110.0, 81.0, 83.0, 25.0]
2025-09-16 14:39:59,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 14 minutes, 52 seconds)
2025-09-16 14:41:56,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:41:57,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 396.45401 ± 133.576
2025-09-16 14:41:57,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [451.23645, 382.10815, 166.08629, 461.0156, 388.77795, 549.43604, 430.83673, 393.57645, 156.2522, 585.21423]
2025-09-16 14:41:57,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 81.0, 32.0, 87.0, 71.0, 110.0, 79.0, 77.0, 30.0, 107.0]
2025-09-16 14:41:57,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 12 minutes, 35 seconds)
2025-09-16 14:43:55,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:43:57,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 433.20245 ± 198.482
2025-09-16 14:43:57,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [183.21895, 595.5623, 636.9107, 133.78653, 642.89343, 457.56555, 400.71683, 140.93037, 518.4479, 621.9919]
2025-09-16 14:43:57,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 119.0, 126.0, 26.0, 120.0, 84.0, 73.0, 27.0, 102.0, 121.0]
2025-09-16 14:43:57,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 10 minutes, 38 seconds)
2025-09-16 14:45:54,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:45:56,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 394.51306 ± 182.417
2025-09-16 14:45:56,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [485.56366, 671.0699, 494.47476, 130.0486, 136.5009, 145.0198, 421.70563, 394.7954, 541.65857, 524.2933]
2025-09-16 14:45:56,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 138.0, 108.0, 25.0, 26.0, 28.0, 87.0, 73.0, 101.0, 96.0]
2025-09-16 14:45:56,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 8 minutes, 38 seconds)
2025-09-16 14:47:52,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:47:54,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 514.48334 ± 59.120
2025-09-16 14:47:54,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [533.3884, 493.08588, 464.93808, 600.0943, 489.55255, 646.23566, 461.29776, 511.33365, 472.7591, 472.14728]
2025-09-16 14:47:54,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 106.0, 88.0, 110.0, 88.0, 117.0, 84.0, 99.0, 92.0, 86.0]
2025-09-16 14:47:54,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 6 minutes, 42 seconds)
2025-09-16 14:49:52,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:49:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 438.61237 ± 189.658
2025-09-16 14:49:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [128.98933, 715.23706, 113.96085, 528.5717, 455.61697, 642.5881, 349.17648, 384.37607, 495.96423, 571.6424]
2025-09-16 14:49:53,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 132.0, 22.0, 98.0, 82.0, 120.0, 69.0, 82.0, 101.0, 107.0]
2025-09-16 14:49:53,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 4 minutes, 41 seconds)
2025-09-16 14:51:50,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:51:52,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 534.55499 ± 52.122
2025-09-16 14:51:52,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [560.8187, 486.0838, 502.78482, 505.21704, 604.5477, 521.2732, 616.53326, 588.37244, 451.637, 508.28198]
2025-09-16 14:51:52,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 89.0, 92.0, 98.0, 115.0, 96.0, 115.0, 109.0, 84.0, 107.0]
2025-09-16 14:51:52,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (534.55) for latency 18
2025-09-16 14:51:52,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 50 seconds)
2025-09-16 14:53:50,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:53:51,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 443.41440 ± 192.651
2025-09-16 14:53:51,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.08682, 525.4808, 596.0053, 349.20334, 344.87878, 699.7085, 556.66846, 628.08844, 491.139, 107.8844]
2025-09-16 14:53:51,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 100.0, 108.0, 63.0, 66.0, 139.0, 101.0, 119.0, 89.0, 21.0]
2025-09-16 14:53:51,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 52 seconds)
2025-09-16 14:55:49,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:55:51,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 493.83829 ± 175.592
2025-09-16 14:55:51,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [621.59296, 385.23434, 483.47614, 167.06393, 552.84045, 465.71863, 401.8314, 579.2687, 875.3721, 405.98422]
2025-09-16 14:55:51,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 79.0, 88.0, 32.0, 102.0, 86.0, 74.0, 106.0, 165.0, 85.0]
2025-09-16 14:55:51,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 58 minutes, 59 seconds)
2025-09-16 14:57:48,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:57:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 456.99561 ± 128.663
2025-09-16 14:57:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [511.73715, 130.1855, 437.9959, 532.85767, 555.5302, 319.94263, 534.4849, 487.34512, 572.1553, 487.72174]
2025-09-16 14:57:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 25.0, 86.0, 100.0, 105.0, 60.0, 98.0, 94.0, 121.0, 90.0]
2025-09-16 14:57:50,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 11 seconds)
2025-09-16 14:59:47,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:59:48,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 513.76575 ± 96.525
2025-09-16 14:59:48,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [552.4655, 424.83984, 458.82635, 698.54803, 374.53143, 383.4317, 569.69763, 577.04724, 531.88983, 566.3801]
2025-09-16 14:59:48,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 80.0, 99.0, 130.0, 69.0, 72.0, 114.0, 123.0, 96.0, 105.0]
2025-09-16 14:59:48,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 5 seconds)
2025-09-16 15:01:46,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:01:47,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 448.30609 ± 239.837
2025-09-16 15:01:47,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [140.31534, 618.061, 145.71376, 480.38245, 308.75067, 858.32635, 170.50166, 586.2208, 452.35635, 722.4326]
2025-09-16 15:01:47,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 124.0, 28.0, 90.0, 58.0, 179.0, 33.0, 106.0, 87.0, 133.0]
2025-09-16 15:01:47,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 9 seconds)
2025-09-16 15:03:46,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:03:47,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 421.02188 ± 235.636
2025-09-16 15:03:47,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [123.71757, 129.61748, 463.1942, 123.95587, 451.91898, 612.9912, 471.18146, 520.0184, 401.2804, 912.3431]
2025-09-16 15:03:47,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 98.0, 24.0, 83.0, 118.0, 86.0, 97.0, 74.0, 179.0]
2025-09-16 15:03:47,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 12 seconds)
2025-09-16 15:05:44,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:05:45,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 510.78897 ± 248.021
2025-09-16 15:05:45,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [440.60968, 123.71242, 740.84644, 130.85239, 512.71155, 893.6492, 670.6407, 458.21167, 366.35016, 770.3055]
2025-09-16 15:05:45,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 24.0, 136.0, 25.0, 98.0, 168.0, 126.0, 84.0, 67.0, 159.0]
2025-09-16 15:05:45,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 1 second)
2025-09-16 15:07:43,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:07:45,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 550.17615 ± 157.268
2025-09-16 15:07:45,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [394.29315, 637.01825, 488.0413, 626.3096, 601.8548, 903.724, 448.69266, 445.57843, 630.6761, 325.573]
2025-09-16 15:07:45,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 141.0, 107.0, 114.0, 112.0, 188.0, 82.0, 80.0, 135.0, 63.0]
2025-09-16 15:07:45,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (550.18) for latency 18
2025-09-16 15:07:45,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 4 seconds)
2025-09-16 15:09:43,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:09:44,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 438.07285 ± 181.310
2025-09-16 15:09:44,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.96443, 483.76202, 382.7151, 500.1012, 144.12144, 650.5084, 400.811, 525.06934, 423.36993, 734.3056]
2025-09-16 15:09:44,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 94.0, 72.0, 94.0, 28.0, 128.0, 74.0, 100.0, 83.0, 159.0]
2025-09-16 15:09:44,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 17 seconds)
2025-09-16 15:11:43,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:11:44,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 510.33472 ± 153.252
2025-09-16 15:11:44,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [560.4926, 454.56552, 150.7955, 479.46844, 711.6282, 395.58362, 522.73, 698.73975, 521.2445, 608.0992]
2025-09-16 15:11:44,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 84.0, 29.0, 86.0, 141.0, 73.0, 96.0, 139.0, 93.0, 113.0]
2025-09-16 15:11:44,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 30 seconds)
2025-09-16 15:13:42,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:13:43,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 476.35727 ± 234.428
2025-09-16 15:13:43,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [595.02026, 123.35275, 497.7899, 607.18933, 129.3881, 659.5177, 769.15265, 171.83617, 722.8825, 487.4434]
2025-09-16 15:13:43,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 24.0, 94.0, 118.0, 25.0, 125.0, 143.0, 33.0, 144.0, 95.0]
2025-09-16 15:13:43,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 20 seconds)
2025-09-16 15:15:40,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:15:42,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 475.94092 ± 197.097
2025-09-16 15:15:42,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [408.70923, 144.58597, 357.99976, 544.99164, 547.19464, 719.83624, 749.83057, 535.41144, 590.0845, 160.7652]
2025-09-16 15:15:42,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 28.0, 77.0, 103.0, 99.0, 136.0, 150.0, 100.0, 125.0, 31.0]
2025-09-16 15:15:42,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 26 seconds)
2025-09-16 15:17:39,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:17:41,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 540.31726 ± 236.606
2025-09-16 15:17:41,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [139.0031, 485.716, 114.18681, 775.85443, 577.97534, 485.51443, 618.53033, 602.2696, 741.5383, 862.58435]
2025-09-16 15:17:41,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 102.0, 22.0, 147.0, 109.0, 89.0, 116.0, 131.0, 141.0, 164.0]
2025-09-16 15:17:41,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 22 seconds)
2025-09-16 15:19:39,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:19:40,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 476.54120 ± 99.573
2025-09-16 15:19:40,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [351.4217, 392.64395, 437.062, 519.13947, 354.96918, 650.60187, 596.7282, 431.9256, 580.8052, 450.11508]
2025-09-16 15:19:40,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 73.0, 80.0, 96.0, 75.0, 118.0, 110.0, 93.0, 107.0, 92.0]
2025-09-16 15:19:41,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 22 seconds)
2025-09-16 15:21:38,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:21:39,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 469.54956 ± 148.203
2025-09-16 15:21:39,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [746.41943, 342.7448, 406.7027, 485.9038, 156.98041, 535.42084, 450.67725, 456.44406, 606.0408, 508.16165]
2025-09-16 15:21:39,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 73.0, 78.0, 105.0, 30.0, 99.0, 102.0, 82.0, 112.0, 96.0]
2025-09-16 15:21:39,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 12 seconds)
2025-09-16 15:23:37,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:23:38,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 564.18970 ± 199.879
2025-09-16 15:23:38,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [134.8138, 527.78864, 516.061, 469.34836, 799.06244, 627.5959, 578.3108, 451.9436, 615.0944, 921.8783]
2025-09-16 15:23:38,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 111.0, 97.0, 100.0, 146.0, 114.0, 108.0, 86.0, 122.0, 174.0]
2025-09-16 15:23:38,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (564.19) for latency 18
2025-09-16 15:23:38,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 18 seconds)
2025-09-16 15:25:38,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:25:39,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 512.99457 ± 181.724
2025-09-16 15:25:39,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [823.3675, 140.0615, 450.03787, 403.34564, 549.8446, 389.64532, 715.501, 468.7618, 649.23987, 540.1407]
2025-09-16 15:25:39,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 27.0, 100.0, 73.0, 101.0, 71.0, 134.0, 86.0, 126.0, 99.0]
2025-09-16 15:25:39,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 36 seconds)
2025-09-16 15:27:36,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:27:38,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 499.25677 ± 60.777
2025-09-16 15:27:38,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [399.05902, 622.05206, 523.661, 496.24802, 530.2221, 527.86395, 438.0824, 471.24203, 541.09686, 443.0403]
2025-09-16 15:27:38,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 113.0, 95.0, 108.0, 104.0, 97.0, 81.0, 85.0, 99.0, 97.0]
2025-09-16 15:27:38,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 33 seconds)
2025-09-16 15:29:36,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:29:37,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 498.57339 ± 141.950
2025-09-16 15:29:37,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [450.47827, 558.57745, 511.64102, 651.5838, 572.1523, 500.28824, 118.92157, 482.91995, 491.74182, 647.4298]
2025-09-16 15:29:37,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 111.0, 94.0, 125.0, 103.0, 100.0, 23.0, 90.0, 90.0, 132.0]
2025-09-16 15:29:37,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 31 seconds)
2025-09-16 15:31:36,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:31:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 495.90308 ± 141.241
2025-09-16 15:31:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [391.901, 706.57086, 532.5012, 512.0801, 564.20807, 497.40762, 516.35345, 476.06082, 616.33936, 145.60838]
2025-09-16 15:31:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 142.0, 100.0, 95.0, 104.0, 96.0, 105.0, 89.0, 114.0, 28.0]
2025-09-16 15:31:37,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 40 seconds)
2025-09-16 15:33:34,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:33:35,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 614.23865 ± 194.118
2025-09-16 15:33:35,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [610.679, 486.32706, 1092.7562, 685.5711, 498.0493, 421.9845, 760.5616, 470.63818, 440.5714, 675.24786]
2025-09-16 15:33:35,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 91.0, 205.0, 124.0, 92.0, 77.0, 141.0, 87.0, 88.0, 123.0]
2025-09-16 15:33:35,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (614.24) for latency 18
2025-09-16 15:33:35,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 34 seconds)
2025-09-16 15:35:34,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:35:36,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 541.20831 ± 221.426
2025-09-16 15:35:36,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [458.08575, 1042.8855, 155.53137, 576.0788, 689.7079, 684.4054, 489.16742, 457.42484, 476.59366, 382.2019]
2025-09-16 15:35:36,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 198.0, 30.0, 106.0, 131.0, 133.0, 110.0, 99.0, 86.0, 70.0]
2025-09-16 15:35:36,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 35 seconds)
2025-09-16 15:37:34,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:37:35,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 617.17450 ± 272.497
2025-09-16 15:37:35,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [849.6925, 393.0686, 688.8792, 363.82147, 845.58453, 523.2554, 119.70525, 1123.8356, 591.22266, 672.6796]
2025-09-16 15:37:35,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 83.0, 140.0, 70.0, 162.0, 98.0, 23.0, 218.0, 121.0, 133.0]
2025-09-16 15:37:35,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (617.17) for latency 18
2025-09-16 15:37:35,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 41 seconds)
2025-09-16 15:39:33,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:39:34,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 443.52017 ± 221.714
2025-09-16 15:39:34,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [704.9617, 626.16364, 130.6097, 569.69434, 581.2651, 124.64179, 468.7113, 108.31534, 647.1503, 473.68832]
2025-09-16 15:39:34,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 113.0, 25.0, 109.0, 107.0, 24.0, 87.0, 21.0, 128.0, 99.0]
2025-09-16 15:39:34,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 37 seconds)
2025-09-16 15:41:32,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:41:34,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 554.49664 ± 446.945
2025-09-16 15:41:34,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [525.12506, 108.59727, 150.41959, 671.27234, 581.2473, 1409.858, 144.77827, 1292.3951, 525.6321, 135.64111]
2025-09-16 15:41:34,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 21.0, 29.0, 126.0, 106.0, 265.0, 28.0, 246.0, 102.0, 26.0]
2025-09-16 15:41:34,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 36 seconds)
2025-09-16 15:43:33,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:43:34,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 571.70093 ± 141.456
2025-09-16 15:43:34,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [758.4025, 793.4124, 592.17224, 511.79672, 466.9118, 391.3465, 435.06186, 751.296, 577.6045, 439.00418]
2025-09-16 15:43:34,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 143.0, 110.0, 92.0, 103.0, 85.0, 80.0, 151.0, 105.0, 80.0]
2025-09-16 15:43:34,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 53 seconds)
2025-09-16 15:45:31,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:45:33,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 580.08496 ± 184.052
2025-09-16 15:45:33,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [641.58307, 457.66623, 576.63763, 843.01385, 150.13466, 501.48334, 584.5291, 560.5013, 799.86786, 685.4324]
2025-09-16 15:45:33,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 83.0, 109.0, 165.0, 29.0, 92.0, 106.0, 102.0, 154.0, 144.0]
2025-09-16 15:45:33,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 38 seconds)
2025-09-16 15:47:33,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:47:35,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 575.35083 ± 286.374
2025-09-16 15:47:35,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [685.16034, 535.8115, 611.7286, 875.26294, 584.37787, 109.24817, 462.52997, 666.4496, 124.52131, 1098.418]
2025-09-16 15:47:35,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 99.0, 126.0, 169.0, 109.0, 21.0, 85.0, 124.0, 24.0, 220.0]
2025-09-16 15:47:35,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 57 seconds)
2025-09-16 15:49:30,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:49:32,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 419.44342 ± 202.448
2025-09-16 15:49:32,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [459.4521, 515.3159, 141.36528, 145.10588, 598.2695, 379.17026, 577.0106, 742.0645, 140.13382, 496.5465]
2025-09-16 15:49:32,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 94.0, 27.0, 28.0, 111.0, 83.0, 104.0, 155.0, 27.0, 92.0]
2025-09-16 15:49:32,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 43 seconds)
2025-09-16 15:51:30,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:51:32,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 609.14838 ± 227.457
2025-09-16 15:51:32,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [523.6937, 908.32684, 156.69647, 568.4105, 876.70044, 797.0625, 424.71533, 827.4448, 461.78763, 546.6456]
2025-09-16 15:51:32,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 183.0, 30.0, 116.0, 155.0, 158.0, 78.0, 155.0, 101.0, 101.0]
2025-09-16 15:51:32,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 48 seconds)
2025-09-16 15:53:30,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:53:31,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 495.93488 ± 239.197
2025-09-16 15:53:31,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [542.448, 639.1443, 997.3757, 411.9904, 473.65494, 519.29706, 559.727, 572.2331, 124.28035, 119.19795]
2025-09-16 15:53:31,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 116.0, 186.0, 80.0, 93.0, 95.0, 104.0, 110.0, 24.0, 23.0]
2025-09-16 15:53:32,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 42 seconds)
2025-09-16 15:55:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:55:32,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 618.80963 ± 178.024
2025-09-16 15:55:32,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [469.10892, 744.27966, 575.70776, 617.8192, 553.20306, 943.36, 438.26953, 659.54877, 850.0423, 336.7568]
2025-09-16 15:55:32,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 144.0, 105.0, 119.0, 101.0, 180.0, 82.0, 124.0, 161.0, 64.0]
2025-09-16 15:55:32,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (618.81) for latency 18
2025-09-16 15:55:32,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 52 seconds)
2025-09-16 15:57:30,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:57:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 625.25793 ± 261.465
2025-09-16 15:57:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [508.58273, 707.5576, 627.8517, 125.36575, 411.40903, 896.16693, 435.595, 859.343, 1075.8658, 604.8418]
2025-09-16 15:57:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 155.0, 113.0, 24.0, 76.0, 175.0, 79.0, 155.0, 204.0, 114.0]
2025-09-16 15:57:31,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (625.26) for latency 18
2025-09-16 15:57:31,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 39 seconds)
2025-09-16 15:59:30,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:59:31,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 590.98352 ± 248.435
2025-09-16 15:59:31,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [483.3034, 766.5309, 125.08096, 723.4636, 130.4929, 766.47797, 691.7849, 762.3655, 827.7472, 632.5883]
2025-09-16 15:59:31,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 143.0, 24.0, 145.0, 25.0, 144.0, 141.0, 147.0, 159.0, 120.0]
2025-09-16 15:59:31,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 58 seconds)
2025-09-16 16:01:28,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:01:30,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 607.32526 ± 242.044
2025-09-16 16:01:30,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [702.7517, 774.32416, 754.94635, 620.83636, 767.93176, 150.27881, 124.682915, 624.838, 806.5176, 746.145]
2025-09-16 16:01:30,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 141.0, 145.0, 121.0, 148.0, 29.0, 24.0, 129.0, 157.0, 144.0]
2025-09-16 16:01:30,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 49 seconds)
2025-09-16 16:03:28,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:03:30,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 472.29160 ± 243.610
2025-09-16 16:03:30,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [856.39703, 632.37726, 537.67096, 135.21074, 579.5286, 527.3261, 509.77884, 113.47921, 690.604, 140.5428]
2025-09-16 16:03:30,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 113.0, 95.0, 26.0, 117.0, 100.0, 91.0, 22.0, 128.0, 27.0]
2025-09-16 16:03:30,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 50 seconds)
2025-09-16 16:05:29,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:05:31,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 662.08972 ± 114.630
2025-09-16 16:05:31,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [726.1905, 843.1479, 692.04913, 504.18063, 829.95654, 553.4079, 684.1037, 532.38525, 694.88104, 560.5945]
2025-09-16 16:05:31,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 178.0, 152.0, 94.0, 157.0, 120.0, 127.0, 99.0, 130.0, 106.0]
2025-09-16 16:05:31,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (662.09) for latency 18
2025-09-16 16:05:31,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 58 seconds)
2025-09-16 16:07:30,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:07:31,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 603.88336 ± 383.977
2025-09-16 16:07:31,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [485.01645, 1218.4548, 130.30327, 134.58563, 1127.9235, 828.4416, 723.4302, 459.73553, 811.5949, 119.347435]
2025-09-16 16:07:31,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 238.0, 25.0, 26.0, 230.0, 158.0, 134.0, 83.0, 171.0, 23.0]
2025-09-16 16:07:31,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes)
2025-09-16 16:09:29,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:09:31,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 627.66534 ± 226.966
2025-09-16 16:09:31,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [599.94635, 652.533, 441.3439, 529.0559, 504.37354, 360.98547, 1237.564, 627.7978, 595.4892, 727.5642]
2025-09-16 16:09:31,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 118.0, 78.0, 97.0, 93.0, 67.0, 241.0, 133.0, 111.0, 140.0]
2025-09-16 16:09:31,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 59 seconds)
2025-09-16 16:11:30,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:11:32,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 805.86853 ± 353.404
2025-09-16 16:11:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [931.8328, 1434.8977, 764.5595, 722.88617, 638.5065, 1202.7736, 446.5279, 155.82059, 1081.0029, 679.8773]
2025-09-16 16:11:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 279.0, 155.0, 141.0, 117.0, 237.0, 84.0, 30.0, 203.0, 145.0]
2025-09-16 16:11:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (805.87) for latency 18
2025-09-16 16:11:32,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 9 seconds)
2025-09-16 16:13:30,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:13:32,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 657.56628 ± 154.369
2025-09-16 16:13:32,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [523.2169, 666.80334, 558.1558, 817.0443, 443.76007, 992.9563, 686.3575, 561.1674, 753.83344, 572.3672]
2025-09-16 16:13:32,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 137.0, 102.0, 146.0, 81.0, 192.0, 124.0, 105.0, 141.0, 103.0]
2025-09-16 16:13:32,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 8 seconds)
2025-09-16 16:15:31,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:15:33,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 541.49548 ± 179.588
2025-09-16 16:15:33,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [450.48282, 577.80414, 344.4854, 530.36847, 693.76556, 586.0815, 633.8323, 892.06384, 505.26547, 200.8056]
2025-09-16 16:15:33,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 110.0, 67.0, 99.0, 125.0, 104.0, 115.0, 164.0, 94.0, 39.0]
2025-09-16 16:15:33,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 5 seconds)
2025-09-16 16:17:30,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:17:32,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 576.02985 ± 308.537
2025-09-16 16:17:32,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [517.2594, 821.15845, 481.9214, 437.278, 108.45369, 1094.5088, 736.1948, 953.5695, 474.86606, 135.08772]
2025-09-16 16:17:32,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 149.0, 89.0, 79.0, 21.0, 206.0, 139.0, 189.0, 93.0, 26.0]
2025-09-16 16:17:32,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 1 second)
2025-09-16 16:19:30,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:19:33,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 690.70050 ± 303.599
2025-09-16 16:19:33,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [166.45856, 541.1401, 782.73846, 582.7231, 630.3339, 851.5867, 1318.4431, 444.8944, 1016.7216, 571.9647]
2025-09-16 16:19:33,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 97.0, 145.0, 108.0, 128.0, 158.0, 254.0, 99.0, 201.0, 103.0]
2025-09-16 16:19:33,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 4 seconds)
2025-09-16 16:21:33,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:21:35,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 722.11108 ± 345.839
2025-09-16 16:21:35,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [914.36395, 1027.7063, 748.1977, 1306.6919, 858.5583, 879.61536, 165.73392, 507.6639, 656.1187, 156.46078]
2025-09-16 16:21:35,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 198.0, 145.0, 270.0, 161.0, 171.0, 32.0, 92.0, 120.0, 30.0]
2025-09-16 16:21:35,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 8 seconds)
2025-09-16 16:23:32,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:23:34,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 778.92950 ± 382.675
2025-09-16 16:23:34,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1643.7904, 757.79364, 769.5807, 1084.4598, 544.8736, 584.4276, 123.4182, 832.9318, 937.63776, 510.38147]
2025-09-16 16:23:34,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [319.0, 145.0, 158.0, 205.0, 101.0, 121.0, 24.0, 158.0, 181.0, 91.0]
2025-09-16 16:23:34,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 8 seconds)
2025-09-16 16:25:33,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:25:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 681.77814 ± 304.090
2025-09-16 16:25:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [367.31778, 768.1621, 461.16327, 1023.05524, 945.72577, 515.52826, 857.0835, 626.1737, 1134.7415, 118.83014]
2025-09-16 16:25:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 141.0, 86.0, 187.0, 182.0, 92.0, 159.0, 121.0, 204.0, 23.0]
2025-09-16 16:25:35,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 6 seconds)
2025-09-16 16:27:33,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:27:35,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 795.61627 ± 195.018
2025-09-16 16:27:35,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [975.39844, 1054.9469, 806.09393, 451.6039, 993.4746, 748.3822, 608.71564, 580.5575, 989.1325, 747.8574]
2025-09-16 16:27:35,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 201.0, 144.0, 81.0, 192.0, 146.0, 115.0, 105.0, 191.0, 135.0]
2025-09-16 16:27:35,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 9 seconds)
2025-09-16 16:29:36,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:29:38,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 757.97510 ± 398.700
2025-09-16 16:29:38,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [612.97766, 346.59338, 823.3649, 1417.523, 784.69653, 134.44907, 663.5799, 552.4583, 765.4576, 1478.6508]
2025-09-16 16:29:38,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 66.0, 150.0, 275.0, 157.0, 26.0, 121.0, 100.0, 142.0, 284.0]
2025-09-16 16:29:38,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 13 seconds)
2025-09-16 16:31:33,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:31:35,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 588.56531 ± 72.552
2025-09-16 16:31:35,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [624.3783, 535.9461, 588.0275, 553.7049, 652.7113, 584.5412, 491.0869, 566.96594, 760.536, 527.75476]
2025-09-16 16:31:35,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 110.0, 130.0, 124.0, 136.0, 108.0, 101.0, 120.0, 147.0, 98.0]
2025-09-16 16:31:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes)
2025-09-16 16:33:35,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:33:37,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 891.59003 ± 206.779
2025-09-16 16:33:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [658.844, 521.42126, 1020.65173, 792.54095, 1045.355, 770.588, 1063.8009, 846.7575, 1263.465, 932.4763]
2025-09-16 16:33:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 98.0, 193.0, 140.0, 204.0, 146.0, 212.0, 171.0, 242.0, 173.0]
2025-09-16 16:33:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (891.59) for latency 18
2025-09-16 16:33:37,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 6 seconds)
2025-09-16 16:35:36,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:35:38,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 619.08557 ± 455.954
2025-09-16 16:35:38,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [130.76825, 568.54706, 1667.2848, 585.91504, 1092.111, 578.40485, 118.78331, 129.876, 560.1063, 759.05865]
2025-09-16 16:35:38,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 122.0, 335.0, 104.0, 210.0, 126.0, 23.0, 25.0, 102.0, 138.0]
2025-09-16 16:35:38,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 5 seconds)
2025-09-16 16:37:37,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:37:38,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 435.40363 ± 288.874
2025-09-16 16:37:38,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [652.1598, 861.11334, 460.7333, 114.610596, 794.1745, 136.2476, 725.1404, 125.217705, 364.3479, 120.290855]
2025-09-16 16:37:38,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 158.0, 83.0, 22.0, 161.0, 26.0, 143.0, 24.0, 68.0, 23.0]
2025-09-16 16:37:38,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 4 seconds)
2025-09-16 16:39:36,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:39:38,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 773.00238 ± 730.813
2025-09-16 16:39:38,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1328.2618, 129.70995, 559.4332, 422.73502, 2670.0312, 638.62775, 130.22748, 795.8979, 914.5204, 140.57892]
2025-09-16 16:39:38,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 25.0, 109.0, 79.0, 508.0, 121.0, 25.0, 147.0, 162.0, 27.0]
2025-09-16 16:39:38,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes)
2025-09-16 16:41:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:41:39,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 524.46100 ± 381.349
2025-09-16 16:41:39,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [177.86061, 685.48083, 755.67065, 129.55682, 1359.6805, 415.40472, 130.14455, 782.8199, 666.73254, 141.25914]
2025-09-16 16:41:39,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 127.0, 136.0, 25.0, 268.0, 76.0, 25.0, 143.0, 119.0, 27.0]
2025-09-16 16:41:39,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 4 seconds)
2025-09-16 16:43:37,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:43:38,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 671.84119 ± 190.301
2025-09-16 16:43:38,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [631.69086, 498.7251, 552.48364, 939.3291, 1012.559, 658.788, 495.36075, 444.43375, 876.0445, 608.99695]
2025-09-16 16:43:38,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 90.0, 114.0, 176.0, 182.0, 124.0, 89.0, 87.0, 174.0, 113.0]
2025-09-16 16:43:38,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 1 second)
2025-09-16 16:45:37,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:45:39,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 650.00616 ± 240.404
2025-09-16 16:45:39,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [714.61676, 125.10249, 761.86896, 558.7675, 877.5762, 701.7544, 393.33957, 913.0897, 532.0355, 921.91046]
2025-09-16 16:45:39,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 24.0, 140.0, 103.0, 178.0, 128.0, 83.0, 189.0, 98.0, 172.0]
2025-09-16 16:45:39,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 1 second)
2025-09-16 16:47:37,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:47:40,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 948.28778 ± 475.124
2025-09-16 16:47:40,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [580.7137, 1431.3163, 1034.2438, 1873.4996, 854.0536, 595.95013, 573.9357, 1129.9513, 156.57825, 1252.6355]
2025-09-16 16:47:40,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 285.0, 193.0, 364.0, 155.0, 129.0, 104.0, 232.0, 30.0, 250.0]
2025-09-16 16:47:40,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (948.29) for latency 18
2025-09-16 16:47:40,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 1 second)
2025-09-16 16:49:39,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:49:42,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1110.10083 ± 1023.931
2025-09-16 16:49:42,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [495.38962, 1699.8091, 390.8124, 113.7813, 2701.3274, 134.93483, 2896.435, 745.86316, 125.32254, 1797.3324]
2025-09-16 16:49:42,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 330.0, 73.0, 22.0, 522.0, 26.0, 555.0, 132.0, 24.0, 349.0]
2025-09-16 16:49:42,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1110.10) for latency 18
2025-09-16 16:49:42,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 2 seconds)
2025-09-16 16:51:42,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:51:44,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 802.00079 ± 473.255
2025-09-16 16:51:44,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [842.74536, 160.84627, 135.66856, 795.61053, 723.2882, 737.4227, 1883.0247, 1111.6962, 1046.9176, 582.7881]
2025-09-16 16:51:44,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 31.0, 26.0, 149.0, 129.0, 135.0, 358.0, 212.0, 203.0, 109.0]
2025-09-16 16:51:44,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 2 seconds)
2025-09-16 16:53:42,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:53:45,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 764.53351 ± 412.884
2025-09-16 16:53:45,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1230.5876, 624.0697, 588.42957, 172.26236, 610.5393, 155.66241, 581.0998, 1287.1886, 1129.2006, 1266.2952]
2025-09-16 16:53:45,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 114.0, 106.0, 33.0, 108.0, 30.0, 104.0, 236.0, 224.0, 227.0]
2025-09-16 16:53:45,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 1 second)
2025-09-16 16:55:43,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:55:44,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 663.20074 ± 441.423
2025-09-16 16:55:44,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.68523, 1059.4506, 108.46581, 937.9919, 735.5089, 1556.9552, 135.35872, 802.37354, 495.8609, 664.3571]
2025-09-16 16:55:44,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 196.0, 21.0, 183.0, 146.0, 311.0, 26.0, 153.0, 88.0, 120.0]
2025-09-16 16:55:44,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1251 [DEBUG]: Training session finished
