2025-09-16 12:15:14,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.150-delay_12
2025-09-16 12:15:14,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.150-delay_12
2025-09-16 12:15:14,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'12': <latency_env.delayed_mdp.ConstantDelay object at 0x14b1960847d0>}
2025-09-16 12:15:14,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:15:14,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:15:14,370 baseline-bpql-noisepromille150-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=580, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:15:14,370 baseline-bpql-noisepromille150-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:15:16,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:15:16,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:16:59,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:17:00,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 218.41000 ± 88.479
2025-09-16 12:17:00,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [203.9859, 162.15538, 153.18701, 218.3288, 401.61905, 165.79039, 160.33537, 172.26968, 166.40796, 380.02045]
2025-09-16 12:17:00,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 32.0, 31.0, 39.0, 72.0, 33.0, 33.0, 34.0, 33.0, 70.0]
2025-09-16 12:17:00,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (218.41) for latency 12
2025-09-16 12:17:00,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 52 minutes, 24 seconds)
2025-09-16 12:18:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:18:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 337.43958 ± 83.176
2025-09-16 12:18:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [282.65994, 255.1158, 532.71185, 350.91913, 366.47153, 400.80197, 220.90111, 330.24042, 287.385, 347.18906]
2025-09-16 12:18:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 49.0, 108.0, 68.0, 69.0, 75.0, 46.0, 62.0, 56.0, 70.0]
2025-09-16 12:18:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (337.44) for latency 12
2025-09-16 12:18:53,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 57 minutes, 41 seconds)
2025-09-16 12:20:46,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:20:46,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 330.60825 ± 72.301
2025-09-16 12:20:46,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [390.41537, 391.20477, 303.0628, 353.40955, 343.27945, 333.78177, 233.77887, 170.82584, 413.7049, 372.6193]
2025-09-16 12:20:46,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 75.0, 60.0, 69.0, 70.0, 67.0, 47.0, 33.0, 82.0, 68.0]
2025-09-16 12:20:46,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 58 minutes, 17 seconds)
2025-09-16 12:22:39,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:22:40,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 298.11725 ± 122.375
2025-09-16 12:22:40,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [313.65775, 163.70074, 567.41473, 213.46046, 426.37134, 179.76254, 303.7642, 332.81586, 321.7743, 158.45045]
2025-09-16 12:22:40,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 32.0, 112.0, 45.0, 85.0, 34.0, 61.0, 62.0, 64.0, 33.0]
2025-09-16 12:22:40,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 57 minutes, 38 seconds)
2025-09-16 12:24:32,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:24:33,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 313.27765 ± 141.850
2025-09-16 12:24:33,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [477.71326, 127.49416, 308.63312, 307.43478, 144.7676, 150.64824, 529.60596, 271.44745, 503.5675, 311.46454]
2025-09-16 12:24:33,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 25.0, 58.0, 59.0, 28.0, 30.0, 100.0, 51.0, 96.0, 58.0]
2025-09-16 12:24:33,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 56 minutes, 29 seconds)
2025-09-16 12:26:25,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:26:26,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 306.34418 ± 184.470
2025-09-16 12:26:26,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [688.1316, 128.65648, 152.46573, 411.3189, 544.48035, 100.780235, 367.3017, 168.26347, 237.0315, 265.0121]
2025-09-16 12:26:26,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 25.0, 31.0, 88.0, 113.0, 20.0, 77.0, 34.0, 46.0, 50.0]
2025-09-16 12:26:26,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 57 minutes, 25 seconds)
2025-09-16 12:28:19,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:28:20,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 311.76910 ± 138.512
2025-09-16 12:28:20,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [493.8551, 405.59985, 100.08782, 293.7174, 404.2901, 436.83817, 331.1523, 124.004486, 406.15976, 121.986176]
2025-09-16 12:28:20,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 77.0, 20.0, 56.0, 80.0, 86.0, 63.0, 24.0, 78.0, 24.0]
2025-09-16 12:28:20,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 55 minutes, 40 seconds)
2025-09-16 12:30:12,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:30:13,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 338.91345 ± 85.063
2025-09-16 12:30:13,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [367.45352, 313.5933, 342.54767, 352.2544, 319.09003, 482.25595, 123.73091, 328.9638, 374.42758, 384.81744]
2025-09-16 12:30:13,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 59.0, 69.0, 65.0, 61.0, 107.0, 24.0, 62.0, 71.0, 71.0]
2025-09-16 12:30:13,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (338.91) for latency 12
2025-09-16 12:30:13,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 53 minutes, 47 seconds)
2025-09-16 12:32:06,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:32:07,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 362.45160 ± 103.126
2025-09-16 12:32:07,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [619.87665, 383.6168, 355.78625, 340.81476, 281.6395, 422.267, 313.11783, 268.3436, 235.7478, 403.30594]
2025-09-16 12:32:07,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 72.0, 70.0, 63.0, 53.0, 80.0, 62.0, 51.0, 45.0, 78.0]
2025-09-16 12:32:07,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (362.45) for latency 12
2025-09-16 12:32:07,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 52 minutes, 8 seconds)
2025-09-16 12:33:59,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:34:00,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 304.21161 ± 87.480
2025-09-16 12:34:00,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [276.49756, 112.47452, 444.39447, 225.7918, 299.1663, 334.55527, 333.99213, 291.0012, 409.45035, 314.7924]
2025-09-16 12:34:00,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 22.0, 87.0, 43.0, 55.0, 65.0, 66.0, 55.0, 76.0, 60.0]
2025-09-16 12:34:00,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 50 minutes, 6 seconds)
2025-09-16 12:35:53,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:35:54,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 408.42398 ± 119.471
2025-09-16 12:35:54,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [484.7852, 507.59906, 139.88109, 234.04356, 424.88895, 382.27484, 499.3843, 502.29572, 421.02325, 488.06363]
2025-09-16 12:35:54,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 100.0, 27.0, 43.0, 81.0, 79.0, 98.0, 98.0, 92.0, 95.0]
2025-09-16 12:35:54,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (408.42) for latency 12
2025-09-16 12:35:54,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 48 minutes, 23 seconds)
2025-09-16 12:37:48,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:37:49,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 385.49387 ± 73.177
2025-09-16 12:37:49,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [395.73596, 362.75128, 305.35632, 483.8156, 396.65216, 382.71262, 493.1748, 381.89398, 231.2429, 421.6031]
2025-09-16 12:37:49,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 68.0, 57.0, 97.0, 74.0, 74.0, 100.0, 75.0, 43.0, 78.0]
2025-09-16 12:37:49,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 46 minutes, 50 seconds)
2025-09-16 12:39:42,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:39:43,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 474.38892 ± 129.612
2025-09-16 12:39:43,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [822.6992, 465.27524, 387.95044, 409.64056, 303.81754, 452.77618, 493.56403, 513.0132, 415.43796, 479.71484]
2025-09-16 12:39:43,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 93.0, 84.0, 77.0, 60.0, 91.0, 92.0, 97.0, 75.0, 88.0]
2025-09-16 12:39:43,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (474.39) for latency 12
2025-09-16 12:39:43,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 45 minutes, 13 seconds)
2025-09-16 12:41:35,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:41:36,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 366.93774 ± 126.017
2025-09-16 12:41:36,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [309.54974, 611.7444, 308.6915, 266.68814, 386.15686, 479.509, 125.8263, 322.71274, 404.35388, 454.1446]
2025-09-16 12:41:36,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 129.0, 59.0, 59.0, 70.0, 90.0, 25.0, 68.0, 74.0, 83.0]
2025-09-16 12:41:36,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 43 minutes, 9 seconds)
2025-09-16 12:43:29,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:43:30,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 367.60815 ± 95.522
2025-09-16 12:43:30,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [336.5225, 296.8893, 273.8146, 337.24814, 596.6776, 392.8174, 407.06866, 256.82343, 327.08408, 451.1357]
2025-09-16 12:43:30,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 65.0, 53.0, 61.0, 115.0, 73.0, 75.0, 46.0, 61.0, 84.0]
2025-09-16 12:43:30,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 41 minutes, 35 seconds)
2025-09-16 12:45:23,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:45:24,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 306.46417 ± 129.631
2025-09-16 12:45:24,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [129.51822, 452.15015, 358.2037, 438.78372, 432.82056, 255.95227, 95.60347, 163.21965, 324.7369, 413.6532]
2025-09-16 12:45:24,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 80.0, 72.0, 87.0, 93.0, 49.0, 19.0, 31.0, 58.0, 76.0]
2025-09-16 12:45:24,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 39 minutes, 40 seconds)
2025-09-16 12:47:17,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:47:18,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 382.39786 ± 161.976
2025-09-16 12:47:18,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [342.12476, 521.44684, 346.88144, 147.83556, 359.66064, 95.55873, 576.8523, 339.0778, 611.5923, 482.94818]
2025-09-16 12:47:18,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 96.0, 64.0, 28.0, 66.0, 19.0, 111.0, 62.0, 116.0, 88.0]
2025-09-16 12:47:18,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 37 minutes, 38 seconds)
2025-09-16 12:49:12,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:49:13,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 438.56708 ± 137.386
2025-09-16 12:49:13,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [422.5064, 403.52792, 408.95804, 406.05597, 414.7977, 586.6313, 554.13513, 486.09982, 94.517685, 608.4409]
2025-09-16 12:49:13,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 75.0, 74.0, 81.0, 76.0, 107.0, 119.0, 90.0, 19.0, 112.0]
2025-09-16 12:49:13,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 35 minutes, 49 seconds)
2025-09-16 12:51:07,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:51:08,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 478.02530 ± 110.736
2025-09-16 12:51:08,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [413.27884, 337.97275, 680.64233, 669.3165, 469.29446, 374.44012, 385.62564, 504.029, 449.39462, 496.25873]
2025-09-16 12:51:08,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 64.0, 126.0, 121.0, 98.0, 69.0, 69.0, 92.0, 81.0, 91.0]
2025-09-16 12:51:08,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (478.03) for latency 12
2025-09-16 12:51:08,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 34 minutes, 20 seconds)
2025-09-16 12:53:01,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:53:02,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 387.94168 ± 142.857
2025-09-16 12:53:02,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [454.37814, 137.28557, 376.11313, 477.4639, 352.09796, 512.4029, 427.21588, 550.0446, 112.82293, 479.59174]
2025-09-16 12:53:02,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 27.0, 73.0, 89.0, 67.0, 95.0, 77.0, 104.0, 22.0, 97.0]
2025-09-16 12:53:02,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 32 minutes, 32 seconds)
2025-09-16 12:54:55,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:54:56,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 440.93750 ± 80.218
2025-09-16 12:54:56,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [474.72134, 379.31747, 338.24585, 548.7538, 433.8485, 598.39264, 340.6509, 400.3412, 430.1534, 464.9502]
2025-09-16 12:54:56,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 69.0, 63.0, 101.0, 79.0, 113.0, 62.0, 76.0, 80.0, 87.0]
2025-09-16 12:54:56,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 30 minutes, 41 seconds)
2025-09-16 12:56:49,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:56:50,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 355.37143 ± 98.031
2025-09-16 12:56:50,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [408.18942, 288.93164, 304.6391, 411.17038, 487.96808, 118.3842, 366.92212, 337.4988, 445.82355, 384.1873]
2025-09-16 12:56:50,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 52.0, 55.0, 76.0, 103.0, 23.0, 77.0, 60.0, 82.0, 85.0]
2025-09-16 12:56:50,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 28 minutes, 37 seconds)
2025-09-16 12:58:43,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:58:45,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 474.39511 ± 99.577
2025-09-16 12:58:45,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [523.74805, 409.808, 659.72815, 437.54947, 567.03894, 366.52097, 598.4639, 360.68002, 404.75723, 415.65607]
2025-09-16 12:58:45,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 81.0, 128.0, 80.0, 120.0, 81.0, 109.0, 65.0, 75.0, 87.0]
2025-09-16 12:58:45,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 26 minutes, 45 seconds)
2025-09-16 13:00:38,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:00:39,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 446.46735 ± 80.675
2025-09-16 13:00:39,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [513.75415, 291.2158, 470.05963, 524.2916, 422.40274, 416.33743, 400.66553, 571.8671, 496.95245, 357.12653]
2025-09-16 13:00:39,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 54.0, 85.0, 96.0, 77.0, 76.0, 75.0, 107.0, 94.0, 66.0]
2025-09-16 13:00:39,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 24 minutes, 43 seconds)
2025-09-16 13:02:33,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:02:34,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 400.14755 ± 122.654
2025-09-16 13:02:34,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [360.15268, 335.69046, 515.8451, 139.43091, 482.78247, 401.50793, 405.7932, 328.97647, 404.4757, 626.82086]
2025-09-16 13:02:34,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 60.0, 107.0, 27.0, 88.0, 80.0, 73.0, 63.0, 72.0, 117.0]
2025-09-16 13:02:34,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 22 minutes, 50 seconds)
2025-09-16 13:04:27,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:04:28,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 471.69702 ± 111.902
2025-09-16 13:04:28,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [408.46826, 405.83408, 640.7853, 331.63037, 414.66, 413.28806, 416.3196, 429.2282, 564.9669, 691.78937]
2025-09-16 13:04:28,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 73.0, 133.0, 62.0, 74.0, 75.0, 80.0, 82.0, 105.0, 127.0]
2025-09-16 13:04:28,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 21 minutes, 1 second)
2025-09-16 13:06:22,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:06:24,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 483.69916 ± 116.638
2025-09-16 13:06:24,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [648.6357, 572.29315, 385.3319, 444.49084, 619.717, 559.7974, 331.09692, 339.08417, 366.27057, 570.274]
2025-09-16 13:06:24,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 121.0, 70.0, 83.0, 128.0, 106.0, 62.0, 60.0, 66.0, 104.0]
2025-09-16 13:06:24,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (483.70) for latency 12
2025-09-16 13:06:24,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 34 seconds)
2025-09-16 13:08:17,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:08:18,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 426.12646 ± 139.715
2025-09-16 13:08:18,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [157.71841, 427.84937, 338.57327, 307.0358, 563.918, 573.5601, 343.66406, 382.53336, 618.8181, 547.5941]
2025-09-16 13:08:18,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 80.0, 62.0, 56.0, 103.0, 120.0, 65.0, 69.0, 119.0, 110.0]
2025-09-16 13:08:18,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 17 minutes, 31 seconds)
2025-09-16 13:10:12,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:10:13,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 417.45850 ± 114.751
2025-09-16 13:10:13,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [493.97598, 510.32193, 577.51355, 347.66068, 134.00404, 405.55634, 418.71747, 453.37405, 466.3755, 367.0852]
2025-09-16 13:10:13,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 98.0, 107.0, 70.0, 26.0, 88.0, 78.0, 100.0, 84.0, 79.0]
2025-09-16 13:10:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 15 minutes, 49 seconds)
2025-09-16 13:12:06,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:12:07,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 413.12949 ± 156.183
2025-09-16 13:12:07,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [270.82587, 294.97726, 401.72617, 530.3125, 462.75317, 100.343704, 509.9871, 344.0489, 556.5168, 659.8033]
2025-09-16 13:12:07,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 54.0, 73.0, 103.0, 97.0, 20.0, 94.0, 63.0, 102.0, 125.0]
2025-09-16 13:12:07,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 13 minutes, 52 seconds)
2025-09-16 13:14:01,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:14:02,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 480.03506 ± 246.777
2025-09-16 13:14:02,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [420.63333, 1020.6531, 717.45074, 427.87552, 286.50317, 396.74503, 299.89484, 113.15459, 435.67752, 681.7628]
2025-09-16 13:14:02,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 194.0, 135.0, 80.0, 50.0, 72.0, 55.0, 22.0, 93.0, 120.0]
2025-09-16 13:14:02,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 12 minutes, 6 seconds)
2025-09-16 13:15:56,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:15:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 480.17886 ± 187.637
2025-09-16 13:15:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [344.95374, 469.2328, 962.16693, 657.09125, 457.66486, 496.13177, 364.13553, 308.00742, 406.11304, 336.2916]
2025-09-16 13:15:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 93.0, 202.0, 128.0, 92.0, 92.0, 79.0, 58.0, 75.0, 61.0]
2025-09-16 13:15:57,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 10 minutes, 3 seconds)
2025-09-16 13:17:51,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:17:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 513.68689 ± 136.033
2025-09-16 13:17:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [491.14825, 651.67236, 778.9696, 440.08072, 435.4756, 430.89523, 705.75464, 369.087, 406.8903, 426.89517]
2025-09-16 13:17:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 122.0, 161.0, 79.0, 84.0, 93.0, 130.0, 74.0, 73.0, 80.0]
2025-09-16 13:17:53,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (513.69) for latency 12
2025-09-16 13:17:53,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 8 minutes, 24 seconds)
2025-09-16 13:19:47,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:19:48,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 514.55035 ± 214.311
2025-09-16 13:19:48,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [497.2348, 875.294, 492.92276, 101.58747, 536.9713, 817.95306, 610.2726, 518.6767, 338.18237, 356.40863]
2025-09-16 13:19:48,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 175.0, 97.0, 20.0, 98.0, 160.0, 115.0, 99.0, 67.0, 69.0]
2025-09-16 13:19:48,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (514.55) for latency 12
2025-09-16 13:19:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 6 minutes, 34 seconds)
2025-09-16 13:21:41,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:21:43,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 451.55606 ± 171.812
2025-09-16 13:21:43,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [456.74594, 634.67456, 369.19446, 707.74036, 322.45938, 403.77948, 471.66672, 386.6771, 660.4068, 102.21548]
2025-09-16 13:21:43,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 138.0, 67.0, 137.0, 61.0, 88.0, 87.0, 71.0, 122.0, 20.0]
2025-09-16 13:21:43,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 4 minutes, 37 seconds)
2025-09-16 13:23:36,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:23:38,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 517.82642 ± 117.485
2025-09-16 13:23:38,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [464.4456, 688.15576, 385.56848, 315.0482, 622.5931, 481.15097, 535.8475, 584.6882, 669.37604, 431.39014]
2025-09-16 13:23:38,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 127.0, 68.0, 61.0, 121.0, 93.0, 112.0, 108.0, 134.0, 76.0]
2025-09-16 13:23:38,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (517.83) for latency 12
2025-09-16 13:23:38,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 44 seconds)
2025-09-16 13:25:32,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:25:33,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 453.61313 ± 144.729
2025-09-16 13:25:33,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [591.18695, 456.60803, 442.87027, 640.0794, 342.1833, 634.0817, 424.96298, 482.59854, 392.3661, 129.19405]
2025-09-16 13:25:33,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 83.0, 89.0, 121.0, 62.0, 121.0, 77.0, 92.0, 74.0, 25.0]
2025-09-16 13:25:33,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 55 seconds)
2025-09-16 13:27:27,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:27:29,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 426.20917 ± 158.754
2025-09-16 13:27:29,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [499.11444, 523.47675, 106.49094, 709.24786, 348.33353, 310.61136, 487.99005, 543.53174, 293.10056, 440.19495]
2025-09-16 13:27:29,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 95.0, 21.0, 133.0, 67.0, 59.0, 104.0, 100.0, 53.0, 81.0]
2025-09-16 13:27:29,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 59 minutes)
2025-09-16 13:29:23,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:29:24,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 548.37708 ± 280.964
2025-09-16 13:29:24,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [441.77014, 866.2465, 538.2432, 519.0687, 713.0448, 150.5577, 1111.8088, 595.9169, 173.81204, 373.3023]
2025-09-16 13:29:24,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 186.0, 102.0, 92.0, 131.0, 29.0, 220.0, 113.0, 33.0, 73.0]
2025-09-16 13:29:24,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (548.38) for latency 12
2025-09-16 13:29:24,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 57 minutes, 7 seconds)
2025-09-16 13:31:18,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:31:19,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 522.43933 ± 124.822
2025-09-16 13:31:19,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [313.73178, 656.11426, 569.0539, 384.13458, 661.88995, 634.024, 589.6484, 557.1839, 524.4581, 334.15457]
2025-09-16 13:31:19,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 124.0, 121.0, 71.0, 121.0, 114.0, 113.0, 110.0, 100.0, 62.0]
2025-09-16 13:31:19,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 55 minutes, 18 seconds)
2025-09-16 13:33:14,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:33:15,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 452.64902 ± 174.618
2025-09-16 13:33:15,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [171.06143, 653.38934, 554.9235, 386.8363, 660.8846, 541.5928, 573.3602, 431.7233, 421.43808, 131.28056]
2025-09-16 13:33:15,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 126.0, 107.0, 83.0, 133.0, 107.0, 109.0, 81.0, 85.0, 25.0]
2025-09-16 13:33:15,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 53 minutes, 28 seconds)
2025-09-16 13:35:08,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:35:10,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 617.07245 ± 145.090
2025-09-16 13:35:10,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [362.90305, 750.1481, 721.5318, 695.4467, 634.2714, 705.23267, 352.24713, 780.9737, 628.1026, 539.86725]
2025-09-16 13:35:10,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 139.0, 137.0, 147.0, 119.0, 151.0, 65.0, 149.0, 121.0, 99.0]
2025-09-16 13:35:10,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (617.07) for latency 12
2025-09-16 13:35:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 51 minutes, 33 seconds)
2025-09-16 13:37:05,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:37:06,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 596.33386 ± 219.766
2025-09-16 13:37:06,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1111.9818, 601.3847, 290.0924, 573.0714, 740.2494, 460.39484, 499.45575, 500.88095, 773.2122, 412.61523]
2025-09-16 13:37:06,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 110.0, 55.0, 105.0, 137.0, 85.0, 89.0, 96.0, 145.0, 75.0]
2025-09-16 13:37:06,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 49 minutes, 43 seconds)
2025-09-16 13:39:01,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:39:02,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 527.31757 ± 165.494
2025-09-16 13:39:02,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [446.2531, 317.31467, 642.4464, 352.36246, 379.19846, 523.5878, 668.8172, 440.46463, 872.675, 630.0561]
2025-09-16 13:39:02,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 57.0, 123.0, 64.0, 67.0, 95.0, 124.0, 83.0, 171.0, 120.0]
2025-09-16 13:39:02,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 47 minutes, 49 seconds)
2025-09-16 13:40:56,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:40:57,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 549.47308 ± 251.882
2025-09-16 13:40:57,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [348.9531, 473.10403, 797.2318, 489.5963, 1018.85956, 397.0258, 119.46932, 829.85315, 428.7973, 591.84064]
2025-09-16 13:40:57,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 87.0, 147.0, 95.0, 200.0, 69.0, 23.0, 154.0, 79.0, 106.0]
2025-09-16 13:40:57,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 45 minutes, 58 seconds)
2025-09-16 13:42:53,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:42:55,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 755.52551 ± 380.346
2025-09-16 13:42:55,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [800.5328, 443.6371, 483.8905, 619.3326, 1690.9617, 401.72144, 426.66287, 894.44904, 688.1357, 1105.931]
2025-09-16 13:42:55,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 84.0, 92.0, 128.0, 325.0, 72.0, 92.0, 170.0, 146.0, 211.0]
2025-09-16 13:42:55,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (755.53) for latency 12
2025-09-16 13:42:55,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 44 minutes, 21 seconds)
2025-09-16 13:44:49,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:44:51,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 586.28210 ± 127.809
2025-09-16 13:44:51,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [555.0938, 669.5522, 563.07886, 616.8297, 456.8336, 511.07172, 478.07745, 654.07574, 900.48895, 457.71918]
2025-09-16 13:44:51,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 130.0, 108.0, 113.0, 83.0, 96.0, 84.0, 122.0, 167.0, 85.0]
2025-09-16 13:44:51,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 33 seconds)
2025-09-16 13:46:44,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:46:45,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 604.00873 ± 194.431
2025-09-16 13:46:45,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [505.91367, 273.06607, 682.43854, 467.70374, 727.43225, 823.8737, 500.23862, 674.39264, 956.3284, 428.69952]
2025-09-16 13:46:45,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 49.0, 127.0, 94.0, 139.0, 151.0, 107.0, 128.0, 175.0, 80.0]
2025-09-16 13:46:45,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes, 22 seconds)
2025-09-16 13:48:39,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:48:41,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 549.97284 ± 158.514
2025-09-16 13:48:41,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [490.37512, 739.5116, 249.27722, 631.0869, 681.84436, 735.44543, 619.9586, 551.6318, 318.76114, 481.83643]
2025-09-16 13:48:41,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 137.0, 48.0, 127.0, 144.0, 154.0, 114.0, 104.0, 57.0, 86.0]
2025-09-16 13:48:41,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 20 seconds)
2025-09-16 13:50:36,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:50:37,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 635.07092 ± 130.268
2025-09-16 13:50:37,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [598.971, 452.9314, 647.6374, 774.84216, 602.6169, 404.20932, 740.3203, 565.9939, 747.04346, 816.14307]
2025-09-16 13:50:37,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 89.0, 122.0, 144.0, 118.0, 72.0, 135.0, 102.0, 134.0, 153.0]
2025-09-16 13:50:37,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 36 minutes, 41 seconds)
2025-09-16 13:52:31,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:52:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 692.99396 ± 209.459
2025-09-16 13:52:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [718.98346, 974.10406, 730.6424, 161.58578, 854.66895, 503.33615, 722.05255, 764.3692, 768.15405, 732.0429]
2025-09-16 13:52:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 185.0, 135.0, 31.0, 167.0, 94.0, 140.0, 138.0, 137.0, 135.0]
2025-09-16 13:52:32,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 22 seconds)
2025-09-16 13:54:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:54:28,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 571.55103 ± 145.721
2025-09-16 13:54:28,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [536.4843, 435.1654, 670.2823, 520.1968, 842.8711, 291.2344, 737.5705, 557.5469, 562.2999, 561.8589]
2025-09-16 13:54:28,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 76.0, 127.0, 90.0, 152.0, 54.0, 159.0, 107.0, 105.0, 112.0]
2025-09-16 13:54:28,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 22 seconds)
2025-09-16 13:56:24,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:56:26,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 733.66595 ± 229.978
2025-09-16 13:56:26,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [672.24603, 799.8832, 627.84015, 653.7265, 512.2085, 1307.9861, 523.37836, 729.49, 556.0525, 953.848]
2025-09-16 13:56:26,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 148.0, 123.0, 140.0, 107.0, 273.0, 95.0, 135.0, 99.0, 182.0]
2025-09-16 13:56:26,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 55 seconds)
2025-09-16 13:58:19,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:58:21,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 714.26282 ± 186.146
2025-09-16 13:58:21,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [556.6702, 650.3821, 730.571, 573.8889, 1077.4425, 869.1086, 840.89233, 445.89612, 861.9999, 535.77673]
2025-09-16 13:58:21,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 129.0, 154.0, 104.0, 195.0, 157.0, 152.0, 99.0, 164.0, 96.0]
2025-09-16 13:58:21,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 57 seconds)
2025-09-16 14:00:16,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:00:17,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 654.32422 ± 202.094
2025-09-16 14:00:17,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [617.59863, 739.4117, 484.85422, 382.861, 629.0351, 872.46686, 701.521, 589.5258, 1094.0083, 431.95923]
2025-09-16 14:00:17,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 138.0, 100.0, 70.0, 128.0, 168.0, 147.0, 111.0, 208.0, 85.0]
2025-09-16 14:00:17,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 27 minutes, 1 second)
2025-09-16 14:02:12,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:02:13,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 613.67578 ± 318.689
2025-09-16 14:02:13,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [727.64453, 568.9302, 446.21082, 1222.8997, 129.48444, 870.9228, 864.75793, 142.59908, 664.75793, 498.55093]
2025-09-16 14:02:13,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 105.0, 83.0, 225.0, 25.0, 183.0, 166.0, 27.0, 144.0, 90.0]
2025-09-16 14:02:13,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 11 seconds)
2025-09-16 14:04:10,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:04:11,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 674.32819 ± 301.088
2025-09-16 14:04:11,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [924.91815, 1078.5216, 496.45337, 704.6157, 1278.5043, 444.15414, 299.4612, 456.2006, 521.0338, 539.4189]
2025-09-16 14:04:11,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 192.0, 94.0, 128.0, 245.0, 82.0, 53.0, 82.0, 94.0, 107.0]
2025-09-16 14:04:11,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 23 minutes, 36 seconds)
2025-09-16 14:06:04,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:06:06,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 625.83716 ± 285.482
2025-09-16 14:06:06,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [512.7362, 411.711, 1144.0692, 632.3797, 687.1517, 101.3903, 776.89276, 708.89325, 342.62872, 940.5182]
2025-09-16 14:06:06,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 73.0, 217.0, 127.0, 128.0, 20.0, 155.0, 134.0, 67.0, 194.0]
2025-09-16 14:06:06,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 21 minutes, 15 seconds)
2025-09-16 14:08:00,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:08:02,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 628.44397 ± 317.757
2025-09-16 14:08:02,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [107.45821, 1089.6815, 647.5098, 483.81152, 745.6965, 963.3172, 123.1216, 526.27496, 963.9141, 633.6545]
2025-09-16 14:08:02,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 209.0, 122.0, 92.0, 136.0, 184.0, 24.0, 91.0, 182.0, 114.0]
2025-09-16 14:08:02,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 19 minutes, 22 seconds)
2025-09-16 14:09:57,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:09:59,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 746.78528 ± 506.493
2025-09-16 14:09:59,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [602.26697, 403.2115, 441.3587, 259.33813, 1236.7404, 792.8132, 549.90717, 1838.986, 119.168205, 1224.0632]
2025-09-16 14:09:59,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 76.0, 81.0, 48.0, 232.0, 149.0, 101.0, 353.0, 23.0, 222.0]
2025-09-16 14:09:59,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 17 minutes, 29 seconds)
2025-09-16 14:11:53,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:11:56,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 886.09052 ± 399.017
2025-09-16 14:11:56,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [624.17303, 1340.8479, 1240.5065, 572.1521, 297.01862, 1188.9933, 1533.5504, 416.2105, 878.72504, 768.72736]
2025-09-16 14:11:56,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 260.0, 223.0, 107.0, 55.0, 226.0, 286.0, 88.0, 163.0, 162.0]
2025-09-16 14:11:56,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (886.09) for latency 12
2025-09-16 14:11:56,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 15 minutes, 42 seconds)
2025-09-16 14:13:51,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:13:53,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 602.06415 ± 284.141
2025-09-16 14:13:53,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [496.62195, 100.974365, 509.88348, 1260.99, 720.1392, 652.38855, 603.59973, 423.69852, 463.3575, 788.9885]
2025-09-16 14:13:53,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 20.0, 107.0, 257.0, 135.0, 124.0, 127.0, 90.0, 98.0, 158.0]
2025-09-16 14:13:53,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 13 minutes, 39 seconds)
2025-09-16 14:15:46,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:15:49,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 915.63586 ± 492.246
2025-09-16 14:15:49,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1880.1113, 310.0044, 1321.1064, 1587.532, 823.76434, 360.68295, 665.50507, 684.63275, 884.7091, 638.3101]
2025-09-16 14:15:49,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [357.0, 56.0, 241.0, 300.0, 156.0, 67.0, 122.0, 136.0, 172.0, 121.0]
2025-09-16 14:15:49,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (915.64) for latency 12
2025-09-16 14:15:49,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 minutes, 53 seconds)
2025-09-16 14:17:46,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:17:48,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 796.60565 ± 195.028
2025-09-16 14:17:48,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [854.4073, 682.13586, 532.7353, 684.4692, 954.69476, 640.67126, 836.7897, 704.1625, 808.0984, 1267.8927]
2025-09-16 14:17:48,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 130.0, 96.0, 127.0, 185.0, 116.0, 151.0, 118.0, 145.0, 224.0]
2025-09-16 14:17:48,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 21 seconds)
2025-09-16 14:19:40,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:19:43,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 995.49432 ± 406.557
2025-09-16 14:19:43,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [903.5796, 1469.9559, 769.43286, 947.84875, 1275.359, 730.9028, 771.2495, 785.55853, 417.31964, 1883.7363]
2025-09-16 14:19:43,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 281.0, 166.0, 174.0, 243.0, 128.0, 147.0, 160.0, 76.0, 359.0]
2025-09-16 14:19:43,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (995.49) for latency 12
2025-09-16 14:19:43,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 11 seconds)
2025-09-16 14:21:37,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:21:40,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 836.66827 ± 371.079
2025-09-16 14:21:40,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1249.8492, 430.93338, 314.25955, 656.2525, 679.64294, 1394.0276, 829.24005, 487.13022, 980.88184, 1344.4661]
2025-09-16 14:21:40,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 90.0, 56.0, 134.0, 136.0, 259.0, 147.0, 93.0, 176.0, 254.0]
2025-09-16 14:21:40,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 11 seconds)
2025-09-16 14:23:34,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:23:36,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 887.12988 ± 435.196
2025-09-16 14:23:36,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2035.2535, 696.3044, 403.31787, 853.74286, 646.3481, 1005.89984, 635.6589, 1121.2086, 551.24634, 922.3188]
2025-09-16 14:23:36,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [396.0, 137.0, 82.0, 163.0, 126.0, 177.0, 127.0, 201.0, 100.0, 173.0]
2025-09-16 14:23:36,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 10 seconds)
2025-09-16 14:25:32,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:25:34,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 863.99493 ± 466.276
2025-09-16 14:25:34,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [478.66672, 758.11725, 885.67365, 827.2225, 374.80804, 726.90375, 1507.2631, 536.4527, 601.6673, 1943.1742]
2025-09-16 14:25:34,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 145.0, 175.0, 155.0, 68.0, 134.0, 286.0, 103.0, 122.0, 380.0]
2025-09-16 14:25:35,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 28 seconds)
2025-09-16 14:27:29,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:27:31,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1029.44043 ± 520.501
2025-09-16 14:27:31,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1033.0654, 642.9468, 621.8908, 1098.8102, 1632.6757, 102.06142, 1718.7543, 1156.7512, 574.73364, 1712.716]
2025-09-16 14:27:31,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 134.0, 113.0, 221.0, 320.0, 20.0, 322.0, 221.0, 104.0, 319.0]
2025-09-16 14:27:31,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1029.44) for latency 12
2025-09-16 14:27:31,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 17 seconds)
2025-09-16 14:29:25,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:29:28,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 953.21857 ± 540.224
2025-09-16 14:29:28,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1234.9852, 1598.9795, 635.2026, 375.00943, 318.82788, 1614.7169, 960.70886, 415.55508, 1799.8805, 578.31903]
2025-09-16 14:29:28,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 294.0, 128.0, 68.0, 58.0, 302.0, 174.0, 84.0, 333.0, 127.0]
2025-09-16 14:29:28,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 26 seconds)
2025-09-16 14:31:23,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:31:26,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 920.27246 ± 313.919
2025-09-16 14:31:26,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [530.7646, 773.50134, 1091.5098, 801.94086, 937.4525, 1033.6685, 597.8783, 902.3201, 813.3943, 1720.2941]
2025-09-16 14:31:26,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 162.0, 202.0, 143.0, 195.0, 190.0, 108.0, 168.0, 155.0, 319.0]
2025-09-16 14:31:26,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 38 seconds)
2025-09-16 14:33:21,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:33:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1068.14917 ± 553.896
2025-09-16 14:33:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1045.8798, 1019.9161, 863.7227, 591.8106, 1076.0448, 344.38022, 1811.902, 1166.3596, 2250.8982, 510.57858]
2025-09-16 14:33:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 191.0, 159.0, 106.0, 203.0, 66.0, 342.0, 223.0, 453.0, 104.0]
2025-09-16 14:33:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1068.15) for latency 12
2025-09-16 14:33:24,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 52 seconds)
2025-09-16 14:35:18,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:35:20,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 909.28809 ± 295.833
2025-09-16 14:35:20,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [417.1448, 1163.347, 915.58484, 1059.8733, 994.9954, 920.95087, 968.42645, 1416.0768, 385.3529, 851.1289]
2025-09-16 14:35:20,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 207.0, 186.0, 190.0, 185.0, 167.0, 176.0, 264.0, 69.0, 161.0]
2025-09-16 14:35:20,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 43 seconds)
2025-09-16 14:37:14,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:37:17,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1167.37158 ± 328.782
2025-09-16 14:37:17,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1546.614, 1524.5099, 515.2669, 1125.6727, 662.9657, 1194.1455, 1428.9227, 1057.3235, 1367.2864, 1251.0087]
2025-09-16 14:37:17,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [288.0, 284.0, 94.0, 201.0, 137.0, 229.0, 262.0, 194.0, 251.0, 241.0]
2025-09-16 14:37:17,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1167.37) for latency 12
2025-09-16 14:37:17,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 47 seconds)
2025-09-16 14:39:12,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:39:15,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 993.71924 ± 449.358
2025-09-16 14:39:15,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1065.802, 398.9119, 335.43414, 1342.0557, 543.03, 1403.1514, 721.33734, 1687.2909, 1416.408, 1023.7711]
2025-09-16 14:39:15,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [195.0, 85.0, 63.0, 252.0, 108.0, 271.0, 127.0, 316.0, 268.0, 185.0]
2025-09-16 14:39:15,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 48 minutes, 55 seconds)
2025-09-16 14:41:10,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:41:13,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1002.39484 ± 345.538
2025-09-16 14:41:13,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [635.07715, 774.3366, 1055.5295, 960.6824, 1027.4766, 744.7609, 1927.4293, 1126.0404, 1023.3069, 749.309]
2025-09-16 14:41:13,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 150.0, 191.0, 189.0, 183.0, 136.0, 357.0, 210.0, 192.0, 155.0]
2025-09-16 14:41:13,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 58 seconds)
2025-09-16 14:43:08,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:43:12,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1111.01343 ± 565.824
2025-09-16 14:43:12,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1003.7692, 418.3848, 1393.3063, 605.6599, 677.80725, 2345.894, 1271.0162, 1778.0023, 874.21924, 742.07605]
2025-09-16 14:43:12,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 92.0, 272.0, 116.0, 148.0, 467.0, 252.0, 335.0, 149.0, 130.0]
2025-09-16 14:43:12,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 2 seconds)
2025-09-16 14:45:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:45:07,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 726.10095 ± 347.821
2025-09-16 14:45:07,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1092.2301, 686.92316, 376.21338, 1033.3805, 1327.3513, 583.02466, 886.76843, 699.44745, 467.71918, 107.95149]
2025-09-16 14:45:07,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 135.0, 72.0, 201.0, 252.0, 114.0, 159.0, 140.0, 95.0, 21.0]
2025-09-16 14:45:07,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes)
2025-09-16 14:47:01,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:47:04,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1027.67554 ± 691.278
2025-09-16 14:47:04,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1475.6381, 977.40094, 473.57535, 107.97579, 509.32553, 522.78107, 1142.5481, 1161.4119, 2713.0693, 1193.0286]
2025-09-16 14:47:04,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [273.0, 188.0, 85.0, 21.0, 108.0, 116.0, 223.0, 224.0, 514.0, 227.0]
2025-09-16 14:47:04,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 2 seconds)
2025-09-16 14:48:59,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:49:02,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1078.77014 ± 473.758
2025-09-16 14:49:02,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1316.6932, 1607.3036, 1159.9094, 611.85065, 1914.74, 137.33727, 829.9977, 1156.3102, 1139.6147, 913.9452]
2025-09-16 14:49:02,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 306.0, 224.0, 106.0, 362.0, 26.0, 170.0, 216.0, 210.0, 186.0]
2025-09-16 14:49:02,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 8 seconds)
2025-09-16 14:50:57,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:50:59,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 949.13928 ± 495.548
2025-09-16 14:50:59,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1345.2003, 129.85498, 432.20435, 1229.5745, 1653.7146, 846.54193, 718.4529, 1445.1868, 1321.744, 368.91788]
2025-09-16 14:50:59,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [271.0, 25.0, 87.0, 224.0, 316.0, 155.0, 126.0, 284.0, 245.0, 68.0]
2025-09-16 14:50:59,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 8 seconds)
2025-09-16 14:52:58,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:53:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 799.71484 ± 445.411
2025-09-16 14:53:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [934.67316, 112.03788, 1167.3201, 1333.9084, 617.06915, 320.4994, 1010.2935, 1301.6091, 134.479, 1065.2588]
2025-09-16 14:53:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 22.0, 206.0, 250.0, 113.0, 60.0, 199.0, 245.0, 26.0, 194.0]
2025-09-16 14:53:00,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 18 seconds)
2025-09-16 14:54:51,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:54:53,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 754.74030 ± 282.041
2025-09-16 14:54:53,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [257.92343, 624.1652, 489.04514, 1142.5161, 961.3715, 979.98676, 377.93704, 880.5181, 975.1159, 858.82367]
2025-09-16 14:54:53,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 126.0, 85.0, 206.0, 177.0, 199.0, 65.0, 165.0, 176.0, 162.0]
2025-09-16 14:54:53,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 13 seconds)
2025-09-16 14:56:49,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:56:53,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1683.13989 ± 1117.210
2025-09-16 14:56:53,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [308.5273, 1992.1364, 1474.0383, 2506.003, 677.673, 854.47626, 4131.311, 2712.249, 1484.8972, 690.0878]
2025-09-16 14:56:53,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 378.0, 290.0, 476.0, 144.0, 175.0, 803.0, 517.0, 287.0, 128.0]
2025-09-16 14:56:53,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1683.14) for latency 12
2025-09-16 14:56:53,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 26 seconds)
2025-09-16 14:58:50,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:58:52,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 967.16974 ± 553.434
2025-09-16 14:58:52,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2255.6804, 647.1065, 547.333, 1111.8236, 1058.3801, 583.2134, 1468.7842, 550.2456, 1174.0698, 275.06027]
2025-09-16 14:58:52,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [437.0, 125.0, 93.0, 223.0, 191.0, 105.0, 291.0, 104.0, 229.0, 49.0]
2025-09-16 14:58:52,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 31 seconds)
2025-09-16 15:00:49,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:00:52,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1330.82886 ± 1019.415
2025-09-16 15:00:52,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [955.2899, 393.81064, 2172.665, 471.7391, 1551.85, 455.5838, 1169.691, 1312.8536, 872.0267, 3952.778]
2025-09-16 15:00:52,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 74.0, 397.0, 83.0, 292.0, 83.0, 227.0, 247.0, 156.0, 731.0]
2025-09-16 15:00:52,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 40 seconds)
2025-09-16 15:02:44,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:02:48,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1581.11377 ± 1213.112
2025-09-16 15:02:48,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2773.488, 362.05756, 1317.1703, 996.8718, 935.659, 691.7303, 4296.2505, 857.51764, 739.7336, 2840.6594]
2025-09-16 15:02:48,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [518.0, 80.0, 254.0, 172.0, 161.0, 127.0, 816.0, 167.0, 146.0, 544.0]
2025-09-16 15:02:48,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 28 seconds)
2025-09-16 15:04:44,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:04:48,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1334.65112 ± 837.808
2025-09-16 15:04:48,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3152.6396, 2340.8557, 1109.6141, 1425.3512, 1862.617, 323.32123, 697.7557, 605.6463, 754.80273, 1073.9069]
2025-09-16 15:04:48,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [611.0, 453.0, 209.0, 286.0, 344.0, 58.0, 125.0, 118.0, 125.0, 204.0]
2025-09-16 15:04:48,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 47 seconds)
2025-09-16 15:06:42,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:06:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1488.20044 ± 1124.636
2025-09-16 15:06:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1438.0924, 1239.154, 3982.838, 311.1911, 781.5079, 3062.269, 930.7787, 787.1765, 467.78644, 1881.2096]
2025-09-16 15:06:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [281.0, 216.0, 768.0, 56.0, 168.0, 592.0, 191.0, 156.0, 91.0, 358.0]
2025-09-16 15:06:46,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 44 seconds)
2025-09-16 15:08:45,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:08:48,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1175.74146 ± 959.736
2025-09-16 15:08:48,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [488.49265, 3592.6736, 1369.6068, 1841.863, 119.097275, 245.04762, 811.8263, 1258.7943, 1362.8989, 667.11285]
2025-09-16 15:08:48,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 668.0, 242.0, 343.0, 23.0, 44.0, 157.0, 257.0, 255.0, 123.0]
2025-09-16 15:08:48,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 50 seconds)
2025-09-16 15:10:42,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:10:47,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1955.98010 ± 1041.448
2025-09-16 15:10:47,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1890.0375, 784.07947, 2461.8794, 1884.6936, 1663.1375, 3756.141, 1933.2701, 621.872, 3676.565, 888.1259]
2025-09-16 15:10:47,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [354.0, 158.0, 467.0, 366.0, 336.0, 731.0, 380.0, 112.0, 712.0, 177.0]
2025-09-16 15:10:47,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1226 [INFO]: New best (1955.98) for latency 12
2025-09-16 15:10:47,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 50 seconds)
2025-09-16 15:12:40,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:12:44,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1209.65381 ± 1010.713
2025-09-16 15:12:44,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [340.4663, 446.1699, 1298.0797, 2841.785, 278.50934, 868.57574, 679.5391, 3405.5173, 928.6221, 1009.2734]
2025-09-16 15:12:44,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 83.0, 241.0, 524.0, 52.0, 158.0, 126.0, 642.0, 189.0, 176.0]
2025-09-16 15:12:44,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 53 seconds)
2025-09-16 15:14:43,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:14:47,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1345.64185 ± 780.482
2025-09-16 15:14:47,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [874.5227, 108.696175, 866.5599, 2657.4243, 2666.8489, 1688.7256, 710.5918, 1306.5751, 1527.547, 1048.9279]
2025-09-16 15:14:47,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 21.0, 158.0, 512.0, 511.0, 331.0, 136.0, 240.0, 276.0, 201.0]
2025-09-16 15:14:47,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 58 seconds)
2025-09-16 15:16:41,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:16:46,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1873.17322 ± 1491.808
2025-09-16 15:16:46,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [2740.7642, 287.52264, 550.4485, 1394.7064, 2836.9905, 5282.96, 242.75461, 1886.8771, 2694.327, 814.3822]
2025-09-16 15:16:46,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [491.0, 52.0, 104.0, 258.0, 513.0, 1000.0, 47.0, 331.0, 517.0, 145.0]
2025-09-16 15:16:46,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 59 seconds)
2025-09-16 15:18:40,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:18:43,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1007.94250 ± 821.924
2025-09-16 15:18:43,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [3211.4946, 998.05615, 1013.2276, 1532.9658, 1074.1497, 429.14465, 616.22284, 297.2464, 528.746, 378.17123]
2025-09-16 15:18:43,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [609.0, 198.0, 195.0, 292.0, 206.0, 89.0, 113.0, 57.0, 95.0, 76.0]
2025-09-16 15:18:43,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 55 seconds)
2025-09-16 15:20:40,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:20:43,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1012.93488 ± 594.265
2025-09-16 15:20:43,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [527.0066, 1124.5197, 1152.5249, 2026.4025, 889.737, 815.36646, 1000.92834, 153.33372, 2043.6268, 395.90225]
2025-09-16 15:20:43,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 200.0, 221.0, 392.0, 183.0, 161.0, 172.0, 29.0, 369.0, 76.0]
2025-09-16 15:20:43,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 56 seconds)
2025-09-16 15:22:38,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:22:42,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1441.81348 ± 935.585
2025-09-16 15:22:42,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1596.6807, 1881.0731, 1079.8484, 2293.7769, 2963.636, 640.5937, 107.63287, 455.96286, 2657.7976, 741.1328]
2025-09-16 15:22:42,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [326.0, 349.0, 200.0, 429.0, 565.0, 122.0, 21.0, 85.0, 496.0, 144.0]
2025-09-16 15:22:42,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 59 seconds)
2025-09-16 15:24:38,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:24:42,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1691.96057 ± 1461.195
2025-09-16 15:24:42,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [429.91556, 518.32776, 142.86267, 2694.8909, 980.4976, 4051.3032, 1668.075, 1440.0809, 554.8106, 4438.842]
2025-09-16 15:24:42,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 107.0, 27.0, 524.0, 184.0, 768.0, 302.0, 264.0, 114.0, 836.0]
2025-09-16 15:24:42,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 58 seconds)
2025-09-16 15:26:38,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:26:42,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1603.49048 ± 1001.616
2025-09-16 15:26:42,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1300.3383, 899.5972, 309.38254, 3115.7021, 481.9475, 1003.035, 2599.2515, 3290.3325, 1435.9357, 1599.3824]
2025-09-16 15:26:42,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [258.0, 162.0, 57.0, 601.0, 94.0, 189.0, 498.0, 623.0, 274.0, 316.0]
2025-09-16 15:26:42,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-09-16 15:28:45,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:28:49,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1221 [DEBUG]: Total Reward: 1486.28967 ± 1203.526
2025-09-16 15:28:49,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1222 [DEBUG]: All rewards: [1123.4528, 1063.9836, 471.516, 1114.4601, 825.6075, 1128.7406, 1494.0199, 1012.47894, 4977.817, 1650.8203]
2025-09-16 15:28:49,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 190.0, 88.0, 209.0, 157.0, 212.0, 293.0, 187.0, 903.0, 302.0]
2025-09-16 15:28:49,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-humanoid):1251 [DEBUG]: Training session finished
