2025-08-07 06:53:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:53:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:53:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x151124e63f50>}
2025-08-07 06:53:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 06:53:00,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 06:53:00,243 baseline-bpql-noiseperc5-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 06:53:00,243 baseline-bpql-noiseperc5-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:53:02,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 06:53:02,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 06:54:51,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:52,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 316.64963 ± 164.242
2025-08-07 06:54:52,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [492.73944, 391.99442, 506.80927, 181.26256, 515.7811, 161.12944, 140.33191, 480.17416, 140.40506, 155.86891]
2025-08-07 06:54:52,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 76.0, 96.0, 35.0, 96.0, 31.0, 27.0, 91.0, 27.0, 30.0]
2025-08-07 06:54:52,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (316.65) for latency ExtremeClogL1U23
2025-08-07 06:54:52,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 2 minutes, 32 seconds)
2025-08-07 06:56:52,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:53,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 305.11710 ± 131.467
2025-08-07 06:56:53,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [166.28384, 165.12791, 386.67316, 467.96503, 477.49548, 144.45734, 363.71667, 350.85968, 129.37544, 399.21622]
2025-08-07 06:56:53,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 73.0, 90.0, 92.0, 28.0, 68.0, 69.0, 25.0, 77.0]
2025-08-07 06:56:53,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 8 minutes, 54 seconds)
2025-08-07 06:58:51,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:52,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 273.06845 ± 113.633
2025-08-07 06:58:52,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [134.73497, 208.1663, 417.1826, 471.94824, 189.3115, 186.7595, 130.64243, 351.3143, 328.0086, 312.61615]
2025-08-07 06:58:52,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 40.0, 87.0, 97.0, 37.0, 36.0, 25.0, 65.0, 63.0, 63.0]
2025-08-07 06:58:52,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 8 minutes, 52 seconds)
2025-08-07 07:00:50,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:51,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 356.89471 ± 155.637
2025-08-07 07:00:51,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [379.0449, 645.5333, 335.04062, 503.9446, 160.63342, 192.23326, 475.67572, 310.33487, 430.84845, 135.65807]
2025-08-07 07:00:51,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 124.0, 66.0, 101.0, 31.0, 37.0, 91.0, 65.0, 82.0, 26.0]
2025-08-07 07:00:51,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (356.89) for latency ExtremeClogL1U23
2025-08-07 07:00:51,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 7 minutes, 54 seconds)
2025-08-07 07:02:50,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 363.14691 ± 145.018
2025-08-07 07:02:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [514.13, 321.8803, 502.32123, 417.28702, 372.86133, 130.14374, 543.07776, 171.73705, 191.58315, 466.4474]
2025-08-07 07:02:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 62.0, 97.0, 79.0, 77.0, 25.0, 106.0, 33.0, 37.0, 87.0]
2025-08-07 07:02:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (363.15) for latency ExtremeClogL1U23
2025-08-07 07:02:51,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 6 minutes, 38 seconds)
2025-08-07 07:04:49,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:50,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 278.60889 ± 140.268
2025-08-07 07:04:50,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [387.63385, 194.00146, 160.68062, 180.18698, 476.97827, 204.00314, 486.62064, 124.75621, 140.42711, 430.80066]
2025-08-07 07:04:50,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 38.0, 31.0, 35.0, 91.0, 40.0, 105.0, 24.0, 27.0, 91.0]
2025-08-07 07:04:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 7 minutes, 14 seconds)
2025-08-07 07:06:46,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:46,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 171.79863 ± 11.203
2025-08-07 07:06:46,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [159.58464, 173.06276, 170.56294, 190.67075, 165.11334, 178.55295, 191.2458, 160.29143, 168.78873, 160.11302]
2025-08-07 07:06:46,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 34.0, 33.0, 37.0, 32.0, 35.0, 38.0, 31.0, 33.0, 31.0]
2025-08-07 07:06:46,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 3 minutes, 53 seconds)
2025-08-07 07:08:45,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:45,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 194.87216 ± 65.447
2025-08-07 07:08:45,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [134.84564, 261.64624, 135.46501, 148.9502, 330.8409, 258.0152, 221.81372, 166.33588, 134.80077, 156.0079]
2025-08-07 07:08:45,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 52.0, 26.0, 29.0, 66.0, 51.0, 44.0, 32.0, 26.0, 30.0]
2025-08-07 07:08:45,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 58 seconds)
2025-08-07 07:10:44,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 234.68733 ± 45.205
2025-08-07 07:10:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [234.81285, 196.69916, 258.02792, 130.35349, 314.29614, 260.66022, 231.27226, 224.29462, 242.61053, 253.8463]
2025-08-07 07:10:45,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [46.0, 38.0, 53.0, 25.0, 62.0, 51.0, 46.0, 45.0, 48.0, 51.0]
2025-08-07 07:10:45,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 2 seconds)
2025-08-07 07:12:44,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:45,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 295.50571 ± 146.751
2025-08-07 07:12:45,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [395.8089, 150.6854, 369.26807, 171.31732, 140.46198, 620.32947, 177.44855, 199.6379, 388.60727, 341.49237]
2025-08-07 07:12:45,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 29.0, 69.0, 33.0, 27.0, 117.0, 34.0, 38.0, 73.0, 65.0]
2025-08-07 07:12:45,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 58 minutes, 5 seconds)
2025-08-07 07:14:44,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 302.67072 ± 108.602
2025-08-07 07:14:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [161.91528, 463.38425, 341.25018, 395.2301, 336.7481, 376.3542, 145.49313, 135.85728, 357.63913, 312.83566]
2025-08-07 07:14:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 87.0, 64.0, 74.0, 67.0, 70.0, 28.0, 26.0, 67.0, 59.0]
2025-08-07 07:14:45,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 33 seconds)
2025-08-07 07:16:45,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:46,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 421.26874 ± 112.580
2025-08-07 07:16:46,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [419.26007, 341.68738, 361.0372, 472.93796, 466.88235, 527.01025, 566.2141, 374.22144, 521.94855, 161.48846]
2025-08-07 07:16:46,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 65.0, 68.0, 88.0, 93.0, 96.0, 106.0, 72.0, 99.0, 31.0]
2025-08-07 07:16:46,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (421.27) for latency ExtremeClogL1U23
2025-08-07 07:16:46,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 56 minutes, 2 seconds)
2025-08-07 07:18:45,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:47,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 355.34210 ± 122.523
2025-08-07 07:18:47,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [184.34494, 496.2226, 329.01712, 178.21591, 420.1067, 420.69302, 186.01405, 383.04434, 455.19855, 500.56366]
2025-08-07 07:18:47,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 98.0, 63.0, 34.0, 77.0, 82.0, 36.0, 72.0, 87.0, 97.0]
2025-08-07 07:18:47,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 54 minutes, 19 seconds)
2025-08-07 07:20:46,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:47,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 343.40710 ± 174.502
2025-08-07 07:20:47,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [186.85538, 164.50415, 363.06412, 150.42027, 371.88727, 501.35233, 156.0865, 387.1426, 716.1159, 436.64258]
2025-08-07 07:20:47,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 32.0, 68.0, 29.0, 68.0, 96.0, 30.0, 73.0, 142.0, 85.0]
2025-08-07 07:20:47,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 52 minutes, 38 seconds)
2025-08-07 07:22:47,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 422.03409 ± 72.172
2025-08-07 07:22:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [368.81525, 376.60678, 511.44724, 434.2824, 500.56018, 362.91696, 473.15216, 482.27176, 270.89612, 439.39203]
2025-08-07 07:22:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 82.0, 103.0, 81.0, 95.0, 68.0, 87.0, 90.0, 51.0, 81.0]
2025-08-07 07:22:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (422.03) for latency ExtremeClogL1U23
2025-08-07 07:22:49,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 51 minutes, 7 seconds)
2025-08-07 07:24:48,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:49,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 367.74234 ± 125.477
2025-08-07 07:24:49,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [214.91223, 406.25037, 156.34853, 207.62843, 507.48868, 391.83163, 391.83926, 546.922, 392.18076, 462.0214]
2025-08-07 07:24:49,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 76.0, 30.0, 40.0, 96.0, 73.0, 74.0, 103.0, 75.0, 86.0]
2025-08-07 07:24:49,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 49 minutes, 4 seconds)
2025-08-07 07:26:48,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:49,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 292.83905 ± 117.041
2025-08-07 07:26:49,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [329.51718, 389.44788, 397.24014, 456.82437, 189.12251, 170.95776, 411.5325, 150.24944, 124.808464, 308.69077]
2025-08-07 07:26:49,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 87.0, 74.0, 86.0, 37.0, 33.0, 81.0, 29.0, 24.0, 58.0]
2025-08-07 07:26:49,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 46 minutes, 47 seconds)
2025-08-07 07:28:49,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:50,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 340.76419 ± 162.958
2025-08-07 07:28:50,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [150.5492, 181.82208, 450.8823, 431.04813, 140.20728, 472.3784, 477.83704, 384.57397, 588.1925, 130.15108]
2025-08-07 07:28:50,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 35.0, 85.0, 81.0, 27.0, 88.0, 93.0, 72.0, 119.0, 25.0]
2025-08-07 07:28:51,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 45 minutes, 3 seconds)
2025-08-07 07:30:49,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:50,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 315.87656 ± 133.086
2025-08-07 07:30:50,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [441.97708, 140.72978, 380.49228, 307.6143, 469.51608, 135.76956, 440.40668, 463.97488, 207.7352, 170.54988]
2025-08-07 07:30:50,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 27.0, 72.0, 59.0, 86.0, 26.0, 92.0, 98.0, 40.0, 33.0]
2025-08-07 07:30:50,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 42 minutes, 47 seconds)
2025-08-07 07:32:50,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:51,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 325.29535 ± 235.742
2025-08-07 07:32:51,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [359.80017, 457.1187, 244.23547, 957.4979, 391.6167, 162.42828, 156.27855, 210.48892, 183.51753, 129.9713]
2025-08-07 07:32:51,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 88.0, 47.0, 188.0, 71.0, 31.0, 30.0, 41.0, 35.0, 25.0]
2025-08-07 07:32:51,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 40 minutes, 45 seconds)
2025-08-07 07:34:49,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:51,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 358.47086 ± 135.840
2025-08-07 07:34:51,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [155.83626, 491.93344, 190.46632, 469.1685, 213.44617, 488.25934, 454.156, 260.04678, 525.61884, 335.77682]
2025-08-07 07:34:51,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 90.0, 37.0, 97.0, 41.0, 94.0, 85.0, 53.0, 99.0, 71.0]
2025-08-07 07:34:51,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 26 seconds)
2025-08-07 07:36:51,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:52,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 366.07043 ± 106.476
2025-08-07 07:36:52,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [435.468, 141.03198, 385.5492, 451.04156, 431.01865, 438.55566, 437.4976, 276.44193, 219.37497, 444.72452]
2025-08-07 07:36:52,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 27.0, 72.0, 82.0, 80.0, 96.0, 81.0, 53.0, 42.0, 81.0]
2025-08-07 07:36:52,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 39 seconds)
2025-08-07 07:38:51,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 426.68164 ± 133.331
2025-08-07 07:38:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [613.47437, 458.6039, 537.96326, 196.68047, 535.2943, 487.84708, 182.82431, 438.73965, 426.47128, 388.9179]
2025-08-07 07:38:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 83.0, 101.0, 38.0, 101.0, 103.0, 35.0, 82.0, 80.0, 72.0]
2025-08-07 07:38:52,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (426.68) for latency ExtremeClogL1U23
2025-08-07 07:38:52,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 25 seconds)
2025-08-07 07:40:51,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:52,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 347.72714 ± 156.072
2025-08-07 07:40:52,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [155.37834, 187.56857, 642.2598, 428.76474, 440.4424, 187.87012, 452.18726, 161.6149, 407.89288, 413.2927]
2025-08-07 07:40:52,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 36.0, 121.0, 85.0, 79.0, 36.0, 83.0, 31.0, 91.0, 74.0]
2025-08-07 07:40:52,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 33 seconds)
2025-08-07 07:42:52,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:53,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 249.40903 ± 142.018
2025-08-07 07:42:53,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [155.97969, 434.70035, 397.64642, 156.04645, 543.76215, 194.65912, 134.8647, 139.91232, 171.14218, 165.37698]
2025-08-07 07:42:53,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 81.0, 79.0, 30.0, 102.0, 37.0, 26.0, 27.0, 33.0, 32.0]
2025-08-07 07:42:53,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 26 seconds)
2025-08-07 07:44:52,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:53,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 333.74219 ± 137.428
2025-08-07 07:44:53,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [447.0701, 371.2823, 140.40556, 267.98148, 510.48685, 420.2712, 472.01196, 130.42838, 166.92755, 410.5563]
2025-08-07 07:44:53,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 71.0, 27.0, 52.0, 98.0, 77.0, 86.0, 25.0, 32.0, 82.0]
2025-08-07 07:44:53,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 33 seconds)
2025-08-07 07:46:53,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:54,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 306.58539 ± 151.110
2025-08-07 07:46:54,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [421.38858, 176.2669, 176.24602, 418.5114, 562.3618, 125.01512, 404.98633, 161.10542, 169.62286, 450.34903]
2025-08-07 07:46:54,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 34.0, 34.0, 78.0, 106.0, 24.0, 78.0, 31.0, 33.0, 84.0]
2025-08-07 07:46:54,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 35 seconds)
2025-08-07 07:48:53,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:55,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 369.74475 ± 157.762
2025-08-07 07:48:55,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [412.93124, 433.10095, 444.69366, 584.14325, 435.55014, 534.0733, 426.82016, 125.035126, 129.83702, 171.26265]
2025-08-07 07:48:55,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 80.0, 82.0, 112.0, 84.0, 115.0, 80.0, 24.0, 25.0, 33.0]
2025-08-07 07:48:55,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 35 seconds)
2025-08-07 07:50:55,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:56,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 367.37854 ± 146.186
2025-08-07 07:50:56,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [467.66376, 441.15985, 529.1074, 430.56924, 418.19016, 470.86267, 124.878586, 141.71718, 468.71033, 180.92601]
2025-08-07 07:50:56,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 83.0, 99.0, 79.0, 86.0, 98.0, 24.0, 27.0, 92.0, 35.0]
2025-08-07 07:50:56,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 50 seconds)
2025-08-07 07:52:55,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:56,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 360.61621 ± 183.179
2025-08-07 07:52:56,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [141.00832, 694.15765, 432.48923, 540.7648, 155.08086, 467.49106, 187.6337, 444.02124, 145.05006, 398.46533]
2025-08-07 07:52:56,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 140.0, 81.0, 102.0, 30.0, 85.0, 36.0, 82.0, 28.0, 74.0]
2025-08-07 07:52:56,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 41 seconds)
2025-08-07 07:54:56,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:57,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 423.45053 ± 106.425
2025-08-07 07:54:57,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [431.8737, 443.4614, 415.37473, 566.3426, 382.65903, 351.239, 433.82623, 444.23492, 180.78537, 584.70825]
2025-08-07 07:54:57,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 84.0, 78.0, 105.0, 72.0, 66.0, 93.0, 83.0, 35.0, 110.0]
2025-08-07 07:54:57,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 58 seconds)
2025-08-07 07:56:57,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 325.92188 ± 186.915
2025-08-07 07:56:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [160.8167, 495.09244, 462.78784, 151.40854, 504.83685, 146.16089, 155.60516, 388.99496, 658.44684, 135.06851]
2025-08-07 07:56:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 93.0, 85.0, 29.0, 108.0, 28.0, 30.0, 73.0, 127.0, 26.0]
2025-08-07 07:56:58,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 58 seconds)
2025-08-07 07:58:57,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:58:59,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 404.64716 ± 186.539
2025-08-07 07:58:59,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [459.62122, 480.6145, 145.27216, 535.8456, 125.25701, 538.6462, 473.27127, 413.98706, 712.46497, 161.4916]
2025-08-07 07:58:59,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 88.0, 28.0, 101.0, 24.0, 101.0, 86.0, 75.0, 148.0, 31.0]
2025-08-07 07:58:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 14 minutes, 53 seconds)
2025-08-07 08:00:59,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 469.99554 ± 143.132
2025-08-07 08:01:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [608.01526, 416.3978, 140.17613, 388.18082, 450.07306, 580.90265, 528.25134, 552.0335, 376.45413, 659.4708]
2025-08-07 08:01:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 78.0, 27.0, 72.0, 82.0, 109.0, 99.0, 107.0, 70.0, 129.0]
2025-08-07 08:01:00,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (470.00) for latency ExtremeClogL1U23
2025-08-07 08:01:00,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 57 seconds)
2025-08-07 08:03:00,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:01,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 344.75427 ± 175.650
2025-08-07 08:03:01,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [125.342896, 488.54813, 489.06372, 584.44666, 139.94093, 504.5684, 130.00409, 156.59993, 451.19504, 377.8331]
2025-08-07 08:03:01,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 89.0, 93.0, 108.0, 27.0, 96.0, 25.0, 30.0, 88.0, 71.0]
2025-08-07 08:03:01,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes)
2025-08-07 08:05:01,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:05:03,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 359.29794 ± 166.396
2025-08-07 08:05:03,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [492.5909, 512.9955, 468.57004, 195.20189, 125.289986, 458.36618, 492.2795, 535.3199, 135.40146, 176.9642]
2025-08-07 08:05:03,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 95.0, 98.0, 37.0, 24.0, 88.0, 90.0, 99.0, 26.0, 34.0]
2025-08-07 08:05:03,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2025-08-07 08:07:02,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:03,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 430.35684 ± 150.186
2025-08-07 08:07:03,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [580.4489, 517.3315, 522.3228, 380.80994, 443.96695, 596.79193, 177.01912, 449.46375, 500.3869, 135.02701]
2025-08-07 08:07:03,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 98.0, 109.0, 70.0, 81.0, 121.0, 34.0, 85.0, 92.0, 26.0]
2025-08-07 08:07:03,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 59 seconds)
2025-08-07 08:09:03,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:04,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 409.43750 ± 128.549
2025-08-07 08:09:04,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [394.40152, 513.9873, 509.8015, 471.6456, 488.41693, 485.16696, 450.35394, 458.8589, 145.21815, 176.52426]
2025-08-07 08:09:04,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 108.0, 99.0, 88.0, 91.0, 90.0, 84.0, 85.0, 28.0, 34.0]
2025-08-07 08:09:04,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 9 seconds)
2025-08-07 08:11:04,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:06,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 382.60818 ± 166.076
2025-08-07 08:11:06,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [161.89204, 504.9104, 532.0993, 406.39127, 150.23592, 368.85925, 425.67993, 477.89386, 647.6204, 150.49959]
2025-08-07 08:11:06,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 94.0, 99.0, 76.0, 29.0, 71.0, 82.0, 89.0, 124.0, 29.0]
2025-08-07 08:11:06,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 6 seconds)
2025-08-07 08:13:05,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:06,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 336.18274 ± 171.708
2025-08-07 08:13:06,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [119.84781, 405.84213, 135.313, 391.08167, 146.32417, 546.034, 434.34592, 145.28015, 577.62354, 460.135]
2025-08-07 08:13:06,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 88.0, 26.0, 75.0, 28.0, 101.0, 80.0, 28.0, 115.0, 85.0]
2025-08-07 08:13:06,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 5 seconds)
2025-08-07 08:15:05,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:06,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 330.79465 ± 190.303
2025-08-07 08:15:06,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [548.46063, 424.93597, 135.59982, 135.29094, 151.09834, 140.63718, 163.64917, 551.4558, 584.9991, 471.81964]
2025-08-07 08:15:06,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 80.0, 26.0, 26.0, 29.0, 27.0, 31.0, 105.0, 120.0, 98.0]
2025-08-07 08:15:06,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 46 seconds)
2025-08-07 08:17:07,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:08,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 397.88248 ± 162.735
2025-08-07 08:17:08,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [172.26051, 466.91083, 409.7466, 150.26906, 503.91855, 509.76733, 150.79872, 568.7009, 557.06476, 489.38733]
2025-08-07 08:17:08,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 84.0, 77.0, 29.0, 99.0, 93.0, 29.0, 109.0, 117.0, 92.0]
2025-08-07 08:17:08,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 56 seconds)
2025-08-07 08:19:09,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:10,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 391.42462 ± 156.869
2025-08-07 08:19:10,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [462.92963, 528.3145, 182.47778, 139.29663, 141.11905, 464.76907, 513.36334, 479.4225, 490.79562, 511.7584]
2025-08-07 08:19:10,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 98.0, 35.0, 27.0, 27.0, 87.0, 95.0, 87.0, 90.0, 108.0]
2025-08-07 08:19:10,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 8 seconds)
2025-08-07 08:21:09,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:11,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 405.60431 ± 120.930
2025-08-07 08:21:11,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [194.80238, 454.29108, 552.78595, 404.7465, 444.9073, 420.31137, 171.97903, 509.82214, 507.8025, 394.5947]
2025-08-07 08:21:11,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 88.0, 106.0, 78.0, 82.0, 80.0, 33.0, 95.0, 92.0, 75.0]
2025-08-07 08:21:11,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 56 seconds)
2025-08-07 08:23:11,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:13,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 489.00748 ± 130.629
2025-08-07 08:23:13,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [610.08594, 187.52588, 626.19165, 473.30682, 482.82193, 414.71353, 364.59396, 607.92615, 574.6307, 548.2779]
2025-08-07 08:23:13,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 36.0, 128.0, 90.0, 107.0, 76.0, 70.0, 111.0, 114.0, 103.0]
2025-08-07 08:23:13,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (489.01) for latency ExtremeClogL1U23
2025-08-07 08:23:13,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 14 seconds)
2025-08-07 08:25:13,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:14,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 371.25226 ± 184.514
2025-08-07 08:25:14,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [411.40726, 431.15628, 531.7805, 141.34198, 144.87221, 526.20056, 462.3137, 692.0567, 211.19582, 160.19783]
2025-08-07 08:25:14,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 81.0, 99.0, 27.0, 28.0, 103.0, 86.0, 133.0, 41.0, 31.0]
2025-08-07 08:25:14,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 24 seconds)
2025-08-07 08:27:13,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 349.10074 ± 158.341
2025-08-07 08:27:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [401.50107, 140.39137, 508.92868, 423.30206, 167.55287, 536.5715, 458.1221, 513.4251, 134.2159, 206.99649]
2025-08-07 08:27:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 27.0, 94.0, 78.0, 32.0, 100.0, 87.0, 105.0, 26.0, 40.0]
2025-08-07 08:27:14,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 47 minutes, 7 seconds)
2025-08-07 08:29:13,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:14,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 387.56821 ± 133.487
2025-08-07 08:29:14,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [478.9699, 411.30292, 188.10104, 398.99518, 409.09897, 525.73157, 530.77124, 160.15266, 249.21759, 523.3411]
2025-08-07 08:29:14,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 80.0, 36.0, 72.0, 74.0, 98.0, 99.0, 31.0, 48.0, 96.0]
2025-08-07 08:29:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 41 seconds)
2025-08-07 08:31:13,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:15,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 437.92545 ± 217.098
2025-08-07 08:31:15,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [667.0871, 513.81354, 414.31705, 463.22455, 146.39272, 465.6456, 135.12737, 550.0934, 187.0393, 836.5138]
2025-08-07 08:31:15,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 94.0, 80.0, 94.0, 28.0, 87.0, 26.0, 103.0, 36.0, 168.0]
2025-08-07 08:31:15,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 42 minutes, 41 seconds)
2025-08-07 08:33:13,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:14,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 319.58777 ± 162.421
2025-08-07 08:33:14,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [154.55713, 156.30977, 175.47964, 434.57037, 438.88416, 150.38203, 532.03186, 175.996, 559.6157, 418.05096]
2025-08-07 08:33:14,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 34.0, 81.0, 93.0, 29.0, 104.0, 34.0, 106.0, 77.0]
2025-08-07 08:33:14,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes, 14 seconds)
2025-08-07 08:35:13,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:14,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 408.62128 ± 217.084
2025-08-07 08:35:14,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [545.24945, 759.98944, 187.61427, 363.81, 521.9948, 171.46109, 717.19354, 176.53839, 171.50693, 470.85498]
2025-08-07 08:35:14,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 145.0, 36.0, 71.0, 108.0, 33.0, 137.0, 34.0, 33.0, 88.0]
2025-08-07 08:35:14,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 57 seconds)
2025-08-07 08:37:13,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:14,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 347.95844 ± 162.553
2025-08-07 08:37:14,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [461.71143, 177.34985, 440.00183, 392.72467, 130.04578, 537.9068, 165.63611, 547.9216, 475.15344, 151.13289]
2025-08-07 08:37:14,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 34.0, 82.0, 73.0, 25.0, 102.0, 32.0, 103.0, 88.0, 29.0]
2025-08-07 08:37:14,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 1 second)
2025-08-07 08:39:12,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 346.07013 ± 212.871
2025-08-07 08:39:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [135.45879, 514.0282, 702.7733, 437.7966, 162.54259, 129.80983, 155.38654, 493.8826, 134.74847, 594.2744]
2025-08-07 08:39:13,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 100.0, 132.0, 81.0, 31.0, 25.0, 30.0, 89.0, 26.0, 108.0]
2025-08-07 08:39:13,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 50 seconds)
2025-08-07 08:41:12,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:14,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 428.08716 ± 196.556
2025-08-07 08:41:14,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [492.7591, 452.5842, 156.24763, 649.44086, 457.4158, 715.0524, 155.61623, 499.77847, 556.1584, 145.8183]
2025-08-07 08:41:14,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 94.0, 30.0, 128.0, 84.0, 141.0, 30.0, 92.0, 102.0, 28.0]
2025-08-07 08:41:14,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 50 seconds)
2025-08-07 08:43:12,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:13,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 348.63181 ± 163.509
2025-08-07 08:43:13,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [166.89374, 438.69382, 469.25177, 145.61893, 591.0242, 449.1564, 500.74927, 144.8769, 167.42036, 412.63272]
2025-08-07 08:43:13,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 83.0, 88.0, 28.0, 122.0, 83.0, 93.0, 28.0, 32.0, 76.0]
2025-08-07 08:43:13,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 47 seconds)
2025-08-07 08:45:12,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:14,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 420.13678 ± 186.531
2025-08-07 08:45:14,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [393.2868, 189.27791, 156.29414, 454.22705, 434.3348, 622.0756, 552.71783, 668.249, 145.17221, 585.73236]
2025-08-07 08:45:14,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 36.0, 30.0, 85.0, 79.0, 127.0, 102.0, 128.0, 28.0, 107.0]
2025-08-07 08:45:14,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 55 seconds)
2025-08-07 08:47:11,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:13,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 348.05746 ± 211.130
2025-08-07 08:47:13,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [722.0646, 170.64734, 135.12631, 155.73485, 443.2766, 548.9536, 581.3815, 429.70007, 163.73447, 129.95566]
2025-08-07 08:47:13,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 33.0, 26.0, 30.0, 79.0, 103.0, 109.0, 80.0, 31.0, 25.0]
2025-08-07 08:47:13,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 44 seconds)
2025-08-07 08:49:11,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:12,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 443.85815 ± 206.525
2025-08-07 08:49:12,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [160.39967, 150.54305, 550.1085, 467.16968, 181.00258, 566.19366, 425.56265, 683.05066, 486.8865, 767.6644]
2025-08-07 08:49:12,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 106.0, 85.0, 35.0, 117.0, 79.0, 143.0, 90.0, 145.0]
2025-08-07 08:49:12,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 49 seconds)
2025-08-07 08:51:10,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:12,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 466.27969 ± 177.193
2025-08-07 08:51:12,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [144.47021, 696.5681, 502.19928, 548.51215, 145.67157, 511.69333, 508.01245, 612.6863, 595.1681, 397.81534]
2025-08-07 08:51:12,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 142.0, 92.0, 103.0, 28.0, 110.0, 99.0, 115.0, 113.0, 73.0]
2025-08-07 08:51:12,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 44 seconds)
2025-08-07 08:53:10,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:11,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 385.48813 ± 178.490
2025-08-07 08:53:11,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [459.6744, 556.12634, 643.8808, 506.40826, 130.28946, 156.31914, 393.12744, 528.4092, 134.72643, 345.91986]
2025-08-07 08:53:11,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 111.0, 122.0, 95.0, 25.0, 30.0, 74.0, 96.0, 26.0, 65.0]
2025-08-07 08:53:11,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 47 seconds)
2025-08-07 08:55:11,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:12,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 329.10016 ± 215.454
2025-08-07 08:55:12,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [124.400795, 510.897, 159.5181, 171.42238, 501.18225, 195.11665, 456.0724, 807.5075, 177.18257, 187.70192]
2025-08-07 08:55:12,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 106.0, 31.0, 33.0, 102.0, 37.0, 86.0, 156.0, 34.0, 36.0]
2025-08-07 08:55:12,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 45 seconds)
2025-08-07 08:57:10,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:11,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 462.73721 ± 129.323
2025-08-07 08:57:11,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [500.85974, 489.61377, 390.78006, 445.33307, 481.78467, 140.59882, 668.0003, 572.73346, 492.7791, 444.88898]
2025-08-07 08:57:11,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 89.0, 73.0, 81.0, 89.0, 27.0, 128.0, 107.0, 94.0, 96.0]
2025-08-07 08:57:11,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 50 seconds)
2025-08-07 08:59:09,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:59:11,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 389.50250 ± 163.561
2025-08-07 08:59:11,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [141.28253, 423.50735, 603.5677, 497.20868, 456.66968, 165.48518, 521.2086, 410.60803, 151.3545, 524.1329]
2025-08-07 08:59:11,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 89.0, 112.0, 97.0, 92.0, 32.0, 95.0, 77.0, 29.0, 112.0]
2025-08-07 08:59:11,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 48 seconds)
2025-08-07 09:01:10,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:10,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 298.38388 ± 190.443
2025-08-07 09:01:10,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [608.1739, 324.56702, 144.75809, 167.25558, 130.06543, 176.46622, 564.18195, 145.968, 561.53644, 160.86615]
2025-08-07 09:01:10,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 64.0, 28.0, 32.0, 25.0, 34.0, 114.0, 28.0, 102.0, 31.0]
2025-08-07 09:01:10,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 50 seconds)
2025-08-07 09:03:10,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 278.98578 ± 192.302
2025-08-07 09:03:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [194.90884, 532.0959, 151.48097, 175.50786, 495.92313, 130.47581, 666.19696, 135.03308, 134.83073, 173.4042]
2025-08-07 09:03:11,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 100.0, 29.0, 34.0, 92.0, 25.0, 128.0, 26.0, 26.0, 34.0]
2025-08-07 09:03:11,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 53 seconds)
2025-08-07 09:05:08,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:10,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 555.17188 ± 133.380
2025-08-07 09:05:10,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [668.73206, 511.05066, 541.6335, 664.1303, 604.6106, 465.2074, 212.2301, 614.9423, 576.84503, 692.3366]
2025-08-07 09:05:10,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 106.0, 101.0, 128.0, 124.0, 98.0, 41.0, 120.0, 108.0, 143.0]
2025-08-07 09:05:10,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (555.17) for latency ExtremeClogL1U23
2025-08-07 09:05:10,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 46 seconds)
2025-08-07 09:07:09,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:10,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 382.17319 ± 150.104
2025-08-07 09:07:10,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [171.44046, 155.99521, 420.1816, 502.69733, 543.7618, 461.59482, 538.772, 170.86404, 491.24088, 365.18378]
2025-08-07 09:07:10,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 30.0, 80.0, 93.0, 100.0, 86.0, 104.0, 33.0, 91.0, 72.0]
2025-08-07 09:07:10,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 51 seconds)
2025-08-07 09:09:08,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:09:09,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 459.34708 ± 172.656
2025-08-07 09:09:09,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [140.56018, 573.2515, 719.59515, 419.97046, 478.28357, 528.443, 473.5405, 436.97522, 183.61594, 639.2353]
2025-08-07 09:09:09,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 107.0, 151.0, 80.0, 88.0, 96.0, 87.0, 84.0, 35.0, 131.0]
2025-08-07 09:09:09,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 53 seconds)
2025-08-07 09:11:09,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:10,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 420.50684 ± 193.391
2025-08-07 09:11:10,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [473.06866, 704.2313, 527.2852, 520.3014, 151.3975, 544.278, 422.85907, 579.4245, 129.8686, 152.35393]
2025-08-07 09:11:10,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 132.0, 100.0, 95.0, 29.0, 100.0, 77.0, 109.0, 25.0, 29.0]
2025-08-07 09:11:10,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 59 seconds)
2025-08-07 09:13:08,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:10,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 444.01318 ± 175.585
2025-08-07 09:13:10,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [668.6398, 512.2472, 603.66797, 584.1886, 411.46527, 570.69556, 140.19688, 444.85458, 140.36378, 363.8122]
2025-08-07 09:13:10,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 92.0, 112.0, 106.0, 79.0, 108.0, 27.0, 84.0, 27.0, 71.0]
2025-08-07 09:13:10,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 55 seconds)
2025-08-07 09:15:09,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:11,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 537.56073 ± 166.308
2025-08-07 09:15:11,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [530.0644, 140.68724, 596.37665, 519.4117, 819.85736, 711.2556, 520.0717, 487.86475, 491.25153, 558.76605]
2025-08-07 09:15:11,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 27.0, 113.0, 97.0, 159.0, 134.0, 109.0, 102.0, 88.0, 106.0]
2025-08-07 09:15:11,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 7 seconds)
2025-08-07 09:17:12,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:13,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 322.39252 ± 189.413
2025-08-07 09:17:13,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [151.04483, 559.64966, 130.83961, 543.8914, 530.6847, 196.20871, 161.0858, 577.38043, 178.63466, 194.5052]
2025-08-07 09:17:13,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 112.0, 25.0, 107.0, 103.0, 38.0, 31.0, 107.0, 34.0, 38.0]
2025-08-07 09:17:13,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 18 seconds)
2025-08-07 09:19:11,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:12,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 499.33032 ± 201.868
2025-08-07 09:19:12,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [161.56769, 568.2612, 521.6947, 675.0152, 165.29396, 728.0528, 496.5435, 423.7266, 792.86633, 460.2813]
2025-08-07 09:19:12,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 118.0, 111.0, 144.0, 32.0, 152.0, 97.0, 87.0, 147.0, 85.0]
2025-08-07 09:19:13,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 16 seconds)
2025-08-07 09:21:12,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:13,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 390.05771 ± 257.371
2025-08-07 09:21:13,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [161.46945, 150.89844, 533.03564, 155.27954, 545.67584, 114.525116, 729.8847, 496.415, 840.79846, 172.59521]
2025-08-07 09:21:13,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 101.0, 30.0, 99.0, 22.0, 143.0, 96.0, 157.0, 33.0]
2025-08-07 09:21:13,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 13 seconds)
2025-08-07 09:23:14,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:15,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 407.53912 ± 181.669
2025-08-07 09:23:15,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [472.93045, 455.12814, 503.37296, 683.548, 494.2024, 161.64915, 487.23947, 542.05365, 135.00398, 140.26335]
2025-08-07 09:23:15,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 86.0, 94.0, 139.0, 91.0, 31.0, 88.0, 102.0, 26.0, 27.0]
2025-08-07 09:23:15,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 26 seconds)
2025-08-07 09:25:14,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:16,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 457.25568 ± 163.200
2025-08-07 09:25:16,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [456.85358, 590.7584, 655.22284, 481.91632, 159.8318, 512.8701, 546.3208, 146.68945, 451.90677, 570.1867]
2025-08-07 09:25:16,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 108.0, 124.0, 87.0, 31.0, 93.0, 100.0, 28.0, 83.0, 114.0]
2025-08-07 09:25:16,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 23 seconds)
2025-08-07 09:27:15,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:17,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 375.93698 ± 208.313
2025-08-07 09:27:17,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [164.6582, 141.32773, 150.365, 449.22302, 744.5998, 398.62946, 160.2731, 659.5623, 395.4227, 495.30856]
2025-08-07 09:27:17,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 27.0, 29.0, 82.0, 140.0, 83.0, 31.0, 137.0, 71.0, 91.0]
2025-08-07 09:27:17,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 15 seconds)
2025-08-07 09:29:16,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:29:17,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 415.96948 ± 185.422
2025-08-07 09:29:17,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [442.3665, 177.69435, 649.9927, 483.57898, 552.97723, 135.59962, 135.11198, 436.36942, 595.4605, 550.5433]
2025-08-07 09:29:17,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 34.0, 123.0, 89.0, 124.0, 26.0, 26.0, 84.0, 112.0, 100.0]
2025-08-07 09:29:17,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 19 seconds)
2025-08-07 09:31:16,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:31:18,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 438.35919 ± 249.891
2025-08-07 09:31:18,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [574.5965, 726.6216, 598.9569, 135.11227, 463.5758, 135.83604, 173.06763, 154.70476, 627.29114, 793.82935]
2025-08-07 09:31:18,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 136.0, 124.0, 26.0, 86.0, 26.0, 33.0, 30.0, 116.0, 149.0]
2025-08-07 09:31:18,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 19 seconds)
2025-08-07 09:33:18,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:33:20,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 438.36810 ± 160.811
2025-08-07 09:33:20,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [438.90338, 141.30092, 470.9688, 523.80786, 600.02673, 688.29675, 501.33038, 445.01224, 395.5724, 178.46182]
2025-08-07 09:33:20,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 27.0, 88.0, 96.0, 127.0, 125.0, 91.0, 81.0, 73.0, 34.0]
2025-08-07 09:33:20,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 18 seconds)
2025-08-07 09:35:19,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:35:20,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 466.15161 ± 170.585
2025-08-07 09:35:20,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [663.2185, 563.19116, 528.7281, 582.2411, 130.2992, 506.95856, 538.79834, 454.03363, 150.7094, 543.338]
2025-08-07 09:35:20,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 109.0, 112.0, 116.0, 25.0, 93.0, 96.0, 84.0, 29.0, 105.0]
2025-08-07 09:35:20,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 16 seconds)
2025-08-07 09:37:20,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:21,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 296.93219 ± 199.996
2025-08-07 09:37:21,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [119.63321, 130.10637, 383.68066, 124.94227, 156.9552, 139.70242, 176.39052, 622.2666, 595.23, 520.4148]
2025-08-07 09:37:21,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 25.0, 83.0, 24.0, 30.0, 27.0, 34.0, 121.0, 113.0, 99.0]
2025-08-07 09:37:21,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 15 seconds)
2025-08-07 09:39:21,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:39:22,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 381.78229 ± 191.661
2025-08-07 09:39:22,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [135.28162, 172.46046, 610.092, 622.95953, 427.09366, 436.66852, 542.6791, 177.31364, 543.1549, 150.1197]
2025-08-07 09:39:22,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 33.0, 114.0, 115.0, 90.0, 82.0, 101.0, 34.0, 101.0, 29.0]
2025-08-07 09:39:22,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 16 seconds)
2025-08-07 09:41:22,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:41:24,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 387.58844 ± 256.136
2025-08-07 09:41:24,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [613.5921, 862.5592, 625.589, 166.49213, 140.46947, 475.24548, 145.70682, 139.4166, 150.09872, 556.7148]
2025-08-07 09:41:24,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 167.0, 114.0, 32.0, 27.0, 88.0, 28.0, 27.0, 29.0, 104.0]
2025-08-07 09:41:24,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 18 seconds)
2025-08-07 09:43:24,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:43:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 389.89005 ± 185.229
2025-08-07 09:43:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [188.71078, 476.5116, 462.14276, 189.85818, 464.24625, 542.9213, 583.85394, 669.28125, 150.93135, 170.4428]
2025-08-07 09:43:25,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 90.0, 88.0, 37.0, 86.0, 109.0, 107.0, 124.0, 29.0, 33.0]
2025-08-07 09:43:25,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 15 seconds)
2025-08-07 09:45:23,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:25,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 388.08917 ± 189.852
2025-08-07 09:45:25,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [136.08586, 431.74277, 208.949, 498.2811, 539.58496, 566.6054, 146.1644, 550.0199, 166.3019, 637.156]
2025-08-07 09:45:25,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 83.0, 40.0, 90.0, 103.0, 108.0, 28.0, 102.0, 32.0, 124.0]
2025-08-07 09:45:25,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 12 seconds)
2025-08-07 09:47:25,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:26,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 526.01996 ± 186.599
2025-08-07 09:47:26,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [596.918, 450.55154, 920.4824, 187.91118, 423.42145, 537.8391, 735.0083, 499.6474, 487.37653, 421.0439]
2025-08-07 09:47:26,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 97.0, 181.0, 36.0, 80.0, 107.0, 140.0, 93.0, 89.0, 90.0]
2025-08-07 09:47:26,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 14 seconds)
2025-08-07 09:49:26,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:28,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 477.53461 ± 216.417
2025-08-07 09:49:28,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [171.67683, 643.2301, 165.6526, 564.4919, 731.5787, 140.27164, 686.8867, 586.5194, 535.8207, 549.2177]
2025-08-07 09:49:28,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 129.0, 32.0, 104.0, 135.0, 27.0, 130.0, 111.0, 98.0, 100.0]
2025-08-07 09:49:28,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 14 seconds)
2025-08-07 09:51:28,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:51:29,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 442.44415 ± 258.132
2025-08-07 09:51:29,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [613.8771, 590.5198, 488.2921, 157.28038, 195.44212, 177.32655, 523.4565, 989.9711, 150.99086, 537.28516]
2025-08-07 09:51:29,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 113.0, 89.0, 30.0, 37.0, 34.0, 94.0, 181.0, 29.0, 100.0]
2025-08-07 09:51:29,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 12 seconds)
2025-08-07 09:53:29,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:53:30,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 436.77631 ± 194.724
2025-08-07 09:53:30,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [445.91367, 472.82376, 472.3809, 140.23634, 527.9923, 196.74533, 151.63339, 634.78204, 654.76526, 670.49005]
2025-08-07 09:53:30,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 84.0, 86.0, 27.0, 97.0, 38.0, 29.0, 121.0, 127.0, 122.0]
2025-08-07 09:53:30,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 10 seconds)
2025-08-07 09:55:30,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:55:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 439.42032 ± 185.485
2025-08-07 09:55:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [555.5275, 520.1957, 150.78961, 554.8204, 587.1178, 651.0456, 405.23627, 156.92284, 603.3103, 209.2372]
2025-08-07 09:55:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 113.0, 29.0, 102.0, 108.0, 119.0, 82.0, 30.0, 111.0, 40.0]
2025-08-07 09:55:32,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 12 seconds)
2025-08-07 09:57:31,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:57:33,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 461.13434 ± 216.349
2025-08-07 09:57:33,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [709.8145, 617.8895, 135.16313, 150.04085, 145.85027, 569.34576, 545.33655, 469.20383, 621.22546, 647.473]
2025-08-07 09:57:33,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [134.0, 117.0, 26.0, 29.0, 28.0, 107.0, 100.0, 88.0, 117.0, 141.0]
2025-08-07 09:57:33,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 10 seconds)
2025-08-07 09:59:32,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:34,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 429.84308 ± 228.893
2025-08-07 09:59:34,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [702.69147, 605.03625, 151.01813, 140.35844, 635.37366, 165.54091, 600.14233, 169.99088, 643.3228, 484.9557]
2025-08-07 09:59:34,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 125.0, 29.0, 27.0, 126.0, 32.0, 119.0, 33.0, 137.0, 92.0]
2025-08-07 09:59:34,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 8 seconds)
2025-08-07 10:01:34,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:01:35,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 420.43173 ± 239.481
2025-08-07 10:01:35,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [135.27644, 164.25975, 744.8118, 825.76447, 166.64201, 505.50436, 181.87428, 402.009, 538.4461, 539.7294]
2025-08-07 10:01:35,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 32.0, 140.0, 155.0, 32.0, 94.0, 35.0, 75.0, 100.0, 98.0]
2025-08-07 10:01:35,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 6 seconds)
2025-08-07 10:03:34,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:03:35,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 284.74615 ± 209.655
2025-08-07 10:03:35,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [135.06766, 140.7348, 139.21007, 517.4143, 703.60474, 571.6399, 176.4571, 135.30515, 172.17055, 155.85724]
2025-08-07 10:03:35,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 27.0, 95.0, 130.0, 105.0, 34.0, 26.0, 33.0, 30.0]
2025-08-07 10:03:35,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 4 seconds)
2025-08-07 10:05:35,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:37,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 500.85782 ± 217.878
2025-08-07 10:05:37,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [162.94533, 770.6847, 542.59766, 590.6365, 119.28725, 540.0802, 499.18973, 842.57434, 548.19965, 392.38272]
2025-08-07 10:05:37,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 144.0, 100.0, 109.0, 23.0, 101.0, 93.0, 158.0, 107.0, 72.0]
2025-08-07 10:05:37,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 4 seconds)
2025-08-07 10:07:36,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:38,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 572.09973 ± 190.099
2025-08-07 10:07:38,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [587.17584, 435.04068, 550.6185, 725.43713, 842.071, 480.309, 562.25446, 781.9354, 620.63025, 135.52528]
2025-08-07 10:07:38,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 79.0, 105.0, 132.0, 156.0, 94.0, 104.0, 146.0, 113.0, 26.0]
2025-08-07 10:07:38,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (572.10) for latency ExtremeClogL1U23
2025-08-07 10:07:38,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 2 seconds)
2025-08-07 10:09:38,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:09:39,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 509.64948 ± 243.718
2025-08-07 10:09:39,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [700.2606, 679.8731, 166.21541, 124.52502, 582.5869, 605.9722, 670.5917, 827.4532, 571.53174, 167.48434]
2025-08-07 10:09:39,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 124.0, 32.0, 24.0, 107.0, 113.0, 126.0, 157.0, 106.0, 32.0]
2025-08-07 10:09:39,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 2 seconds)
2025-08-07 10:11:40,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:11:42,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 475.31577 ± 221.060
2025-08-07 10:11:42,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [184.09071, 581.849, 145.82286, 747.86615, 500.0101, 539.67645, 688.69446, 621.5715, 130.382, 613.1949]
2025-08-07 10:11:42,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 126.0, 28.0, 137.0, 91.0, 102.0, 127.0, 122.0, 25.0, 113.0]
2025-08-07 10:11:42,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 1 second)
2025-08-07 10:13:41,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:13:42,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 544.91541 ± 165.019
2025-08-07 10:13:42,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [546.786, 784.4208, 516.45337, 484.7132, 555.7631, 472.71967, 144.97104, 648.0197, 565.5379, 729.76904]
2025-08-07 10:13:42,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 146.0, 109.0, 90.0, 105.0, 86.0, 28.0, 118.0, 119.0, 133.0]
2025-08-07 10:13:42,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1251 [DEBUG]: Training session finished
