2025-08-07 07:00:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:00:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:00:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14ed1c36a650>}
2025-08-07 07:00:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 07:00:07,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 07:00:07,298 baseline-bpql-noiseperc20-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 07:00:07,298 baseline-bpql-noiseperc20-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:00:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 07:00:09,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 07:01:58,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:59,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 120.50973 ± 20.801
2025-08-07 07:01:59,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [108.14542, 106.30089, 133.8911, 106.71353, 123.81277, 164.35081, 135.35732, 90.63121, 134.4321, 101.462135]
2025-08-07 07:01:59,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 26.0, 21.0, 24.0, 31.0, 26.0, 18.0, 26.0, 20.0]
2025-08-07 07:01:59,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (120.51) for latency ExtremeClogL1U23
2025-08-07 07:01:59,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 1 minute, 25 seconds)
2025-08-07 07:03:57,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:57,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 118.52071 ± 21.777
2025-08-07 07:03:57,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [102.54011, 156.05014, 95.87385, 110.862885, 106.02563, 136.04713, 112.87494, 95.76578, 112.47051, 156.69612]
2025-08-07 07:03:57,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 30.0, 19.0, 22.0, 21.0, 26.0, 22.0, 19.0, 22.0, 30.0]
2025-08-07 07:03:57,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 6 minutes, 26 seconds)
2025-08-07 07:05:56,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 189.03430 ± 128.971
2025-08-07 07:05:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [122.58216, 116.849815, 163.29768, 243.18915, 305.29056, 113.5335, 108.99496, 103.950935, 90.99229, 521.6619]
2025-08-07 07:05:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 31.0, 48.0, 58.0, 22.0, 21.0, 21.0, 18.0, 99.0]
2025-08-07 07:05:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (189.03) for latency ExtremeClogL1U23
2025-08-07 07:05:57,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 7 minutes, 33 seconds)
2025-08-07 07:07:55,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:55,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 137.34100 ± 58.032
2025-08-07 07:07:55,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [147.97115, 110.64358, 101.727295, 96.34042, 295.95523, 101.69799, 119.378815, 113.70394, 107.445816, 178.54582]
2025-08-07 07:07:55,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 22.0, 20.0, 19.0, 57.0, 20.0, 23.0, 22.0, 21.0, 35.0]
2025-08-07 07:07:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 6 minutes, 42 seconds)
2025-08-07 07:09:53,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:54,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 121.60000 ± 61.234
2025-08-07 07:09:54,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.05348, 90.0226, 108.70413, 89.39665, 90.78416, 123.37395, 90.00486, 105.45762, 102.87897, 302.32352]
2025-08-07 07:09:54,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 21.0, 18.0, 18.0, 24.0, 18.0, 21.0, 20.0, 59.0]
2025-08-07 07:09:54,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 5 minutes, 15 seconds)
2025-08-07 07:11:53,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:53,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 185.69580 ± 139.470
2025-08-07 07:11:53,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [517.9053, 102.539856, 96.281204, 117.88892, 95.35121, 167.06674, 119.93092, 144.74573, 101.906, 393.3421]
2025-08-07 07:11:53,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 20.0, 19.0, 23.0, 19.0, 33.0, 23.0, 28.0, 20.0, 75.0]
2025-08-07 07:11:53,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 6 minutes, 20 seconds)
2025-08-07 07:13:51,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:52,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 172.60153 ± 111.344
2025-08-07 07:13:52,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [112.03435, 459.2486, 133.85068, 299.90796, 175.7675, 114.7941, 118.66479, 107.37237, 102.09407, 102.280785]
2025-08-07 07:13:52,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 86.0, 26.0, 56.0, 34.0, 22.0, 23.0, 21.0, 20.0, 20.0]
2025-08-07 07:13:52,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 4 minutes, 26 seconds)
2025-08-07 07:15:51,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:51,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 158.26614 ± 93.729
2025-08-07 07:15:51,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [112.188835, 142.81227, 118.275, 140.07133, 89.31924, 149.31429, 112.29442, 161.84319, 432.8613, 123.6815]
2025-08-07 07:15:51,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 28.0, 23.0, 27.0, 18.0, 29.0, 22.0, 31.0, 82.0, 24.0]
2025-08-07 07:15:51,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 2 minutes, 15 seconds)
2025-08-07 07:17:50,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:51,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 194.47546 ± 95.353
2025-08-07 07:17:51,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [161.5724, 138.71092, 89.487175, 402.9219, 307.0804, 150.91118, 129.52724, 170.60878, 116.71781, 277.21698]
2025-08-07 07:17:51,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 28.0, 18.0, 80.0, 59.0, 29.0, 25.0, 33.0, 23.0, 57.0]
2025-08-07 07:17:51,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (194.48) for latency ExtremeClogL1U23
2025-08-07 07:17:51,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 32 seconds)
2025-08-07 07:19:49,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:50,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 170.19302 ± 101.138
2025-08-07 07:19:50,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [103.551994, 115.04032, 110.640854, 112.81161, 391.56622, 84.226654, 154.9427, 96.182526, 206.00256, 326.96487]
2025-08-07 07:19:50,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 22.0, 22.0, 73.0, 17.0, 30.0, 19.0, 40.0, 62.0]
2025-08-07 07:19:50,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 58 minutes, 46 seconds)
2025-08-07 07:21:48,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:49,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 182.74736 ± 124.728
2025-08-07 07:21:49,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.63311, 150.54936, 101.18716, 101.4478, 124.803856, 487.5455, 358.78342, 113.47464, 117.43424, 153.61456]
2025-08-07 07:21:49,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 29.0, 20.0, 20.0, 24.0, 110.0, 78.0, 22.0, 23.0, 30.0]
2025-08-07 07:21:49,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 34 seconds)
2025-08-07 07:23:48,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:48,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 189.75714 ± 101.788
2025-08-07 07:23:48,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [382.85605, 163.2347, 112.60328, 148.34885, 161.16476, 100.43621, 228.47797, 95.66741, 127.97974, 376.80252]
2025-08-07 07:23:48,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 32.0, 22.0, 29.0, 31.0, 20.0, 44.0, 19.0, 25.0, 71.0]
2025-08-07 07:23:48,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 58 seconds)
2025-08-07 07:25:47,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:48,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 237.86536 ± 177.025
2025-08-07 07:25:48,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.96438, 113.10792, 192.02397, 169.83784, 117.89107, 108.37481, 133.81125, 356.90918, 684.78546, 382.94772]
2025-08-07 07:25:48,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 37.0, 33.0, 23.0, 21.0, 26.0, 69.0, 131.0, 74.0]
2025-08-07 07:25:48,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (237.87) for latency ExtremeClogL1U23
2025-08-07 07:25:48,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 6 seconds)
2025-08-07 07:27:47,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:47,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 153.08780 ± 91.795
2025-08-07 07:27:47,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [324.97452, 142.56532, 149.4543, 89.74697, 89.59357, 339.42563, 96.48085, 101.97589, 101.04172, 95.61928]
2025-08-07 07:27:47,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 28.0, 29.0, 18.0, 18.0, 67.0, 19.0, 20.0, 20.0, 19.0]
2025-08-07 07:27:47,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 50 minutes, 57 seconds)
2025-08-07 07:29:46,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:46,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 239.36650 ± 174.368
2025-08-07 07:29:46,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [125.39421, 456.0572, 100.46399, 325.4497, 89.84967, 613.0037, 169.3517, 89.85939, 327.6471, 96.588394]
2025-08-07 07:29:46,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 89.0, 20.0, 67.0, 18.0, 124.0, 33.0, 18.0, 64.0, 19.0]
2025-08-07 07:29:46,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (239.37) for latency ExtremeClogL1U23
2025-08-07 07:29:46,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes, 6 seconds)
2025-08-07 07:31:45,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:46,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 137.41006 ± 66.803
2025-08-07 07:31:46,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.11634, 144.65744, 100.567505, 102.40802, 102.7669, 124.80042, 146.20287, 113.25574, 331.28934, 113.036095]
2025-08-07 07:31:46,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 28.0, 20.0, 20.0, 20.0, 24.0, 29.0, 22.0, 63.0, 22.0]
2025-08-07 07:31:46,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 9 seconds)
2025-08-07 07:33:44,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 232.51407 ± 132.140
2025-08-07 07:33:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [166.36195, 161.3005, 379.55045, 119.266235, 108.66038, 419.0005, 96.60215, 344.64627, 418.9217, 110.83059]
2025-08-07 07:33:45,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 71.0, 23.0, 21.0, 78.0, 19.0, 64.0, 81.0, 22.0]
2025-08-07 07:33:45,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 45 minutes, 2 seconds)
2025-08-07 07:35:43,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:44,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 159.84888 ± 96.371
2025-08-07 07:35:44,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.89016, 101.76651, 125.6324, 89.75052, 306.59894, 96.242676, 102.21658, 227.26154, 362.51697, 96.61237]
2025-08-07 07:35:44,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 20.0, 24.0, 18.0, 59.0, 19.0, 20.0, 43.0, 66.0, 19.0]
2025-08-07 07:35:44,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 42 minutes, 53 seconds)
2025-08-07 07:37:43,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:43,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 130.96548 ± 27.696
2025-08-07 07:37:43,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [135.28352, 147.92035, 129.91391, 95.57158, 200.61555, 114.127266, 118.29268, 102.583084, 134.05412, 131.29294]
2025-08-07 07:37:43,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 29.0, 25.0, 19.0, 38.0, 22.0, 23.0, 20.0, 26.0, 26.0]
2025-08-07 07:37:43,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 55 seconds)
2025-08-07 07:39:43,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 191.27733 ± 122.313
2025-08-07 07:39:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [134.0849, 438.16455, 178.03423, 103.08672, 97.20935, 283.14185, 102.3983, 96.30727, 378.80722, 101.53891]
2025-08-07 07:39:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 82.0, 35.0, 20.0, 19.0, 59.0, 20.0, 19.0, 72.0, 20.0]
2025-08-07 07:39:43,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 10 seconds)
2025-08-07 07:41:42,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:42,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 172.78627 ± 80.213
2025-08-07 07:41:42,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [101.86459, 124.73998, 297.36346, 101.44178, 94.93024, 185.91583, 287.2017, 283.36893, 139.15964, 111.87654]
2025-08-07 07:41:42,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 24.0, 56.0, 20.0, 19.0, 36.0, 57.0, 56.0, 27.0, 22.0]
2025-08-07 07:41:42,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 10 seconds)
2025-08-07 07:43:41,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:41,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 141.91661 ± 65.946
2025-08-07 07:43:41,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [203.59387, 89.27767, 302.99783, 148.3141, 139.32846, 84.33678, 168.8813, 97.13607, 89.72382, 95.57612]
2025-08-07 07:43:41,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 18.0, 59.0, 29.0, 30.0, 17.0, 32.0, 19.0, 18.0, 19.0]
2025-08-07 07:43:41,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 3 seconds)
2025-08-07 07:45:40,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:40,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 249.94734 ± 144.374
2025-08-07 07:45:40,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [107.47101, 412.5419, 413.75156, 125.06504, 335.73535, 441.84048, 352.86472, 106.96845, 96.43124, 106.80381]
2025-08-07 07:45:40,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 81.0, 77.0, 24.0, 77.0, 92.0, 67.0, 21.0, 19.0, 21.0]
2025-08-07 07:45:40,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (249.95) for latency ExtremeClogL1U23
2025-08-07 07:45:40,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 4 seconds)
2025-08-07 07:47:37,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:38,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 259.41068 ± 154.879
2025-08-07 07:47:38,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [138.1682, 438.56073, 113.66887, 532.8882, 140.51854, 90.269356, 96.372, 361.08862, 385.89227, 296.68]
2025-08-07 07:47:38,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 81.0, 22.0, 110.0, 27.0, 18.0, 19.0, 71.0, 75.0, 64.0]
2025-08-07 07:47:38,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (259.41) for latency ExtremeClogL1U23
2025-08-07 07:47:38,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 46 seconds)
2025-08-07 07:49:37,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:37,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 231.43205 ± 128.254
2025-08-07 07:49:37,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [127.62026, 154.3421, 102.105545, 95.78194, 424.84525, 129.16002, 348.01614, 172.77977, 336.83167, 422.8379]
2025-08-07 07:49:37,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 30.0, 20.0, 19.0, 82.0, 26.0, 66.0, 33.0, 70.0, 79.0]
2025-08-07 07:49:37,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 30 seconds)
2025-08-07 07:51:33,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:34,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 144.23512 ± 71.460
2025-08-07 07:51:34,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [129.62914, 102.57807, 354.56378, 155.41255, 116.69346, 116.47532, 118.69786, 113.190506, 127.156525, 107.95403]
2025-08-07 07:51:34,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 66.0, 30.0, 23.0, 23.0, 24.0, 22.0, 25.0, 21.0]
2025-08-07 07:51:34,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 minutes, 52 seconds)
2025-08-07 07:53:31,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:31,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 212.83463 ± 108.244
2025-08-07 07:53:31,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [90.56662, 324.12897, 124.43374, 339.31616, 133.76807, 338.41702, 319.31137, 273.43066, 89.3067, 95.666954]
2025-08-07 07:53:31,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 69.0, 24.0, 66.0, 26.0, 64.0, 63.0, 53.0, 18.0, 19.0]
2025-08-07 07:53:31,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 23 minutes, 32 seconds)
2025-08-07 07:55:28,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:29,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 192.78749 ± 118.036
2025-08-07 07:55:29,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.6051, 95.928215, 404.89102, 145.28741, 129.34538, 334.62836, 89.26425, 368.89154, 125.7892, 137.24442]
2025-08-07 07:55:29,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 80.0, 28.0, 25.0, 61.0, 18.0, 69.0, 24.0, 27.0]
2025-08-07 07:55:29,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 21 minutes, 14 seconds)
2025-08-07 07:57:26,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:27,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 168.58546 ± 94.809
2025-08-07 07:57:27,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [145.50479, 84.32153, 182.18446, 391.90256, 90.048065, 134.35518, 300.48682, 126.61354, 113.777596, 116.66]
2025-08-07 07:57:27,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 17.0, 35.0, 82.0, 18.0, 26.0, 56.0, 25.0, 22.0, 23.0]
2025-08-07 07:57:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 15 seconds)
2025-08-07 07:59:24,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:25,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 257.04190 ± 135.227
2025-08-07 07:59:25,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [108.47372, 123.63781, 142.66252, 384.12057, 401.2476, 401.97177, 353.45282, 96.281, 145.43216, 413.13925]
2025-08-07 07:59:25,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 24.0, 28.0, 72.0, 77.0, 76.0, 67.0, 19.0, 28.0, 77.0]
2025-08-07 07:59:25,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes)
2025-08-07 08:01:22,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 256.89838 ± 147.229
2025-08-07 08:01:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.26535, 169.83652, 156.37816, 143.32532, 387.74973, 131.59386, 453.37384, 517.8632, 134.09732, 361.5004]
2025-08-07 08:01:23,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 33.0, 30.0, 28.0, 75.0, 26.0, 102.0, 115.0, 26.0, 70.0]
2025-08-07 08:01:23,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 15 minutes, 35 seconds)
2025-08-07 08:03:20,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:21,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 139.68527 ± 71.108
2025-08-07 08:03:21,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [104.73695, 90.02462, 84.02334, 336.64655, 174.49734, 95.88703, 155.73473, 107.802086, 117.75705, 129.74306]
2025-08-07 08:03:21,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 18.0, 17.0, 64.0, 34.0, 19.0, 30.0, 21.0, 23.0, 25.0]
2025-08-07 08:03:21,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 13 minutes, 38 seconds)
2025-08-07 08:05:18,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:05:19,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 125.94784 ± 18.253
2025-08-07 08:05:19,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [161.45638, 129.14302, 132.41765, 134.58923, 100.67887, 103.26693, 122.31131, 124.83735, 145.08803, 105.68952]
2025-08-07 08:05:19,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 25.0, 26.0, 26.0, 20.0, 20.0, 24.0, 24.0, 28.0, 21.0]
2025-08-07 08:05:19,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 43 seconds)
2025-08-07 08:07:15,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:16,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 159.88370 ± 90.200
2025-08-07 08:07:16,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.258286, 115.84534, 101.74563, 108.08994, 359.03864, 96.04264, 316.58374, 140.11043, 120.09166, 123.030594]
2025-08-07 08:07:16,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 20.0, 21.0, 69.0, 19.0, 66.0, 27.0, 24.0, 25.0]
2025-08-07 08:07:16,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 9 minutes, 38 seconds)
2025-08-07 08:09:13,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 115.76768 ± 17.532
2025-08-07 08:09:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [90.233734, 127.29176, 102.89832, 149.9454, 123.7399, 102.73702, 138.45396, 107.461174, 107.65365, 107.26194]
2025-08-07 08:09:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 20.0, 30.0, 24.0, 20.0, 27.0, 21.0, 21.0, 21.0]
2025-08-07 08:09:14,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 7 minutes, 37 seconds)
2025-08-07 08:11:11,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 151.87750 ± 86.740
2025-08-07 08:11:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [279.17215, 131.38031, 112.11006, 107.0634, 89.408516, 359.1361, 89.47137, 136.16844, 113.51819, 101.3464]
2025-08-07 08:11:11,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 26.0, 22.0, 21.0, 18.0, 66.0, 18.0, 27.0, 22.0, 20.0]
2025-08-07 08:11:11,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 25 seconds)
2025-08-07 08:13:08,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:09,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 107.28496 ± 11.120
2025-08-07 08:13:09,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [123.98919, 124.6192, 89.61239, 96.073364, 107.611244, 100.60245, 96.72891, 111.72967, 108.08198, 113.80122]
2025-08-07 08:13:09,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 24.0, 18.0, 19.0, 21.0, 20.0, 19.0, 22.0, 21.0, 23.0]
2025-08-07 08:13:09,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 24 seconds)
2025-08-07 08:15:06,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:06,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 181.67345 ± 145.963
2025-08-07 08:15:06,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [128.52797, 595.052, 102.1151, 141.1103, 95.96195, 270.4068, 96.06506, 146.63979, 127.28012, 113.575424]
2025-08-07 08:15:06,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 112.0, 20.0, 27.0, 19.0, 53.0, 19.0, 28.0, 25.0, 22.0]
2025-08-07 08:15:06,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 27 seconds)
2025-08-07 08:17:04,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:05,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 193.00719 ± 130.795
2025-08-07 08:17:05,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [109.98993, 525.483, 154.394, 99.833725, 172.91661, 137.89424, 89.49466, 349.9828, 150.98773, 139.09506]
2025-08-07 08:17:05,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 101.0, 31.0, 20.0, 34.0, 27.0, 18.0, 65.0, 29.0, 27.0]
2025-08-07 08:17:05,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 42 seconds)
2025-08-07 08:19:01,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:02,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 175.79948 ± 127.663
2025-08-07 08:19:02,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.84011, 95.4896, 109.02537, 457.66733, 113.213844, 118.50323, 96.7213, 399.7235, 140.178, 108.63253]
2025-08-07 08:19:02,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 19.0, 21.0, 104.0, 22.0, 23.0, 19.0, 86.0, 27.0, 21.0]
2025-08-07 08:19:02,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 38 seconds)
2025-08-07 08:21:00,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:01,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 286.28748 ± 205.438
2025-08-07 08:21:01,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [144.34647, 96.08919, 346.8699, 263.85873, 110.63426, 511.73752, 704.0748, 96.64658, 480.3637, 108.25339]
2025-08-07 08:21:01,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 19.0, 66.0, 50.0, 22.0, 101.0, 139.0, 19.0, 92.0, 21.0]
2025-08-07 08:21:01,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (286.29) for latency ExtremeClogL1U23
2025-08-07 08:21:01,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 54 seconds)
2025-08-07 08:22:57,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:58,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 130.40857 ± 95.790
2025-08-07 08:22:58,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.35871, 103.032005, 89.79883, 416.38593, 96.445076, 89.6101, 108.510544, 84.27547, 101.24913, 96.41989]
2025-08-07 08:22:58,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 20.0, 18.0, 77.0, 19.0, 18.0, 21.0, 17.0, 20.0, 19.0]
2025-08-07 08:22:58,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 53 seconds)
2025-08-07 08:24:55,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:55,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 137.87651 ± 60.011
2025-08-07 08:24:55,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.28639, 146.36588, 119.996216, 90.52247, 311.17548, 134.42403, 107.186874, 130.17902, 120.51544, 123.11334]
2025-08-07 08:24:55,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 28.0, 23.0, 18.0, 60.0, 26.0, 21.0, 25.0, 23.0, 24.0]
2025-08-07 08:24:55,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 53 seconds)
2025-08-07 08:26:53,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 177.81287 ± 92.021
2025-08-07 08:26:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [109.449135, 116.5742, 257.50006, 111.263115, 167.7418, 150.46034, 175.36295, 122.606316, 143.5907, 423.5801]
2025-08-07 08:26:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 23.0, 51.0, 22.0, 32.0, 29.0, 34.0, 24.0, 28.0, 80.0]
2025-08-07 08:26:53,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 53 seconds)
2025-08-07 08:28:51,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:28:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 243.60542 ± 158.333
2025-08-07 08:28:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [357.00067, 485.46387, 515.3424, 348.07635, 108.516594, 137.64305, 89.194275, 178.81598, 120.02756, 95.97347]
2025-08-07 08:28:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 94.0, 98.0, 64.0, 21.0, 28.0, 18.0, 34.0, 23.0, 19.0]
2025-08-07 08:28:51,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 48 minutes, 6 seconds)
2025-08-07 08:30:49,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:30:49,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 116.60962 ± 15.800
2025-08-07 08:30:49,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [141.12776, 128.98143, 125.285095, 97.092125, 124.09788, 89.24704, 117.64088, 131.45894, 108.442535, 102.72257]
2025-08-07 08:30:49,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 25.0, 25.0, 19.0, 24.0, 18.0, 23.0, 25.0, 21.0, 20.0]
2025-08-07 08:30:50,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 59 seconds)
2025-08-07 08:32:48,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:32:48,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 156.12497 ± 80.375
2025-08-07 08:32:48,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [107.96073, 153.49487, 323.14127, 118.45322, 113.493484, 108.28976, 129.20363, 113.68802, 89.461845, 304.06305]
2025-08-07 08:32:48,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 30.0, 63.0, 23.0, 22.0, 21.0, 25.0, 22.0, 18.0, 68.0]
2025-08-07 08:32:48,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 22 seconds)
2025-08-07 08:34:44,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:45,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 181.81198 ± 111.528
2025-08-07 08:34:45,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [110.31452, 89.359886, 136.48796, 386.43378, 322.04968, 114.362076, 120.81467, 340.0073, 108.416405, 89.87369]
2025-08-07 08:34:45,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 27.0, 85.0, 59.0, 22.0, 24.0, 69.0, 21.0, 18.0]
2025-08-07 08:34:45,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 42 minutes, 12 seconds)
2025-08-07 08:36:43,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:43,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 129.26620 ± 62.835
2025-08-07 08:36:43,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [91.17792, 122.46805, 107.66592, 107.85269, 106.811646, 313.27048, 89.524055, 101.46548, 139.01218, 113.413666]
2025-08-07 08:36:43,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 24.0, 21.0, 21.0, 21.0, 64.0, 18.0, 20.0, 27.0, 22.0]
2025-08-07 08:36:43,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 40 minutes, 14 seconds)
2025-08-07 08:38:40,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:41,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 225.67998 ± 110.168
2025-08-07 08:38:41,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [392.50436, 140.63254, 370.2254, 134.12938, 142.3571, 321.8261, 351.45975, 143.01021, 132.08798, 128.56683]
2025-08-07 08:38:41,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 27.0, 70.0, 26.0, 28.0, 60.0, 64.0, 28.0, 26.0, 25.0]
2025-08-07 08:38:41,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 13 seconds)
2025-08-07 08:40:38,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:39,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 168.67117 ± 116.339
2025-08-07 08:40:39,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [129.23288, 95.52974, 125.11262, 101.224205, 96.46687, 129.25215, 122.27213, 443.4768, 95.868385, 348.27585]
2025-08-07 08:40:39,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 19.0, 24.0, 20.0, 19.0, 25.0, 24.0, 89.0, 19.0, 67.0]
2025-08-07 08:40:39,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 36 minutes, 16 seconds)
2025-08-07 08:42:37,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:42:37,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 140.57930 ± 31.512
2025-08-07 08:42:37,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [181.10103, 165.379, 102.82294, 186.26938, 114.77117, 106.90192, 146.68906, 130.70943, 102.59971, 168.54921]
2025-08-07 08:42:37,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 32.0, 20.0, 37.0, 22.0, 21.0, 29.0, 25.0, 20.0, 32.0]
2025-08-07 08:42:37,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 10 seconds)
2025-08-07 08:44:35,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:35,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 116.34996 ± 27.869
2025-08-07 08:44:35,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [152.63245, 89.43168, 102.60678, 89.90393, 90.057365, 107.53265, 116.19883, 174.43729, 102.15365, 138.54503]
2025-08-07 08:44:35,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 18.0, 20.0, 18.0, 18.0, 21.0, 23.0, 33.0, 20.0, 27.0]
2025-08-07 08:44:35,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 32 minutes, 30 seconds)
2025-08-07 08:46:32,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 205.43681 ± 132.547
2025-08-07 08:46:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [494.51825, 107.17966, 106.18365, 117.20303, 348.06302, 95.52128, 156.14409, 350.96268, 144.2386, 134.35394]
2025-08-07 08:46:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 21.0, 21.0, 23.0, 65.0, 19.0, 30.0, 66.0, 28.0, 26.0]
2025-08-07 08:46:32,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 30 minutes, 21 seconds)
2025-08-07 08:48:30,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:30,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 190.41194 ± 153.943
2025-08-07 08:48:30,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [84.42906, 102.46826, 159.47089, 94.79808, 583.0849, 113.48488, 140.53722, 380.43173, 117.86043, 127.553894]
2025-08-07 08:48:30,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 20.0, 31.0, 19.0, 121.0, 22.0, 28.0, 79.0, 23.0, 25.0]
2025-08-07 08:48:30,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 28 minutes, 23 seconds)
2025-08-07 08:50:28,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:28,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 227.20834 ± 111.267
2025-08-07 08:50:28,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [341.8655, 140.41904, 137.62263, 307.13358, 343.78714, 284.53677, 124.011665, 395.10864, 96.01424, 101.58455]
2025-08-07 08:50:28,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 27.0, 27.0, 66.0, 70.0, 54.0, 24.0, 75.0, 19.0, 20.0]
2025-08-07 08:50:28,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 26 minutes, 27 seconds)
2025-08-07 08:52:26,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:52:26,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 143.56729 ± 80.820
2025-08-07 08:52:26,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [108.106094, 119.611855, 266.10986, 150.04037, 96.473015, 331.04672, 96.45779, 89.551025, 89.04524, 89.23089]
2025-08-07 08:52:26,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 23.0, 49.0, 29.0, 19.0, 65.0, 19.0, 18.0, 18.0, 18.0]
2025-08-07 08:52:26,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 24 minutes, 26 seconds)
2025-08-07 08:54:24,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:54:25,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 173.62772 ± 120.032
2025-08-07 08:54:25,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [101.95712, 138.21817, 118.82931, 181.63995, 158.00961, 149.97353, 525.7895, 95.19582, 147.45735, 119.206665]
2025-08-07 08:54:25,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 27.0, 23.0, 35.0, 31.0, 29.0, 100.0, 19.0, 28.0, 23.0]
2025-08-07 08:54:25,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 22 minutes, 29 seconds)
2025-08-07 08:56:22,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:22,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 169.32547 ± 80.425
2025-08-07 08:56:22,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [149.68501, 94.795006, 101.4492, 186.52354, 150.68463, 104.502014, 122.17221, 330.97852, 141.21245, 311.252]
2025-08-07 08:56:22,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 20.0, 36.0, 30.0, 21.0, 24.0, 64.0, 27.0, 64.0]
2025-08-07 08:56:22,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 38 seconds)
2025-08-07 08:58:20,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:20,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 204.14371 ± 163.332
2025-08-07 08:58:20,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [600.5764, 132.6357, 426.70093, 97.48012, 117.21691, 222.50066, 102.844315, 138.31049, 95.854836, 107.316895]
2025-08-07 08:58:20,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 26.0, 80.0, 19.0, 23.0, 45.0, 20.0, 27.0, 19.0, 21.0]
2025-08-07 08:58:20,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 40 seconds)
2025-08-07 09:00:18,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:18,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 275.52942 ± 189.338
2025-08-07 09:00:18,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [138.87462, 346.95844, 363.59494, 408.68723, 328.65247, 113.161156, 721.5595, 111.96979, 102.18387, 119.652245]
2025-08-07 09:00:18,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 66.0, 67.0, 78.0, 61.0, 22.0, 143.0, 22.0, 20.0, 24.0]
2025-08-07 09:00:18,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 42 seconds)
2025-08-07 09:02:16,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:16,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 204.90707 ± 113.754
2025-08-07 09:02:16,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [419.4839, 304.4031, 155.80241, 96.059784, 84.248764, 124.5878, 132.57483, 268.18753, 349.11795, 114.60485]
2025-08-07 09:02:16,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 57.0, 30.0, 19.0, 17.0, 24.0, 26.0, 51.0, 70.0, 22.0]
2025-08-07 09:02:17,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 45 seconds)
2025-08-07 09:04:13,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:14,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 246.38657 ± 157.763
2025-08-07 09:04:14,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [134.07187, 129.23167, 474.74835, 96.58265, 334.56024, 138.67468, 147.01382, 351.84686, 111.67985, 545.4557]
2025-08-07 09:04:14,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 97.0, 19.0, 63.0, 27.0, 28.0, 66.0, 22.0, 103.0]
2025-08-07 09:04:14,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 43 seconds)
2025-08-07 09:06:12,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:13,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 187.58176 ± 106.078
2025-08-07 09:06:13,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [401.38504, 102.75595, 151.21317, 96.0436, 112.05345, 130.72041, 296.5407, 132.162, 119.210495, 333.7329]
2025-08-07 09:06:13,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 20.0, 29.0, 19.0, 22.0, 25.0, 58.0, 26.0, 23.0, 73.0]
2025-08-07 09:06:13,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 52 seconds)
2025-08-07 09:08:10,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:10,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 132.68788 ± 55.453
2025-08-07 09:08:10,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [100.076645, 96.566154, 89.870964, 177.05159, 129.19655, 96.90679, 89.873566, 127.72612, 278.8582, 140.7523]
2025-08-07 09:08:10,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 18.0, 35.0, 25.0, 19.0, 18.0, 25.0, 51.0, 27.0]
2025-08-07 09:08:10,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 51 seconds)
2025-08-07 09:10:08,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 298.31393 ± 200.464
2025-08-07 09:10:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [672.8231, 542.0952, 130.7289, 360.38248, 123.37342, 112.538635, 373.08105, 116.43072, 94.96225, 456.7237]
2025-08-07 09:10:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 107.0, 25.0, 66.0, 24.0, 22.0, 68.0, 23.0, 19.0, 83.0]
2025-08-07 09:10:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (298.31) for latency ExtremeClogL1U23
2025-08-07 09:10:09,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 56 seconds)
2025-08-07 09:12:07,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:07,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 269.71710 ± 273.061
2025-08-07 09:12:07,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.50459, 1042.3997, 100.63324, 95.25855, 179.64935, 300.09674, 308.69095, 108.51369, 113.59208, 334.83212]
2025-08-07 09:12:07,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 203.0, 20.0, 19.0, 35.0, 64.0, 55.0, 21.0, 22.0, 62.0]
2025-08-07 09:12:07,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes)
2025-08-07 09:14:05,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:06,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 118.96011 ± 15.968
2025-08-07 09:14:06,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [129.36128, 102.6253, 117.825905, 134.10556, 128.45766, 114.68053, 101.61193, 150.768, 113.3771, 96.78788]
2025-08-07 09:14:06,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 23.0, 26.0, 25.0, 22.0, 20.0, 29.0, 22.0, 19.0]
2025-08-07 09:14:06,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 4 seconds)
2025-08-07 09:16:03,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:03,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 191.90701 ± 119.515
2025-08-07 09:16:03,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [343.81534, 123.11252, 118.99383, 112.15372, 458.57626, 287.80347, 112.31974, 95.48368, 116.55116, 150.26048]
2025-08-07 09:16:03,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 24.0, 23.0, 22.0, 103.0, 54.0, 22.0, 19.0, 23.0, 30.0]
2025-08-07 09:16:03,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute)
2025-08-07 09:18:00,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:01,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 142.97131 ± 84.923
2025-08-07 09:18:01,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [90.41485, 145.46625, 378.08997, 84.16187, 89.2098, 194.70085, 112.81228, 89.52977, 137.73885, 107.58867]
2025-08-07 09:18:01,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 28.0, 71.0, 17.0, 18.0, 37.0, 22.0, 18.0, 27.0, 21.0]
2025-08-07 09:18:01,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 2 seconds)
2025-08-07 09:19:59,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:59,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 117.34666 ± 35.739
2025-08-07 09:19:59,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.156624, 191.23683, 145.47664, 90.12647, 84.32068, 122.79532, 84.554306, 162.33257, 113.542465, 89.92467]
2025-08-07 09:19:59,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 36.0, 28.0, 18.0, 17.0, 25.0, 17.0, 31.0, 22.0, 18.0]
2025-08-07 09:19:59,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 3 seconds)
2025-08-07 09:21:57,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:57,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 221.30415 ± 139.191
2025-08-07 09:21:57,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [429.5905, 106.799774, 118.50355, 94.82115, 95.94199, 101.18286, 163.26248, 457.45013, 343.0057, 302.48315]
2025-08-07 09:21:57,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 21.0, 23.0, 19.0, 19.0, 20.0, 32.0, 87.0, 62.0, 58.0]
2025-08-07 09:21:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 2 seconds)
2025-08-07 09:23:55,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:56,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 264.46487 ± 164.549
2025-08-07 09:23:56,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.01915, 279.99725, 112.12972, 469.81528, 111.27675, 403.32904, 560.6156, 129.2194, 357.04495, 126.20139]
2025-08-07 09:23:56,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 56.0, 22.0, 86.0, 22.0, 75.0, 117.0, 25.0, 66.0, 24.0]
2025-08-07 09:23:56,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 8 seconds)
2025-08-07 09:25:53,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:54,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 188.15532 ± 138.545
2025-08-07 09:25:54,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [344.012, 133.37598, 106.51423, 102.380875, 118.89794, 155.14005, 94.94955, 95.259895, 185.90257, 545.1201]
2025-08-07 09:25:54,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 26.0, 21.0, 20.0, 23.0, 30.0, 19.0, 19.0, 36.0, 102.0]
2025-08-07 09:25:54,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 11 seconds)
2025-08-07 09:27:51,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:52,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 213.46707 ± 148.445
2025-08-07 09:27:52,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [537.07135, 101.61063, 407.10406, 136.91551, 126.03331, 127.1008, 170.678, 102.67469, 336.3591, 89.12306]
2025-08-07 09:27:52,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 20.0, 81.0, 26.0, 24.0, 26.0, 33.0, 20.0, 63.0, 18.0]
2025-08-07 09:27:52,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 14 seconds)
2025-08-07 09:29:50,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:29:50,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 173.23500 ± 119.478
2025-08-07 09:29:50,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [172.72313, 445.30347, 95.68353, 114.19674, 107.84869, 122.38448, 116.28142, 101.2098, 365.94406, 90.774704]
2025-08-07 09:29:50,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 83.0, 19.0, 22.0, 21.0, 24.0, 23.0, 20.0, 73.0, 18.0]
2025-08-07 09:29:50,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 16 seconds)
2025-08-07 09:31:48,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:31:48,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 149.08879 ± 100.645
2025-08-07 09:31:48,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [128.5626, 138.77539, 108.63886, 123.007286, 95.74159, 95.41554, 133.24606, 96.76161, 123.213974, 447.525]
2025-08-07 09:31:48,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 21.0, 24.0, 19.0, 19.0, 26.0, 19.0, 24.0, 91.0]
2025-08-07 09:31:48,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 17 seconds)
2025-08-07 09:33:46,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:33:46,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 180.76312 ± 127.418
2025-08-07 09:33:46,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.35053, 189.69598, 458.96942, 103.11602, 113.881805, 89.29439, 398.9855, 122.10682, 111.82395, 101.40671]
2025-08-07 09:33:46,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 36.0, 90.0, 20.0, 22.0, 18.0, 74.0, 24.0, 22.0, 20.0]
2025-08-07 09:33:46,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 16 seconds)
2025-08-07 09:35:44,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:35:45,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 128.06265 ± 52.811
2025-08-07 09:35:45,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [124.99402, 102.52885, 90.36037, 279.39273, 122.96838, 102.808304, 118.802765, 106.977936, 142.50848, 89.28466]
2025-08-07 09:35:45,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 18.0, 55.0, 25.0, 20.0, 23.0, 21.0, 28.0, 18.0]
2025-08-07 09:35:45,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 20 seconds)
2025-08-07 09:37:42,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:42,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 138.70670 ± 55.102
2025-08-07 09:37:42,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.76591, 173.95644, 288.61127, 141.5802, 101.58806, 95.36971, 102.440025, 131.88568, 129.64061, 125.229034]
2025-08-07 09:37:42,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 34.0, 57.0, 27.0, 20.0, 19.0, 20.0, 25.0, 25.0, 24.0]
2025-08-07 09:37:42,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 22 seconds)
2025-08-07 09:39:41,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:39:41,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 161.14485 ± 111.028
2025-08-07 09:39:41,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [129.5485, 112.19619, 112.66198, 192.62515, 102.817406, 135.45728, 485.92294, 118.7943, 118.23925, 103.18545]
2025-08-07 09:39:41,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 22.0, 38.0, 20.0, 26.0, 91.0, 23.0, 23.0, 20.0]
2025-08-07 09:39:41,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 25 seconds)
2025-08-07 09:41:38,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:41:39,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 244.20747 ± 130.064
2025-08-07 09:41:39,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [144.79579, 107.96394, 118.877075, 362.5872, 117.9983, 407.39343, 339.53354, 89.373055, 387.55457, 365.99774]
2025-08-07 09:41:39,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 21.0, 23.0, 68.0, 23.0, 76.0, 64.0, 18.0, 83.0, 69.0]
2025-08-07 09:41:39,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 27 seconds)
2025-08-07 09:43:37,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:43:38,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 199.70157 ± 157.343
2025-08-07 09:43:38,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.47775, 399.28082, 96.74771, 133.45563, 124.19357, 129.0889, 132.93167, 125.0664, 136.30539, 601.4677]
2025-08-07 09:43:38,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 83.0, 19.0, 26.0, 24.0, 26.0, 26.0, 24.0, 26.0, 116.0]
2025-08-07 09:43:38,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 30 seconds)
2025-08-07 09:45:34,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:35,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 210.01669 ± 139.562
2025-08-07 09:45:35,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [117.51977, 89.5338, 493.40945, 136.33296, 101.73808, 124.03203, 348.4439, 405.81952, 140.5097, 142.82753]
2025-08-07 09:45:35,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 100.0, 26.0, 20.0, 24.0, 66.0, 76.0, 27.0, 28.0]
2025-08-07 09:45:35,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 29 seconds)
2025-08-07 09:47:33,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:34,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 189.67142 ± 102.226
2025-08-07 09:47:34,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.73825, 101.50249, 348.49115, 120.32632, 129.70354, 291.0546, 377.47235, 177.23831, 148.39607, 106.79083]
2025-08-07 09:47:34,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 78.0, 24.0, 25.0, 65.0, 73.0, 35.0, 29.0, 21.0]
2025-08-07 09:47:34,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 35 seconds)
2025-08-07 09:49:31,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:31,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 166.93059 ± 85.404
2025-08-07 09:49:31,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [340.2391, 135.21999, 140.04945, 135.06206, 112.16316, 117.11531, 125.98526, 89.29547, 329.63528, 144.54076]
2025-08-07 09:49:31,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 26.0, 27.0, 26.0, 22.0, 24.0, 24.0, 18.0, 62.0, 28.0]
2025-08-07 09:49:31,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 32 seconds)
2025-08-07 09:51:29,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:51:30,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 213.28479 ± 135.886
2025-08-07 09:51:30,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [140.3946, 381.16797, 105.66174, 201.47801, 120.91291, 494.30112, 125.677376, 101.1632, 355.64514, 106.44577]
2025-08-07 09:51:30,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 73.0, 21.0, 39.0, 24.0, 93.0, 24.0, 20.0, 79.0, 21.0]
2025-08-07 09:51:30,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 36 seconds)
2025-08-07 09:53:27,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:53:28,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 271.90826 ± 170.160
2025-08-07 09:53:28,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.2095, 112.20505, 136.08803, 276.05585, 339.4094, 193.98207, 660.4483, 134.96448, 436.98267, 339.73743]
2025-08-07 09:53:28,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 22.0, 26.0, 56.0, 70.0, 37.0, 132.0, 27.0, 82.0, 63.0]
2025-08-07 09:53:28,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 36 seconds)
2025-08-07 09:55:25,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:55:26,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 144.15627 ± 105.387
2025-08-07 09:55:26,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.59992, 151.53484, 97.298515, 95.93143, 108.54168, 112.77939, 119.83316, 101.48092, 100.85827, 456.70444]
2025-08-07 09:55:26,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 30.0, 19.0, 19.0, 21.0, 22.0, 23.0, 20.0, 20.0, 85.0]
2025-08-07 09:55:26,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 39 seconds)
2025-08-07 09:57:24,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:57:25,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 236.63374 ± 135.126
2025-08-07 09:57:25,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [108.016754, 389.57294, 308.52164, 344.09836, 473.27298, 116.852264, 95.48676, 131.23198, 95.17214, 304.11145]
2025-08-07 09:57:25,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 75.0, 57.0, 63.0, 87.0, 23.0, 19.0, 25.0, 19.0, 57.0]
2025-08-07 09:57:25,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 40 seconds)
2025-08-07 09:59:22,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:23,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 181.82155 ± 135.638
2025-08-07 09:59:23,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [160.85516, 140.18575, 89.19886, 563.1418, 107.764946, 136.59389, 95.753555, 173.32306, 255.49173, 95.906876]
2025-08-07 09:59:23,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 18.0, 112.0, 21.0, 26.0, 19.0, 33.0, 50.0, 19.0]
2025-08-07 09:59:23,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 44 seconds)
2025-08-07 10:01:21,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:01:21,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 138.70261 ± 70.685
2025-08-07 10:01:21,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [123.9425, 342.40082, 96.05815, 112.757744, 102.02008, 161.41037, 138.19493, 101.30508, 113.32512, 95.61139]
2025-08-07 10:01:21,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 68.0, 19.0, 22.0, 20.0, 31.0, 27.0, 20.0, 22.0, 19.0]
2025-08-07 10:01:21,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 45 seconds)
2025-08-07 10:03:19,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:03:20,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 188.70389 ± 107.073
2025-08-07 10:03:20,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [128.83372, 106.23811, 101.92665, 145.29599, 145.70746, 372.4871, 101.70054, 342.7807, 107.405556, 334.66315]
2025-08-07 10:03:20,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 20.0, 28.0, 29.0, 71.0, 20.0, 66.0, 21.0, 73.0]
2025-08-07 10:03:20,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 48 seconds)
2025-08-07 10:05:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:16,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 119.91148 ± 22.170
2025-08-07 10:05:16,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [108.55072, 118.418915, 148.24576, 130.11526, 90.76347, 100.74371, 90.10939, 122.66545, 128.47064, 161.03151]
2025-08-07 10:05:16,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 23.0, 29.0, 25.0, 18.0, 20.0, 18.0, 24.0, 26.0, 32.0]
2025-08-07 10:05:16,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 48 seconds)
2025-08-07 10:07:12,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:13,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 189.37889 ± 123.180
2025-08-07 10:07:13,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.21069, 125.466354, 139.34074, 268.96045, 497.28763, 134.98328, 96.14449, 106.15434, 303.7528, 108.488106]
2025-08-07 10:07:13,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 27.0, 55.0, 95.0, 26.0, 19.0, 21.0, 60.0, 21.0]
2025-08-07 10:07:13,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 48 seconds)
2025-08-07 10:09:10,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:09:11,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 169.01703 ± 92.607
2025-08-07 10:09:11,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [391.54904, 113.374626, 125.87984, 125.082985, 103.14023, 140.53476, 305.82755, 127.174095, 113.00775, 144.5995]
2025-08-07 10:09:11,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 22.0, 24.0, 24.0, 20.0, 27.0, 69.0, 25.0, 22.0, 28.0]
2025-08-07 10:09:11,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 50 seconds)
2025-08-07 10:11:07,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:11:08,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 215.76335 ± 125.293
2025-08-07 10:11:08,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [270.02777, 106.52592, 447.54758, 370.6431, 117.330215, 131.35936, 89.31968, 95.61391, 185.95715, 343.30875]
2025-08-07 10:11:08,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 21.0, 89.0, 68.0, 23.0, 25.0, 18.0, 19.0, 36.0, 64.0]
2025-08-07 10:11:08,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 52 seconds)
2025-08-07 10:13:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:13:05,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 167.61032 ± 110.317
2025-08-07 10:13:05,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.06411, 133.16104, 118.5949, 416.13016, 101.16734, 353.35666, 96.39496, 130.08734, 101.13035, 130.0164]
2025-08-07 10:13:05,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 26.0, 24.0, 78.0, 20.0, 64.0, 19.0, 25.0, 20.0, 25.0]
2025-08-07 10:13:05,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 54 seconds)
2025-08-07 10:15:02,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:15:03,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 199.23862 ± 125.935
2025-08-07 10:15:03,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [110.886604, 136.44452, 112.34234, 134.76234, 347.27554, 434.96985, 383.24576, 95.78032, 117.08439, 119.59447]
2025-08-07 10:15:03,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 26.0, 22.0, 27.0, 64.0, 81.0, 72.0, 19.0, 23.0, 23.0]
2025-08-07 10:15:03,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 57 seconds)
2025-08-07 10:16:59,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:16:59,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 188.30264 ± 157.089
2025-08-07 10:16:59,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [373.70425, 108.07879, 101.413315, 101.53499, 113.3564, 119.58829, 107.67421, 108.06855, 153.39862, 596.2089]
2025-08-07 10:16:59,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 21.0, 20.0, 20.0, 22.0, 23.0, 21.0, 21.0, 29.0, 114.0]
2025-08-07 10:16:59,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1251 [DEBUG]: Training session finished
