2025-08-07 07:17:05,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:17:05,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:17:05,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15102bbb9550>}
2025-08-07 07:17:05,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 07:17:05,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 07:17:05,539 baseline-bpql-noiseperc5-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:17:05,539 baseline-bpql-noiseperc5-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:17:07,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 07:17:07,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 07:18:41,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:43,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 19.79139 ± 3.824
2025-08-07 07:18:43,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [24.559364, 19.445765, 21.37804, 21.641344, 11.961007, 17.238108, 24.8799, 15.517983, 19.234121, 22.05824]
2025-08-07 07:18:43,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [96.0, 97.0, 95.0, 96.0, 90.0, 94.0, 104.0, 92.0, 96.0, 95.0]
2025-08-07 07:18:43,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (19.79) for latency ExtremeClogL1U23
2025-08-07 07:18:43,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 37 minutes, 28 seconds)
2025-08-07 07:20:24,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:25,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 38.95744 ± 29.697
2025-08-07 07:20:25,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [74.87863, 9.789516, 44.33166, 18.269243, -5.6449156, 24.767054, 25.412426, 49.252224, 49.2715, 99.24708]
2025-08-07 07:20:25,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [93.0, 138.0, 58.0, 27.0, 174.0, 50.0, 43.0, 103.0, 63.0, 117.0]
2025-08-07 07:20:25,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (38.96) for latency ExtremeClogL1U23
2025-08-07 07:20:25,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 41 minutes, 40 seconds)
2025-08-07 07:22:06,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:08,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 69.43653 ± 74.176
2025-08-07 07:22:08,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [252.85992, 140.3777, 44.3984, -35.585625, 55.58304, 32.452843, 28.080341, 50.770405, 40.602306, 84.82597]
2025-08-07 07:22:08,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 162.0, 75.0, 153.0, 68.0, 40.0, 157.0, 65.0, 273.0, 109.0]
2025-08-07 07:22:08,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (69.44) for latency ExtremeClogL1U23
2025-08-07 07:22:08,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 41 minutes, 58 seconds)
2025-08-07 07:23:50,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:51,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 49.72761 ± 38.462
2025-08-07 07:23:51,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [103.03799, 20.30021, 33.54307, -27.26062, 96.8075, 49.35731, 67.70822, 26.83803, 90.439186, 36.50521]
2025-08-07 07:23:51,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 72.0, 169.0, 162.0, 113.0, 66.0, 102.0, 34.0, 130.0, 79.0]
2025-08-07 07:23:51,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 41 minutes, 29 seconds)
2025-08-07 07:25:33,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:34,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 41.44080 ± 16.054
2025-08-07 07:25:34,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [45.303997, 42.45366, 48.345234, 58.564648, -1.5700467, 49.652435, 33.272667, 56.061287, 44.5032, 37.820965]
2025-08-07 07:25:34,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [75.0, 128.0, 95.0, 184.0, 184.0, 181.0, 40.0, 164.0, 126.0, 56.0]
2025-08-07 07:25:34,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 40 minutes, 33 seconds)
2025-08-07 07:27:16,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:18,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 77.76689 ± 65.089
2025-08-07 07:27:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [22.07465, 57.78889, 91.06033, 235.96692, 46.798256, 72.62179, 20.409718, 89.62967, 138.2756, 3.0430655]
2025-08-07 07:27:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 169.0, 104.0, 169.0, 76.0, 116.0, 30.0, 131.0, 153.0, 236.0]
2025-08-07 07:27:18,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (77.77) for latency ExtremeClogL1U23
2025-08-07 07:27:18,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 41 minutes, 24 seconds)
2025-08-07 07:28:59,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:01,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 82.53053 ± 65.768
2025-08-07 07:29:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [75.58912, 70.940346, 42.85752, 36.080368, 67.50682, 90.92525, 163.17776, 18.50654, 239.31427, 20.407207]
2025-08-07 07:29:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [93.0, 93.0, 67.0, 41.0, 79.0, 102.0, 151.0, 29.0, 170.0, 33.0]
2025-08-07 07:29:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (82.53) for latency ExtremeClogL1U23
2025-08-07 07:29:01,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 39 minutes, 46 seconds)
2025-08-07 07:30:41,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:42,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 89.48192 ± 45.261
2025-08-07 07:30:42,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [83.46851, 192.25949, 23.433157, 140.44882, 92.90048, 65.26256, 61.98481, 90.98062, 50.117878, 93.96288]
2025-08-07 07:30:42,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [93.0, 117.0, 31.0, 98.0, 88.0, 98.0, 85.0, 88.0, 57.0, 97.0]
2025-08-07 07:30:42,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (89.48) for latency ExtremeClogL1U23
2025-08-07 07:30:42,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 37 minutes, 48 seconds)
2025-08-07 07:32:24,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:25,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 110.65574 ± 92.949
2025-08-07 07:32:25,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [22.37749, 48.530655, 25.77138, 83.512924, 220.80882, 107.048485, 70.23187, 339.48938, 86.43029, 102.35604]
2025-08-07 07:32:25,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 52.0, 34.0, 78.0, 131.0, 82.0, 77.0, 221.0, 78.0, 149.0]
2025-08-07 07:32:25,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (110.66) for latency ExtremeClogL1U23
2025-08-07 07:32:25,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 35 minutes, 50 seconds)
2025-08-07 07:34:06,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:08,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 167.03206 ± 83.495
2025-08-07 07:34:08,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [192.12593, 168.97214, 82.62808, 244.39888, 161.58406, 208.78163, 324.98038, 71.10119, 27.100645, 188.64775]
2025-08-07 07:34:08,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 103.0, 78.0, 144.0, 109.0, 119.0, 216.0, 76.0, 120.0, 214.0]
2025-08-07 07:34:08,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (167.03) for latency ExtremeClogL1U23
2025-08-07 07:34:08,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 33 minutes, 59 seconds)
2025-08-07 07:35:50,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:51,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 193.01358 ± 120.336
2025-08-07 07:35:51,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [292.69406, 246.01541, 332.50696, 153.31218, 217.49425, 11.400214, 204.39433, 377.52667, 26.632946, 68.15889]
2025-08-07 07:35:51,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 133.0, 189.0, 93.0, 147.0, 23.0, 111.0, 218.0, 34.0, 65.0]
2025-08-07 07:35:51,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (193.01) for latency ExtremeClogL1U23
2025-08-07 07:35:51,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 32 minutes, 18 seconds)
2025-08-07 07:37:32,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:33,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 81.97546 ± 63.091
2025-08-07 07:37:33,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [54.997562, 58.726707, 73.84762, 39.694046, 22.504519, 38.60409, 147.28168, 46.838085, 241.93549, 95.32475]
2025-08-07 07:37:33,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [93.0, 56.0, 80.0, 51.0, 33.0, 45.0, 140.0, 113.0, 123.0, 96.0]
2025-08-07 07:37:33,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 30 minutes, 25 seconds)
2025-08-07 07:39:14,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:16,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 126.23869 ± 70.657
2025-08-07 07:39:16,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [133.66423, 173.69453, 51.658318, 44.259804, 256.34048, 35.2615, 105.45189, 103.70111, 221.51625, 136.83891]
2025-08-07 07:39:16,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 99.0, 54.0, 48.0, 137.0, 41.0, 118.0, 84.0, 150.0, 116.0]
2025-08-07 07:39:16,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 28 minutes, 49 seconds)
2025-08-07 07:40:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:59,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 125.65965 ± 104.608
2025-08-07 07:40:59,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [41.679512, 21.118696, 78.672905, 258.2705, 41.48382, 308.8176, 272.15088, 43.21665, 117.461395, 73.72435]
2025-08-07 07:40:59,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [49.0, 32.0, 140.0, 151.0, 48.0, 221.0, 144.0, 47.0, 111.0, 105.0]
2025-08-07 07:40:59,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 27 minutes, 19 seconds)
2025-08-07 07:42:41,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:42,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 150.59352 ± 85.887
2025-08-07 07:42:42,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [113.77279, 218.82591, 223.37857, 279.31122, 246.56866, 171.97057, 46.143, 122.33984, 62.364357, 21.260311]
2025-08-07 07:42:42,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [86.0, 124.0, 109.0, 148.0, 121.0, 109.0, 55.0, 102.0, 88.0, 31.0]
2025-08-07 07:42:42,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 25 minutes, 42 seconds)
2025-08-07 07:44:22,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:24,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 165.06316 ± 111.491
2025-08-07 07:44:24,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [229.9101, 40.71348, 384.6362, 157.35977, 294.9477, 41.60288, 220.27205, 143.914, 35.77274, 101.50279]
2025-08-07 07:44:24,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [125.0, 45.0, 176.0, 108.0, 154.0, 47.0, 121.0, 131.0, 40.0, 114.0]
2025-08-07 07:44:24,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 23 minutes, 26 seconds)
2025-08-07 07:46:05,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:07,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 175.81604 ± 122.123
2025-08-07 07:46:07,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [204.62593, 108.24073, 317.90594, 39.051273, 359.14902, 285.89758, 278.84128, 40.147156, 15.988066, 108.31349]
2025-08-07 07:46:07,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [119.0, 160.0, 156.0, 48.0, 173.0, 197.0, 131.0, 46.0, 27.0, 120.0]
2025-08-07 07:46:07,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 22 minutes, 2 seconds)
2025-08-07 07:47:48,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:50,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 159.28282 ± 150.102
2025-08-07 07:47:50,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [415.36444, 35.180023, 71.80675, 29.58509, 43.803055, 334.0972, 367.26324, 45.59772, 218.43391, 31.696732]
2025-08-07 07:47:50,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [194.0, 40.0, 73.0, 37.0, 49.0, 211.0, 193.0, 46.0, 160.0, 39.0]
2025-08-07 07:47:50,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 20 minutes, 29 seconds)
2025-08-07 07:49:32,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:32,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 75.61903 ± 65.981
2025-08-07 07:49:32,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [237.34174, 25.712643, 97.83487, 49.07208, 38.753193, 40.93351, 28.71818, 29.773712, 52.96246, 155.08789]
2025-08-07 07:49:32,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 37.0, 102.0, 110.0, 45.0, 45.0, 35.0, 36.0, 55.0, 115.0]
2025-08-07 07:49:32,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 18 minutes, 42 seconds)
2025-08-07 07:51:14,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 129.68675 ± 170.218
2025-08-07 07:51:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [25.91073, 46.595325, 281.92535, 23.259317, 44.55879, 27.093319, 568.587, 217.08131, 28.219591, 33.636795]
2025-08-07 07:51:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [38.0, 49.0, 273.0, 35.0, 47.0, 34.0, 266.0, 147.0, 35.0, 43.0]
2025-08-07 07:51:15,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 16 minutes, 46 seconds)
2025-08-07 07:52:57,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:59,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 233.58310 ± 188.991
2025-08-07 07:52:59,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [45.76416, 24.80147, 406.99805, 273.08783, 38.25195, 381.1507, 263.202, 286.38324, 601.33514, 14.856427]
2025-08-07 07:52:59,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 34.0, 182.0, 194.0, 43.0, 211.0, 135.0, 137.0, 259.0, 26.0]
2025-08-07 07:52:59,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (233.58) for latency ExtremeClogL1U23
2025-08-07 07:52:59,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 46 seconds)
2025-08-07 07:54:41,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:42,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 188.63896 ± 186.580
2025-08-07 07:54:42,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [501.417, 449.69104, 395.6238, 12.288568, 28.625895, 100.55263, 45.997448, 46.754784, 26.66786, 278.77063]
2025-08-07 07:54:42,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [265.0, 210.0, 170.0, 22.0, 36.0, 157.0, 52.0, 51.0, 35.0, 142.0]
2025-08-07 07:54:42,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 14 minutes, 3 seconds)
2025-08-07 07:56:25,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:26,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 236.47702 ± 165.524
2025-08-07 07:56:26,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [442.32333, 46.316326, 436.31256, 55.27179, 284.36346, 18.56302, 45.27365, 348.3088, 325.27072, 362.76654]
2025-08-07 07:56:26,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 49.0, 197.0, 56.0, 147.0, 29.0, 51.0, 154.0, 162.0, 161.0]
2025-08-07 07:56:26,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (236.48) for latency ExtremeClogL1U23
2025-08-07 07:56:26,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 12 minutes, 36 seconds)
2025-08-07 07:58:07,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:58:08,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 123.44099 ± 104.603
2025-08-07 07:58:08,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [41.178173, 53.4737, 69.67214, 93.09099, 165.15628, 199.29262, 85.73329, 76.342125, 49.04527, 401.42535]
2025-08-07 07:58:08,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [91.0, 53.0, 88.0, 71.0, 136.0, 140.0, 75.0, 70.0, 52.0, 190.0]
2025-08-07 07:58:08,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 10 minutes, 33 seconds)
2025-08-07 07:59:49,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:51,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 191.60704 ± 126.881
2025-08-07 07:59:51,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [327.81277, 251.16528, 363.5488, 41.08361, 257.87463, 186.77853, 15.335812, 331.75256, 103.047874, 37.670544]
2025-08-07 07:59:51,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [216.0, 239.0, 151.0, 47.0, 188.0, 123.0, 26.0, 168.0, 157.0, 46.0]
2025-08-07 07:59:51,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 9 minutes, 2 seconds)
2025-08-07 08:01:33,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:34,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 92.72734 ± 74.642
2025-08-07 08:01:34,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [59.209755, 259.92596, 48.153446, 37.850067, 59.749107, 34.554146, 114.673744, 49.270668, 208.49402, 55.392525]
2025-08-07 08:01:34,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [54.0, 134.0, 53.0, 43.0, 55.0, 41.0, 66.0, 51.0, 126.0, 55.0]
2025-08-07 08:01:34,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2025-08-07 08:03:16,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:17,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 116.43203 ± 139.312
2025-08-07 08:03:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [45.145355, 30.679726, 79.96351, 266.69257, 151.07948, 27.086649, 471.8044, 20.631075, 32.266357, 38.97121]
2025-08-07 08:03:17,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [49.0, 36.0, 68.0, 148.0, 160.0, 34.0, 210.0, 32.0, 39.0, 40.0]
2025-08-07 08:03:17,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 5 minutes, 16 seconds)
2025-08-07 08:04:58,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:59,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 123.84419 ± 77.007
2025-08-07 08:04:59,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [221.87022, 76.65384, 38.962975, 176.10548, 41.96834, 49.130028, 85.99283, 128.47131, 276.9714, 142.31548]
2025-08-07 08:04:59,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 97.0, 44.0, 171.0, 44.0, 49.0, 89.0, 115.0, 182.0, 116.0]
2025-08-07 08:04:59,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 3 minutes, 7 seconds)
2025-08-07 08:06:41,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:43,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 256.06113 ± 124.356
2025-08-07 08:06:43,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [191.83266, 331.88824, 16.889673, 143.94148, 227.961, 155.37393, 453.33725, 363.4598, 347.9269, 328.0003]
2025-08-07 08:06:43,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 239.0, 28.0, 115.0, 127.0, 129.0, 247.0, 143.0, 165.0, 214.0]
2025-08-07 08:06:43,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (256.06) for latency ExtremeClogL1U23
2025-08-07 08:06:43,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 2 minutes, 2 seconds)
2025-08-07 08:08:27,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:28,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 226.37405 ± 90.588
2025-08-07 08:08:28,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [204.38062, 282.10333, 223.02065, 314.4671, 278.57483, 20.016212, 223.02597, 341.6602, 256.0033, 120.48818]
2025-08-07 08:08:28,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [130.0, 129.0, 169.0, 156.0, 139.0, 32.0, 171.0, 166.0, 159.0, 108.0]
2025-08-07 08:08:28,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 43 seconds)
2025-08-07 08:10:08,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:10,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 167.77368 ± 107.078
2025-08-07 08:10:10,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [257.9117, 27.055525, 300.00085, 18.990664, 319.11313, 172.2289, 107.24972, 256.8562, 55.698265, 162.6319]
2025-08-07 08:10:10,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 43.0, 178.0, 32.0, 154.0, 135.0, 111.0, 150.0, 62.0, 89.0]
2025-08-07 08:10:10,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 40 seconds)
2025-08-07 08:11:52,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:54,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 265.33893 ± 158.630
2025-08-07 08:11:54,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [79.130714, 314.4306, 257.7861, 443.43494, 440.1154, 45.93164, 253.14162, 50.98123, 262.30472, 506.1324]
2025-08-07 08:11:54,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [70.0, 152.0, 180.0, 189.0, 214.0, 57.0, 134.0, 54.0, 214.0, 254.0]
2025-08-07 08:11:54,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (265.34) for latency ExtremeClogL1U23
2025-08-07 08:11:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 57 minutes, 8 seconds)
2025-08-07 08:13:37,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:38,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 258.56256 ± 140.919
2025-08-07 08:13:38,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [17.0462, 336.03827, 341.10608, 354.92566, 424.60178, 135.91272, 348.40106, 350.07196, 15.496805, 262.02493]
2025-08-07 08:13:38,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 154.0, 161.0, 148.0, 242.0, 124.0, 138.0, 157.0, 27.0, 174.0]
2025-08-07 08:13:38,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 58 seconds)
2025-08-07 08:15:20,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:21,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 242.17624 ± 165.508
2025-08-07 08:15:21,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [271.2351, 271.7132, 360.46664, 17.3125, 341.01855, 563.29285, 19.923021, 49.71799, 202.62993, 324.45264]
2025-08-07 08:15:21,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [124.0, 129.0, 155.0, 28.0, 162.0, 349.0, 31.0, 56.0, 108.0, 168.0]
2025-08-07 08:15:21,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2025-08-07 08:17:03,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 275.21231 ± 89.872
2025-08-07 08:17:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [234.59898, 337.2488, 355.3561, 87.50634, 384.70453, 165.85258, 366.1519, 245.98763, 276.2312, 298.4852]
2025-08-07 08:17:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [148.0, 201.0, 152.0, 74.0, 191.0, 136.0, 153.0, 148.0, 123.0, 131.0]
2025-08-07 08:17:05,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (275.21) for latency ExtremeClogL1U23
2025-08-07 08:17:05,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 51 minutes, 57 seconds)
2025-08-07 08:18:47,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:49,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 277.63632 ± 105.927
2025-08-07 08:18:49,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [332.81082, 402.83978, 408.29462, 271.32162, 304.7781, 186.93085, 42.189198, 190.90755, 293.96564, 342.3251]
2025-08-07 08:18:49,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 205.0, 232.0, 134.0, 142.0, 163.0, 52.0, 134.0, 138.0, 182.0]
2025-08-07 08:18:49,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (277.64) for latency ExtremeClogL1U23
2025-08-07 08:18:49,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 50 minutes, 46 seconds)
2025-08-07 08:20:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:32,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 208.05266 ± 121.650
2025-08-07 08:20:32,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [117.97065, 283.34726, 424.52472, 54.114254, 172.44383, 280.75122, 119.78663, 323.13675, 274.33957, 30.111824]
2025-08-07 08:20:32,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 143.0, 253.0, 53.0, 143.0, 146.0, 102.0, 147.0, 166.0, 42.0]
2025-08-07 08:20:32,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 48 minutes, 46 seconds)
2025-08-07 08:22:14,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:17,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 289.96124 ± 123.641
2025-08-07 08:22:17,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [337.56036, 276.29678, 15.1651325, 300.7931, 530.2267, 226.83353, 226.19527, 345.93127, 275.38116, 365.22888]
2025-08-07 08:22:17,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 152.0, 26.0, 168.0, 243.0, 145.0, 133.0, 238.0, 153.0, 191.0]
2025-08-07 08:22:17,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (289.96) for latency ExtremeClogL1U23
2025-08-07 08:22:17,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 47 minutes, 5 seconds)
2025-08-07 08:23:57,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:59,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 185.81348 ± 103.204
2025-08-07 08:23:59,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [289.0219, 51.42462, 176.72147, 229.97505, 113.19115, 192.02902, 157.5169, 279.6543, 9.884597, 358.71573]
2025-08-07 08:23:59,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 53.0, 136.0, 187.0, 100.0, 92.0, 134.0, 155.0, 21.0, 184.0]
2025-08-07 08:23:59,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 45 minutes, 13 seconds)
2025-08-07 08:25:42,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:44,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 318.67206 ± 147.495
2025-08-07 08:25:44,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [376.8875, 63.554237, 364.2099, 556.7497, 300.52036, 360.57574, 432.65552, 47.484512, 359.43604, 324.6472]
2025-08-07 08:25:44,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 62.0, 178.0, 333.0, 131.0, 270.0, 199.0, 51.0, 203.0, 174.0]
2025-08-07 08:25:44,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (318.67) for latency ExtremeClogL1U23
2025-08-07 08:25:44,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 43 minutes, 51 seconds)
2025-08-07 08:27:26,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:27,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 157.63330 ± 112.392
2025-08-07 08:27:27,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [47.20559, 46.229607, 51.210125, 216.4805, 268.9545, 34.79807, 111.22141, 146.81717, 326.49426, 326.9219]
2025-08-07 08:27:27,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [49.0, 50.0, 56.0, 133.0, 146.0, 41.0, 97.0, 134.0, 162.0, 162.0]
2025-08-07 08:27:27,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 41 minutes, 55 seconds)
2025-08-07 08:29:09,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:12,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 315.54883 ± 109.426
2025-08-07 08:29:12,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [111.07602, 395.4921, 338.95947, 428.81857, 354.5475, 212.56537, 505.2137, 223.91995, 275.12582, 309.7699]
2025-08-07 08:29:12,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [94.0, 206.0, 164.0, 259.0, 251.0, 156.0, 252.0, 139.0, 152.0, 171.0]
2025-08-07 08:29:12,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 26 seconds)
2025-08-07 08:30:53,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:30:55,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 275.64471 ± 180.110
2025-08-07 08:30:55,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [184.83067, 483.69696, 411.76282, 364.77676, 30.619152, 229.89413, 430.4765, 56.76413, 518.7762, 44.84978]
2025-08-07 08:30:55,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 244.0, 215.0, 199.0, 39.0, 134.0, 221.0, 57.0, 270.0, 49.0]
2025-08-07 08:30:55,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 38 minutes, 31 seconds)
2025-08-07 08:32:40,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:32:42,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 286.04623 ± 201.381
2025-08-07 08:32:42,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [72.06428, 456.89044, 48.0912, 46.962086, 465.319, 457.2759, 49.17435, 339.04843, 332.6034, 593.0333]
2025-08-07 08:32:42,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [63.0, 268.0, 51.0, 49.0, 245.0, 313.0, 54.0, 151.0, 184.0, 295.0]
2025-08-07 08:32:42,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 37 minutes, 40 seconds)
2025-08-07 08:34:22,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:23,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 213.16301 ± 178.332
2025-08-07 08:34:23,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [498.5413, 282.10632, 30.415009, 314.71124, 33.438774, 49.573463, 444.05984, 39.369972, 370.46378, 68.95036]
2025-08-07 08:34:23,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [228.0, 147.0, 38.0, 188.0, 41.0, 49.0, 178.0, 42.0, 200.0, 64.0]
2025-08-07 08:34:23,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 35 minutes, 5 seconds)
2025-08-07 08:36:07,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:09,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 188.93077 ± 154.608
2025-08-07 08:36:09,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [368.54544, 46.120922, 52.442467, 40.434418, 248.20293, 386.06805, 15.625655, 334.1972, 360.06512, 37.6056]
2025-08-07 08:36:09,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [200.0, 51.0, 56.0, 48.0, 307.0, 208.0, 29.0, 163.0, 200.0, 43.0]
2025-08-07 08:36:09,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 33 minutes, 51 seconds)
2025-08-07 08:37:49,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:53,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 329.40240 ± 289.654
2025-08-07 08:37:53,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [341.64157, 298.91437, 1045.9359, 28.328823, 498.5286, 14.8772135, 35.97774, 300.35266, 272.86615, 456.60083]
2025-08-07 08:37:53,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 166.0, 1000.0, 36.0, 309.0, 29.0, 44.0, 215.0, 221.0, 444.0]
2025-08-07 08:37:53,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (329.40) for latency ExtremeClogL1U23
2025-08-07 08:37:53,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 32 minutes, 4 seconds)
2025-08-07 08:39:35,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:37,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 240.66313 ± 141.841
2025-08-07 08:39:37,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [39.045746, 299.6317, 43.892326, 265.78574, 35.566868, 374.8616, 409.38693, 224.39616, 387.20978, 326.85446]
2025-08-07 08:39:37,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [45.0, 279.0, 54.0, 152.0, 48.0, 148.0, 224.0, 125.0, 233.0, 162.0]
2025-08-07 08:39:37,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 30 minutes, 30 seconds)
2025-08-07 08:41:18,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 315.23309 ± 173.099
2025-08-07 08:41:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [405.1139, 315.74164, 516.6233, 340.87384, 519.7717, 31.782463, 150.4235, 385.8874, 449.92047, 36.192924]
2025-08-07 08:41:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 205.0, 270.0, 180.0, 238.0, 38.0, 132.0, 189.0, 163.0, 45.0]
2025-08-07 08:41:21,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 28 minutes, 7 seconds)
2025-08-07 08:43:03,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:05,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 283.62372 ± 175.696
2025-08-07 08:43:05,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [571.77606, 404.13086, 49.766106, 402.53592, 57.04098, 60.63137, 427.00833, 239.40475, 404.4943, 219.44849]
2025-08-07 08:43:05,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [305.0, 206.0, 52.0, 300.0, 58.0, 58.0, 185.0, 163.0, 242.0, 187.0]
2025-08-07 08:43:05,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 27 minutes, 4 seconds)
2025-08-07 08:44:47,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:49,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 222.65421 ± 123.613
2025-08-07 08:44:49,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [237.42674, 67.79099, 212.13145, 414.88083, 34.646507, 226.72859, 340.43622, 70.408775, 352.86804, 269.224]
2025-08-07 08:44:49,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 63.0, 175.0, 248.0, 43.0, 165.0, 179.0, 62.0, 188.0, 159.0]
2025-08-07 08:44:49,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 24 minutes, 55 seconds)
2025-08-07 08:46:30,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:34,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 420.44980 ± 187.206
2025-08-07 08:46:34,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [246.90123, 336.42502, 500.03253, 485.97733, 547.4236, 267.47287, 885.80646, 396.72028, 263.54382, 274.19495]
2025-08-07 08:46:34,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 258.0, 209.0, 226.0, 357.0, 142.0, 510.0, 201.0, 202.0, 197.0]
2025-08-07 08:46:34,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (420.45) for latency ExtremeClogL1U23
2025-08-07 08:46:34,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 21 seconds)
2025-08-07 08:48:16,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:18,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 326.78336 ± 154.714
2025-08-07 08:48:18,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [504.68427, 452.4316, 278.49203, 365.89734, 276.84158, 419.42584, 474.61877, 35.123978, 390.4656, 69.85218]
2025-08-07 08:48:18,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 188.0, 148.0, 171.0, 159.0, 178.0, 257.0, 47.0, 209.0, 60.0]
2025-08-07 08:48:18,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 32 seconds)
2025-08-07 08:50:00,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:02,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 308.28046 ± 116.980
2025-08-07 08:50:02,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [330.4554, 336.64673, 341.34744, 456.97314, 363.8557, 312.31635, 38.776787, 344.15073, 408.79672, 149.4856]
2025-08-07 08:50:02,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 181.0, 197.0, 241.0, 186.0, 170.0, 50.0, 190.0, 188.0, 146.0]
2025-08-07 08:50:02,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 59 seconds)
2025-08-07 08:51:45,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:47,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 338.67838 ± 201.647
2025-08-07 08:51:47,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [72.32398, 10.241987, 78.752335, 568.03406, 329.22067, 472.7317, 481.41315, 370.75528, 588.40485, 414.90582]
2025-08-07 08:51:47,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [71.0, 22.0, 62.0, 259.0, 164.0, 181.0, 191.0, 184.0, 244.0, 234.0]
2025-08-07 08:51:47,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 18 minutes, 12 seconds)
2025-08-07 08:53:30,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:32,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 463.83685 ± 141.256
2025-08-07 08:53:32,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [446.72354, 525.8873, 428.74887, 326.14337, 438.45755, 368.18124, 333.43893, 375.70712, 822.84186, 572.23883]
2025-08-07 08:53:32,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 312.0, 176.0, 164.0, 212.0, 181.0, 146.0, 147.0, 447.0, 268.0]
2025-08-07 08:53:32,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (463.84) for latency ExtremeClogL1U23
2025-08-07 08:53:33,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 49 seconds)
2025-08-07 08:55:13,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:15,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 282.36142 ± 219.889
2025-08-07 08:55:15,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [330.22678, 492.7183, 15.709131, 372.26086, 549.962, 24.835575, 600.20013, 39.240963, 350.73126, 47.729134]
2025-08-07 08:55:15,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 229.0, 26.0, 168.0, 262.0, 36.0, 287.0, 45.0, 186.0, 49.0]
2025-08-07 08:55:15,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 14 minutes, 39 seconds)
2025-08-07 08:56:57,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:00,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 351.97418 ± 184.729
2025-08-07 08:57:00,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [403.20837, 651.0312, 206.78491, 363.82437, 362.62537, 26.52149, 124.03692, 597.0583, 334.63776, 450.01318]
2025-08-07 08:57:00,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 288.0, 131.0, 285.0, 228.0, 35.0, 109.0, 235.0, 176.0, 240.0]
2025-08-07 08:57:00,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 5 seconds)
2025-08-07 08:58:41,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:43,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 318.20724 ± 143.846
2025-08-07 08:58:43,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [418.53116, 329.35013, 28.849636, 56.13926, 370.7539, 367.82864, 341.73434, 390.75113, 393.9797, 484.1545]
2025-08-07 08:58:43,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [182.0, 167.0, 37.0, 55.0, 215.0, 228.0, 188.0, 173.0, 205.0, 236.0]
2025-08-07 08:58:43,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 11 minutes, 13 seconds)
2025-08-07 09:00:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:28,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 342.16525 ± 181.627
2025-08-07 09:00:28,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [46.848236, 418.44745, 407.45023, 377.13596, 485.61, 628.3731, 225.98993, 481.0774, 317.55066, 33.169575]
2025-08-07 09:00:28,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [52.0, 211.0, 193.0, 165.0, 232.0, 303.0, 176.0, 220.0, 160.0, 43.0]
2025-08-07 09:00:28,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 25 seconds)
2025-08-07 09:02:09,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:11,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 297.59808 ± 195.340
2025-08-07 09:02:11,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [364.66876, 391.53793, 56.12724, 684.4757, 181.57481, 30.052633, 546.184, 206.8366, 288.41968, 226.10387]
2025-08-07 09:02:11,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 215.0, 59.0, 374.0, 158.0, 36.0, 275.0, 132.0, 196.0, 152.0]
2025-08-07 09:02:11,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 25 seconds)
2025-08-07 09:03:54,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:56,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 349.54785 ± 109.292
2025-08-07 09:03:56,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [286.08066, 274.75262, 554.6075, 367.66904, 343.50262, 176.06505, 372.87524, 227.79305, 405.36215, 486.7707]
2025-08-07 09:03:56,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 131.0, 256.0, 161.0, 184.0, 140.0, 153.0, 163.0, 221.0, 242.0]
2025-08-07 09:03:56,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 6 minutes, 4 seconds)
2025-08-07 09:05:39,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:40,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 228.03178 ± 187.654
2025-08-07 09:05:40,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [259.6304, 35.4475, 35.36446, 464.9486, 559.39386, 356.85556, 18.79961, 32.892506, 185.36603, 331.61948]
2025-08-07 09:05:40,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 44.0, 41.0, 185.0, 244.0, 192.0, 34.0, 43.0, 143.0, 230.0]
2025-08-07 09:05:40,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 4 minutes, 12 seconds)
2025-08-07 09:07:22,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 199.90903 ± 194.340
2025-08-07 09:07:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [410.8766, 23.276545, 59.976444, 27.889385, 23.530989, 221.97879, 411.1249, 38.318623, 587.78973, 194.32825]
2025-08-07 09:07:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 31.0, 59.0, 35.0, 32.0, 135.0, 166.0, 44.0, 283.0, 135.0]
2025-08-07 09:07:24,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 2 minutes, 27 seconds)
2025-08-07 09:09:05,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:09:08,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 413.60663 ± 207.221
2025-08-07 09:09:08,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [682.68054, 454.7391, 471.95074, 433.58963, 488.84442, 29.69813, 359.1468, 538.75977, 48.069637, 628.5876]
2025-08-07 09:09:08,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [318.0, 189.0, 190.0, 177.0, 223.0, 35.0, 231.0, 221.0, 50.0, 275.0]
2025-08-07 09:09:08,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 42 seconds)
2025-08-07 09:10:50,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:52,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 397.58966 ± 179.534
2025-08-07 09:10:52,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [415.58994, 710.0692, 74.12589, 616.9064, 180.4134, 485.81223, 331.03497, 312.74695, 382.21033, 466.98752]
2025-08-07 09:10:52,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [251.0, 297.0, 65.0, 298.0, 131.0, 224.0, 156.0, 158.0, 175.0, 242.0]
2025-08-07 09:10:52,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 59 minutes, 4 seconds)
2025-08-07 09:12:37,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:39,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 347.48892 ± 157.478
2025-08-07 09:12:39,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [181.57516, 218.95755, 512.9027, 319.93182, 479.67844, 543.1605, 37.09577, 322.24615, 360.47644, 498.8647]
2025-08-07 09:12:39,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 182.0, 234.0, 174.0, 248.0, 227.0, 43.0, 180.0, 152.0, 215.0]
2025-08-07 09:12:39,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 30 seconds)
2025-08-07 09:14:20,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 426.14233 ± 182.489
2025-08-07 09:14:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [33.63522, 435.20602, 657.2715, 479.86703, 662.312, 462.36545, 450.32114, 181.07072, 492.86203, 406.51242]
2025-08-07 09:14:23,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [44.0, 323.0, 299.0, 244.0, 282.0, 210.0, 256.0, 135.0, 251.0, 248.0]
2025-08-07 09:14:23,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 41 seconds)
2025-08-07 09:16:05,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:07,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 334.07214 ± 188.328
2025-08-07 09:16:07,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [314.6589, 157.05455, 277.49695, 449.53027, 547.1058, 575.74646, 445.43164, 484.3936, 52.256874, 37.046295]
2025-08-07 09:16:07,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 131.0, 202.0, 229.0, 227.0, 272.0, 219.0, 208.0, 56.0, 45.0]
2025-08-07 09:16:07,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 3 seconds)
2025-08-07 09:17:49,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:52,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 355.16064 ± 280.320
2025-08-07 09:17:52,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [702.35065, 360.06244, 39.38184, 17.454235, 445.45963, 652.5268, 598.4086, 660.51984, 59.332024, 16.110403]
2025-08-07 09:17:52,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [284.0, 182.0, 45.0, 29.0, 203.0, 274.0, 236.0, 291.0, 57.0, 26.0]
2025-08-07 09:17:52,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 22 seconds)
2025-08-07 09:19:34,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:37,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 441.44824 ± 215.797
2025-08-07 09:19:37,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [808.63513, 671.3536, 447.66922, 286.32877, 312.79553, 233.6402, 548.02277, 553.4448, 516.82855, 35.76374]
2025-08-07 09:19:37,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [382.0, 256.0, 231.0, 143.0, 140.0, 144.0, 284.0, 255.0, 280.0, 44.0]
2025-08-07 09:19:37,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 43 seconds)
2025-08-07 09:21:20,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 429.29913 ± 213.138
2025-08-07 09:21:23,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [459.51505, 457.05563, 568.76636, 611.8128, 429.8423, 601.64264, 47.326214, 417.23456, 675.06805, 24.727512]
2025-08-07 09:21:23,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 231.0, 260.0, 315.0, 180.0, 369.0, 51.0, 181.0, 305.0, 33.0]
2025-08-07 09:21:23,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 51 seconds)
2025-08-07 09:23:03,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:05,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 309.52194 ± 216.803
2025-08-07 09:23:05,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [621.8484, 199.94864, 622.5862, 295.10995, 78.810326, 443.0792, 348.73456, 449.32806, 15.151258, 20.62277]
2025-08-07 09:23:05,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 136.0, 279.0, 154.0, 70.0, 232.0, 235.0, 252.0, 26.0, 31.0]
2025-08-07 09:23:05,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 2 seconds)
2025-08-07 09:24:50,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:24:53,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 521.09686 ± 274.231
2025-08-07 09:24:53,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [650.1412, 99.33194, 689.7203, 603.2752, 29.480284, 559.1864, 649.8042, 271.08307, 759.1181, 899.8281]
2025-08-07 09:24:53,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [292.0, 81.0, 309.0, 258.0, 38.0, 234.0, 278.0, 126.0, 380.0, 358.0]
2025-08-07 09:24:53,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (521.10) for latency ExtremeClogL1U23
2025-08-07 09:24:53,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 34 seconds)
2025-08-07 09:26:33,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:26:36,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 484.61444 ± 113.904
2025-08-07 09:26:36,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [485.7216, 406.6016, 379.736, 481.70233, 437.58185, 330.8337, 498.77005, 677.8483, 444.48074, 702.8681]
2025-08-07 09:26:36,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [309.0, 193.0, 192.0, 185.0, 193.0, 206.0, 188.0, 304.0, 188.0, 267.0]
2025-08-07 09:26:36,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 43 seconds)
2025-08-07 09:28:20,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:28:23,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 479.84344 ± 278.512
2025-08-07 09:28:23,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [634.4462, 42.16027, 501.45975, 387.7953, 623.58734, 255.89525, 310.71466, 332.01022, 1127.4371, 582.928]
2025-08-07 09:28:23,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [268.0, 45.0, 223.0, 174.0, 263.0, 113.0, 146.0, 144.0, 553.0, 279.0]
2025-08-07 09:28:23,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 3 seconds)
2025-08-07 09:30:02,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:30:05,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 466.09229 ± 212.429
2025-08-07 09:30:05,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [17.396591, 301.37704, 602.8916, 775.01013, 403.03433, 722.40045, 488.86206, 604.2152, 404.57895, 341.1561]
2025-08-07 09:30:05,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 141.0, 225.0, 307.0, 218.0, 382.0, 232.0, 230.0, 168.0, 167.0]
2025-08-07 09:30:05,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 1 second)
2025-08-07 09:31:48,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:31:51,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 415.60263 ± 323.496
2025-08-07 09:31:51,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [502.77762, 198.3038, 34.528812, 758.035, 59.92004, 1063.2877, 37.80487, 525.3404, 582.7162, 393.312]
2025-08-07 09:31:51,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [272.0, 142.0, 41.0, 435.0, 60.0, 538.0, 42.0, 231.0, 246.0, 168.0]
2025-08-07 09:31:51,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 32 seconds)
2025-08-07 09:33:31,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:33:33,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 343.81573 ± 277.780
2025-08-07 09:33:33,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [408.27045, 204.33725, 35.0017, 555.9774, 702.92737, 698.8897, 112.435326, 35.31238, 661.78064, 23.22543]
2025-08-07 09:33:33,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 133.0, 42.0, 260.0, 325.0, 402.0, 67.0, 41.0, 291.0, 36.0]
2025-08-07 09:33:34,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 26 seconds)
2025-08-07 09:35:17,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:35:19,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 378.21231 ± 181.293
2025-08-07 09:35:19,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [15.9893265, 440.65894, 240.42479, 522.8385, 396.73398, 541.3395, 173.69112, 337.17773, 457.8163, 655.4526]
2025-08-07 09:35:19,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 236.0, 180.0, 229.0, 252.0, 252.0, 130.0, 174.0, 230.0, 321.0]
2025-08-07 09:35:19,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 51 seconds)
2025-08-07 09:37:01,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:04,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 504.97568 ± 285.364
2025-08-07 09:37:04,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [200.85454, 738.005, 677.6668, 704.8993, 589.9363, 762.4588, 47.917, 23.106329, 511.70264, 793.20984]
2025-08-07 09:37:04,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 265.0, 272.0, 273.0, 219.0, 279.0, 52.0, 33.0, 218.0, 335.0]
2025-08-07 09:37:04,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes)
2025-08-07 09:38:46,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:38:48,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 310.97769 ± 160.963
2025-08-07 09:38:48,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [64.03577, 318.85092, 436.92554, 193.1279, 367.31653, 24.631584, 511.79962, 449.5865, 461.7485, 281.75403]
2025-08-07 09:38:48,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [61.0, 140.0, 192.0, 133.0, 159.0, 34.0, 185.0, 192.0, 251.0, 143.0]
2025-08-07 09:38:48,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 24 seconds)
2025-08-07 09:40:30,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:40:34,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 505.64447 ± 133.966
2025-08-07 09:40:34,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [636.2175, 491.94376, 405.46207, 399.07333, 719.29944, 577.93097, 328.98224, 356.97546, 689.2277, 451.3324]
2025-08-07 09:40:34,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [360.0, 223.0, 182.0, 197.0, 410.0, 272.0, 165.0, 185.0, 292.0, 240.0]
2025-08-07 09:40:34,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 38 seconds)
2025-08-07 09:42:15,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:42:17,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 396.13004 ± 224.232
2025-08-07 09:42:17,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [170.1134, 596.1846, 620.51996, 37.247417, 574.627, 39.987133, 392.88803, 655.3186, 477.73322, 396.68082]
2025-08-07 09:42:17,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 249.0, 236.0, 43.0, 264.0, 44.0, 155.0, 243.0, 179.0, 224.0]
2025-08-07 09:42:17,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 55 seconds)
2025-08-07 09:43:59,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:44:02,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 413.48322 ± 207.747
2025-08-07 09:44:02,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [155.22246, 419.68854, 14.88757, 739.9057, 432.33994, 440.02957, 724.94385, 401.76666, 425.3342, 380.7136]
2025-08-07 09:44:02,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 157.0, 26.0, 353.0, 171.0, 166.0, 256.0, 203.0, 167.0, 176.0]
2025-08-07 09:44:02,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 6 seconds)
2025-08-07 09:45:45,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:48,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 513.88196 ± 216.679
2025-08-07 09:45:48,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [309.16602, 761.9954, 512.0166, 35.492, 440.1123, 642.5398, 550.359, 846.4723, 480.69174, 559.9746]
2025-08-07 09:45:48,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 343.0, 194.0, 40.0, 195.0, 270.0, 218.0, 412.0, 195.0, 243.0]
2025-08-07 09:45:48,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 26 seconds)
2025-08-07 09:47:29,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:32,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 530.38806 ± 242.693
2025-08-07 09:47:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [343.30447, 592.04974, 742.8417, 45.40858, 546.785, 558.56995, 633.29034, 731.3457, 884.22626, 226.05865]
2025-08-07 09:47:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 260.0, 272.0, 48.0, 266.0, 220.0, 256.0, 351.0, 362.0, 146.0]
2025-08-07 09:47:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (530.39) for latency ExtremeClogL1U23
2025-08-07 09:47:32,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 42 seconds)
2025-08-07 09:49:16,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:19,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 429.39532 ± 292.147
2025-08-07 09:49:19,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [424.62247, 768.25653, 212.28908, 908.5446, 346.72568, 42.776142, 804.59906, 299.76953, 44.830257, 441.5394]
2025-08-07 09:49:19,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 418.0, 145.0, 478.0, 179.0, 46.0, 405.0, 202.0, 53.0, 185.0]
2025-08-07 09:49:19,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 59 seconds)
2025-08-07 09:51:01,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:51:04,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 455.97421 ± 209.435
2025-08-07 09:51:04,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [431.1879, 35.774277, 175.37537, 733.72565, 512.2847, 429.414, 578.21045, 750.48224, 459.12512, 454.16284]
2025-08-07 09:51:04,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [193.0, 43.0, 136.0, 320.0, 264.0, 191.0, 229.0, 291.0, 237.0, 219.0]
2025-08-07 09:51:04,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 18 seconds)
2025-08-07 09:52:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:52:47,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 414.98004 ± 347.552
2025-08-07 09:52:47,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [357.5478, 44.061825, 39.8366, 449.40134, 1088.905, 34.534378, 234.33588, 768.8774, 318.52606, 813.7736]
2025-08-07 09:52:47,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [214.0, 48.0, 45.0, 165.0, 430.0, 41.0, 150.0, 340.0, 148.0, 433.0]
2025-08-07 09:52:47,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 30 seconds)
2025-08-07 09:54:31,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:54:33,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 381.11298 ± 235.275
2025-08-07 09:54:33,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [441.4153, 250.35814, 384.6504, 371.2827, 40.075916, 31.540815, 431.19388, 438.47403, 523.11566, 899.02277]
2025-08-07 09:54:33,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 116.0, 156.0, 166.0, 45.0, 41.0, 177.0, 179.0, 194.0, 385.0]
2025-08-07 09:54:33,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 46 seconds)
2025-08-07 09:56:16,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:56:20,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 624.57092 ± 179.701
2025-08-07 09:56:20,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [1080.2937, 364.4329, 631.85657, 627.8349, 543.15656, 676.03424, 663.20544, 612.2217, 613.0257, 433.64676]
2025-08-07 09:56:20,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [493.0, 167.0, 299.0, 246.0, 223.0, 243.0, 271.0, 236.0, 243.0, 171.0]
2025-08-07 09:56:20,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (624.57) for latency ExtremeClogL1U23
2025-08-07 09:56:20,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 3 seconds)
2025-08-07 09:58:00,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:58:03,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 478.70499 ± 309.999
2025-08-07 09:58:03,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [714.04926, 498.49832, 1079.373, 396.71893, 59.275635, 641.7509, 661.8073, 194.19795, 11.675879, 529.70264]
2025-08-07 09:58:03,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [351.0, 189.0, 497.0, 164.0, 59.0, 286.0, 268.0, 101.0, 23.0, 267.0]
2025-08-07 09:58:03,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 14 seconds)
2025-08-07 09:59:46,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:49,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 539.96936 ± 262.632
2025-08-07 09:59:49,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [652.23505, 566.9112, 39.19218, 1015.9919, 712.4094, 774.04535, 290.28003, 386.86102, 376.58826, 585.17847]
2025-08-07 09:59:49,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [300.0, 242.0, 46.0, 430.0, 275.0, 343.0, 137.0, 173.0, 156.0, 226.0]
2025-08-07 09:59:49,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 30 seconds)
2025-08-07 10:01:31,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:01:34,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 524.71252 ± 252.596
2025-08-07 10:01:34,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [94.86071, 69.41656, 581.45746, 540.81726, 476.98743, 717.91797, 779.414, 566.92773, 893.8173, 525.5087]
2025-08-07 10:01:34,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [78.0, 67.0, 210.0, 192.0, 224.0, 266.0, 276.0, 196.0, 310.0, 190.0]
2025-08-07 10:01:34,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 46 seconds)
2025-08-07 10:03:16,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:03:19,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 490.80753 ± 208.100
2025-08-07 10:03:19,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [69.929985, 575.90967, 309.28415, 483.60074, 587.1552, 427.89636, 634.9764, 905.0839, 394.85208, 519.38654]
2025-08-07 10:03:19,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [66.0, 229.0, 171.0, 185.0, 234.0, 170.0, 274.0, 433.0, 173.0, 216.0]
2025-08-07 10:03:19,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes)
2025-08-07 10:05:00,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:04,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 613.70221 ± 241.985
2025-08-07 10:05:04,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [227.0294, 538.89185, 758.39514, 1168.6444, 706.0442, 425.43698, 685.21906, 620.83716, 621.64343, 384.88034]
2025-08-07 10:05:04,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 210.0, 339.0, 527.0, 302.0, 169.0, 318.0, 303.0, 256.0, 158.0]
2025-08-07 10:05:04,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 14 seconds)
2025-08-07 10:06:49,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:06:53,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 643.83185 ± 451.124
2025-08-07 10:06:53,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [577.3855, 15.863953, 502.3683, 642.6858, 1248.872, 425.35434, 21.166094, 621.691, 863.94476, 1518.9866]
2025-08-07 10:06:53,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 30.0, 188.0, 275.0, 565.0, 176.0, 32.0, 351.0, 347.0, 682.0]
2025-08-07 10:06:53,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1226 [INFO]: New best (643.83) for latency ExtremeClogL1U23
2025-08-07 10:06:53,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 31 seconds)
2025-08-07 10:08:34,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:08:37,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 439.45624 ± 167.561
2025-08-07 10:08:37,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [445.43747, 339.06668, 36.619694, 484.06036, 396.8403, 463.79126, 432.80585, 735.04724, 529.76245, 531.1311]
2025-08-07 10:08:37,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [191.0, 157.0, 42.0, 191.0, 164.0, 176.0, 188.0, 412.0, 234.0, 206.0]
2025-08-07 10:08:37,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 45 seconds)
2025-08-07 10:10:18,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:10:21,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1221 [DEBUG]: Total Reward: 542.04364 ± 223.946
2025-08-07 10:10:21,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1222 [DEBUG]: All rewards: [376.58807, 840.4798, 85.581085, 489.2188, 768.3937, 433.655, 859.4927, 597.9951, 499.06, 469.9723]
2025-08-07 10:10:21,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 360.0, 74.0, 236.0, 315.0, 166.0, 320.0, 241.0, 195.0, 206.0]
2025-08-07 10:10:22,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-walker2d):1251 [DEBUG]: Training session finished
