2025-08-07 07:16:37,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:16:37,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:16:37,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1459d82abb90>}
2025-08-07 07:16:37,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 07:16:37,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 07:16:37,258 baseline-bpql-noiseperc0-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:16:37,258 baseline-bpql-noiseperc0-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:16:39,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 07:16:39,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 07:18:12,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:13,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 45.17126 ± 2.645
2025-08-07 07:18:13,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [46.18589, 45.114388, 46.53987, 46.19472, 48.458252, 46.269314, 45.966537, 39.510624, 46.670242, 40.802734]
2025-08-07 07:18:13,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [60.0, 59.0, 59.0, 58.0, 59.0, 59.0, 59.0, 89.0, 60.0, 89.0]
2025-08-07 07:18:13,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (45.17) for latency ExtremeClogL1U23
2025-08-07 07:18:13,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 35 minutes, 29 seconds)
2025-08-07 07:19:54,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:56,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 101.12212 ± 75.747
2025-08-07 07:19:56,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [62.55402, 239.01027, 63.84913, 56.897728, 47.743336, 60.353558, 67.25573, 263.81912, 78.232216, 71.50604]
2025-08-07 07:19:56,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [97.0, 213.0, 91.0, 173.0, 103.0, 151.0, 101.0, 190.0, 102.0, 112.0]
2025-08-07 07:19:56,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (101.12) for latency ExtremeClogL1U23
2025-08-07 07:19:56,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 40 minutes, 57 seconds)
2025-08-07 07:21:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:39,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 94.55942 ± 64.722
2025-08-07 07:21:39,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [212.35799, 78.16193, 32.63628, 217.27301, 97.4001, 60.027008, 47.28643, 102.04384, 24.248281, 74.15932]
2025-08-07 07:21:39,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [199.0, 133.0, 157.0, 175.0, 176.0, 99.0, 67.0, 135.0, 151.0, 239.0]
2025-08-07 07:21:39,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 41 minutes, 45 seconds)
2025-08-07 07:23:19,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:21,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 111.57137 ± 61.291
2025-08-07 07:23:21,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [148.61104, 156.42729, 78.974815, 43.420902, 35.701233, 123.01582, 80.97725, 256.93994, 99.28673, 92.35869]
2025-08-07 07:23:21,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 132.0, 88.0, 53.0, 50.0, 116.0, 133.0, 304.0, 96.0, 97.0]
2025-08-07 07:23:21,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (111.57) for latency ExtremeClogL1U23
2025-08-07 07:23:21,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 40 minutes, 46 seconds)
2025-08-07 07:25:02,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:03,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 88.11557 ± 44.332
2025-08-07 07:25:03,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [61.40614, 29.329868, 152.86559, 125.809265, 105.46886, 57.016136, 166.79912, 53.46435, 71.384514, 57.611797]
2025-08-07 07:25:03,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [66.0, 161.0, 131.0, 107.0, 123.0, 66.0, 123.0, 65.0, 69.0, 66.0]
2025-08-07 07:25:03,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 39 minutes, 36 seconds)
2025-08-07 07:26:43,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:45,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 167.26132 ± 35.184
2025-08-07 07:26:45,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [165.00331, 158.74811, 186.83516, 185.55035, 179.8152, 231.13467, 119.67292, 124.72896, 120.66606, 200.45845]
2025-08-07 07:26:45,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 94.0, 116.0, 142.0, 105.0, 140.0, 101.0, 113.0, 102.0, 134.0]
2025-08-07 07:26:45,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (167.26) for latency ExtremeClogL1U23
2025-08-07 07:26:45,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 40 minutes, 19 seconds)
2025-08-07 07:28:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:27,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 136.02637 ± 98.221
2025-08-07 07:28:27,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [95.82047, 424.56888, 83.32465, 104.21592, 116.13084, 102.60828, 144.57037, 126.196144, 90.480255, 72.34792]
2025-08-07 07:28:27,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [86.0, 239.0, 112.0, 112.0, 106.0, 138.0, 176.0, 171.0, 94.0, 80.0]
2025-08-07 07:28:27,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 38 minutes, 20 seconds)
2025-08-07 07:30:08,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:09,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 136.98639 ± 49.778
2025-08-07 07:30:09,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [70.43676, 132.23463, 200.11288, 113.63211, 82.83344, 215.80382, 84.26304, 194.98038, 153.71552, 121.85119]
2025-08-07 07:30:09,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [84.0, 150.0, 122.0, 101.0, 123.0, 144.0, 87.0, 134.0, 130.0, 241.0]
2025-08-07 07:30:09,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 36 minutes, 26 seconds)
2025-08-07 07:31:50,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:51,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 85.22128 ± 42.250
2025-08-07 07:31:51,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [50.912674, 49.915062, 157.86896, 57.402462, 130.71075, 50.461338, 61.777233, 96.86373, 47.75767, 148.54288]
2025-08-07 07:31:51,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [58.0, 58.0, 115.0, 61.0, 133.0, 58.0, 61.0, 98.0, 59.0, 110.0]
2025-08-07 07:31:51,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 34 minutes, 42 seconds)
2025-08-07 07:33:32,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:34,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 151.21106 ± 61.949
2025-08-07 07:33:34,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [179.63173, 246.4331, 71.31896, 74.61431, 166.75171, 108.857765, 221.51973, 73.88725, 161.40314, 207.69289]
2025-08-07 07:33:34,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 181.0, 79.0, 79.0, 131.0, 101.0, 159.0, 81.0, 135.0, 136.0]
2025-08-07 07:33:34,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 33 minutes, 11 seconds)
2025-08-07 07:35:16,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:17,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 173.05164 ± 42.881
2025-08-07 07:35:17,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [119.2134, 150.88504, 214.49179, 103.25237, 198.22972, 234.58568, 222.29836, 133.89719, 175.94118, 177.72151]
2025-08-07 07:35:17,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [106.0, 110.0, 151.0, 99.0, 125.0, 145.0, 133.0, 122.0, 115.0, 134.0]
2025-08-07 07:35:17,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (173.05) for latency ExtremeClogL1U23
2025-08-07 07:35:17,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 32 minutes, 3 seconds)
2025-08-07 07:36:57,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:59,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 131.13968 ± 25.149
2025-08-07 07:36:59,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [175.68059, 146.47762, 104.27942, 171.20485, 127.83347, 127.24751, 124.80547, 117.333015, 123.68949, 92.8455]
2025-08-07 07:36:59,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 133.0, 124.0, 157.0, 137.0, 134.0, 134.0, 127.0, 210.0, 91.0]
2025-08-07 07:36:59,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 30 minutes, 7 seconds)
2025-08-07 07:38:39,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:41,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 147.09659 ± 47.971
2025-08-07 07:38:41,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [131.08633, 107.10089, 147.49113, 176.48164, 127.81881, 125.10288, 154.65143, 117.50592, 276.99908, 106.72795]
2025-08-07 07:38:41,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [137.0, 129.0, 144.0, 150.0, 140.0, 136.0, 150.0, 146.0, 194.0, 127.0]
2025-08-07 07:38:41,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 28 minutes, 29 seconds)
2025-08-07 07:40:23,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:24,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 169.11313 ± 71.449
2025-08-07 07:40:24,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [84.99062, 189.15128, 191.36263, 259.82858, 64.92112, 271.61237, 222.30412, 200.19228, 126.28723, 80.481094]
2025-08-07 07:40:24,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [94.0, 137.0, 144.0, 161.0, 78.0, 165.0, 134.0, 134.0, 114.0, 78.0]
2025-08-07 07:40:24,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 27 minutes, 12 seconds)
2025-08-07 07:42:05,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:07,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 201.91312 ± 87.195
2025-08-07 07:42:07,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [160.23668, 373.87552, 164.86993, 147.52007, 165.08963, 249.32272, 130.84235, 116.34282, 352.12894, 158.9024]
2025-08-07 07:42:07,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 233.0, 217.0, 127.0, 94.0, 147.0, 149.0, 164.0, 205.0, 160.0]
2025-08-07 07:42:07,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (201.91) for latency ExtremeClogL1U23
2025-08-07 07:42:07,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 25 minutes, 32 seconds)
2025-08-07 07:43:47,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:49,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 210.36296 ± 60.927
2025-08-07 07:43:49,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [334.8576, 173.82014, 186.3117, 104.116135, 254.27222, 273.52936, 208.18971, 216.84534, 169.8637, 181.8238]
2025-08-07 07:43:49,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [182.0, 123.0, 119.0, 97.0, 138.0, 145.0, 121.0, 140.0, 102.0, 112.0]
2025-08-07 07:43:49,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (210.36) for latency ExtremeClogL1U23
2025-08-07 07:43:49,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 23 minutes, 11 seconds)
2025-08-07 07:45:29,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:31,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 150.10036 ± 25.406
2025-08-07 07:45:31,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [124.904724, 210.75082, 138.00021, 145.04887, 136.57547, 130.48984, 174.88863, 166.72125, 127.92824, 145.69557]
2025-08-07 07:45:31,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [103.0, 120.0, 108.0, 98.0, 98.0, 102.0, 110.0, 96.0, 95.0, 92.0]
2025-08-07 07:45:31,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 21 minutes, 42 seconds)
2025-08-07 07:47:12,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:13,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 214.91785 ± 28.443
2025-08-07 07:47:13,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [195.33852, 264.80325, 217.10114, 234.33455, 176.06284, 237.6919, 233.67264, 167.76602, 221.33258, 201.0753]
2025-08-07 07:47:13,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [111.0, 142.0, 139.0, 135.0, 117.0, 131.0, 124.0, 121.0, 134.0, 124.0]
2025-08-07 07:47:13,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (214.92) for latency ExtremeClogL1U23
2025-08-07 07:47:13,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 19 minutes, 54 seconds)
2025-08-07 07:48:54,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:56,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 206.43437 ± 36.696
2025-08-07 07:48:56,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [201.20982, 184.83774, 184.23857, 194.79083, 193.71986, 185.24762, 183.92221, 311.70956, 218.41293, 206.25441]
2025-08-07 07:48:56,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 108.0, 107.0, 116.0, 118.0, 112.0, 108.0, 181.0, 141.0, 125.0]
2025-08-07 07:48:56,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 18 minutes, 8 seconds)
2025-08-07 07:50:37,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:38,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 151.68478 ± 57.838
2025-08-07 07:50:38,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [82.85865, 171.49068, 156.77434, 111.05711, 152.97598, 134.69777, 235.70494, 263.22354, 138.05254, 70.012405]
2025-08-07 07:50:38,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [76.0, 144.0, 123.0, 110.0, 127.0, 143.0, 135.0, 168.0, 102.0, 70.0]
2025-08-07 07:50:39,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 16 minutes, 18 seconds)
2025-08-07 07:52:19,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:21,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 181.39441 ± 79.286
2025-08-07 07:52:21,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [121.702126, 256.2276, 111.223564, 288.0989, 312.20624, 105.25272, 196.62833, 79.095894, 214.16634, 129.34235]
2025-08-07 07:52:21,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [102.0, 154.0, 98.0, 167.0, 193.0, 100.0, 129.0, 75.0, 145.0, 111.0]
2025-08-07 07:52:21,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 14 minutes, 46 seconds)
2025-08-07 07:54:02,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:03,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 92.83314 ± 13.683
2025-08-07 07:54:03,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [111.20572, 90.33062, 94.919106, 79.408516, 90.91884, 76.01369, 94.774376, 120.80919, 93.56111, 76.39026]
2025-08-07 07:54:03,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [133.0, 134.0, 126.0, 106.0, 118.0, 96.0, 138.0, 129.0, 161.0, 110.0]
2025-08-07 07:54:03,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 13 minutes, 14 seconds)
2025-08-07 07:55:45,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:47,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 192.86980 ± 132.116
2025-08-07 07:55:47,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [46.812492, 287.97195, 316.7548, 407.89728, 75.826416, 41.60733, 264.48407, 135.1224, 40.13011, 312.091]
2025-08-07 07:55:47,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [56.0, 163.0, 184.0, 229.0, 68.0, 55.0, 137.0, 117.0, 55.0, 273.0]
2025-08-07 07:55:47,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 11 minutes, 46 seconds)
2025-08-07 07:57:27,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:29,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 236.28596 ± 102.745
2025-08-07 07:57:29,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [413.68927, 153.98251, 212.58975, 269.60654, 398.9053, 98.070496, 114.32316, 257.79343, 268.58966, 175.30963]
2025-08-07 07:57:29,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 113.0, 189.0, 158.0, 183.0, 92.0, 88.0, 136.0, 141.0, 100.0]
2025-08-07 07:57:29,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (236.29) for latency ExtremeClogL1U23
2025-08-07 07:57:29,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 9 minutes, 55 seconds)
2025-08-07 07:59:10,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:12,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 171.73921 ± 70.143
2025-08-07 07:59:12,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [64.80318, 137.17958, 153.4386, 82.18745, 158.83546, 290.30875, 170.36548, 158.04366, 275.6133, 226.61673]
2025-08-07 07:59:12,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [124.0, 89.0, 202.0, 171.0, 227.0, 183.0, 154.0, 101.0, 161.0, 130.0]
2025-08-07 07:59:12,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 8 minutes, 17 seconds)
2025-08-07 08:00:52,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:00:53,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 129.46388 ± 83.782
2025-08-07 08:00:53,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [54.055687, 146.1472, 49.00993, 241.3669, 186.94742, 213.14725, 50.20815, 48.781467, 255.14156, 49.833233]
2025-08-07 08:00:53,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [58.0, 84.0, 56.0, 134.0, 88.0, 102.0, 56.0, 56.0, 120.0, 56.0]
2025-08-07 08:00:53,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 22 seconds)
2025-08-07 08:02:34,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:36,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 164.81284 ± 63.824
2025-08-07 08:02:36,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [103.54327, 223.83267, 70.40147, 188.54782, 99.142494, 214.96545, 157.26501, 129.31682, 170.42722, 290.68607]
2025-08-07 08:02:36,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [102.0, 141.0, 65.0, 134.0, 82.0, 154.0, 134.0, 118.0, 117.0, 142.0]
2025-08-07 08:02:36,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 4 minutes, 44 seconds)
2025-08-07 08:04:18,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:19,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 232.58318 ± 64.326
2025-08-07 08:04:19,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [251.61174, 191.23428, 387.6524, 241.56992, 220.4748, 202.72856, 217.81223, 145.76012, 176.99773, 289.9901]
2025-08-07 08:04:19,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 96.0, 164.0, 110.0, 113.0, 94.0, 112.0, 74.0, 85.0, 127.0]
2025-08-07 08:04:19,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 3 minutes, 5 seconds)
2025-08-07 08:05:59,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:00,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 246.21196 ± 50.549
2025-08-07 08:06:00,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [304.51572, 287.26938, 258.45776, 278.57745, 269.70874, 157.85252, 174.56407, 182.30093, 267.41724, 281.45572]
2025-08-07 08:06:00,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [141.0, 140.0, 121.0, 141.0, 138.0, 78.0, 85.0, 88.0, 146.0, 125.0]
2025-08-07 08:06:00,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (246.21) for latency ExtremeClogL1U23
2025-08-07 08:06:00,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute)
2025-08-07 08:07:42,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:44,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 200.70180 ± 64.882
2025-08-07 08:07:44,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [289.3622, 280.5962, 314.20157, 181.22269, 141.39864, 161.89317, 132.04738, 141.57405, 200.87996, 163.84215]
2025-08-07 08:07:44,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [292.0, 213.0, 156.0, 134.0, 114.0, 170.0, 111.0, 115.0, 233.0, 123.0]
2025-08-07 08:07:44,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 59 minutes, 30 seconds)
2025-08-07 08:09:26,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:27,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 152.30643 ± 23.580
2025-08-07 08:09:27,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [125.6981, 171.32709, 186.70891, 148.03517, 130.84393, 168.36942, 138.34094, 161.06448, 179.70793, 112.96827]
2025-08-07 08:09:27,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [115.0, 125.0, 144.0, 116.0, 108.0, 140.0, 114.0, 125.0, 190.0, 106.0]
2025-08-07 08:09:27,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 20 seconds)
2025-08-07 08:11:08,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:11,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 194.64493 ± 122.992
2025-08-07 08:11:11,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [155.24632, 119.3481, 192.71318, 212.76634, 169.23112, 116.92395, 148.07365, 161.37111, 552.3395, 118.436035]
2025-08-07 08:11:11,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [175.0, 224.0, 148.0, 155.0, 163.0, 181.0, 166.0, 158.0, 358.0, 96.0]
2025-08-07 08:11:11,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 42 seconds)
2025-08-07 08:12:52,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 252.03445 ± 84.265
2025-08-07 08:12:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [156.57271, 133.39018, 155.84438, 321.696, 365.1177, 336.35284, 184.5302, 288.4625, 340.02652, 238.35138]
2025-08-07 08:12:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 101.0, 186.0, 170.0, 249.0, 178.0, 127.0, 190.0, 240.0, 175.0]
2025-08-07 08:12:54,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (252.03) for latency ExtremeClogL1U23
2025-08-07 08:12:54,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 54 minutes, 55 seconds)
2025-08-07 08:14:38,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:40,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 178.92249 ± 87.912
2025-08-07 08:14:40,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [107.446075, 427.79333, 162.60092, 214.25621, 165.49689, 158.30403, 120.766106, 120.75797, 143.78032, 168.02324]
2025-08-07 08:14:40,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 280.0, 221.0, 232.0, 175.0, 167.0, 205.0, 184.0, 155.0, 152.0]
2025-08-07 08:14:40,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 54 minutes, 26 seconds)
2025-08-07 08:16:20,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:23,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 381.91000 ± 113.178
2025-08-07 08:16:23,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [185.6855, 373.9516, 238.24422, 435.42935, 496.5636, 385.04724, 363.33624, 555.51874, 497.8987, 287.42502]
2025-08-07 08:16:23,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [265.0, 223.0, 193.0, 276.0, 269.0, 202.0, 223.0, 287.0, 351.0, 243.0]
2025-08-07 08:16:23,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (381.91) for latency ExtremeClogL1U23
2025-08-07 08:16:23,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 52 minutes, 32 seconds)
2025-08-07 08:18:05,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:07,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 363.16223 ± 62.974
2025-08-07 08:18:07,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [365.13547, 298.5764, 358.80612, 238.81017, 465.98724, 432.31253, 343.59515, 368.92343, 422.62753, 336.8484]
2025-08-07 08:18:07,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 240.0, 206.0, 198.0, 236.0, 207.0, 171.0, 205.0, 225.0, 222.0]
2025-08-07 08:18:07,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 50 minutes, 55 seconds)
2025-08-07 08:19:49,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:52,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 371.66489 ± 37.882
2025-08-07 08:19:52,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [374.20703, 319.18134, 397.52386, 387.75897, 380.9714, 346.82455, 344.68222, 425.48923, 313.75653, 426.25375]
2025-08-07 08:19:52,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 264.0, 200.0, 192.0, 191.0, 165.0, 167.0, 220.0, 149.0, 232.0]
2025-08-07 08:19:52,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 49 minutes, 22 seconds)
2025-08-07 08:21:33,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:35,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 200.44507 ± 115.030
2025-08-07 08:21:35,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [142.2397, 169.34462, 157.56714, 537.7624, 149.80789, 137.724, 146.78123, 147.07295, 208.24544, 207.90524]
2025-08-07 08:21:35,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [113.0, 136.0, 136.0, 215.0, 113.0, 110.0, 114.0, 116.0, 138.0, 136.0]
2025-08-07 08:21:35,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 47 minutes, 34 seconds)
2025-08-07 08:23:16,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:19,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 260.13254 ± 124.938
2025-08-07 08:23:19,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [448.66086, 392.14877, 148.60857, 137.36292, 427.19305, 140.97371, 138.11623, 357.8721, 244.09995, 166.2895]
2025-08-07 08:23:19,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 178.0, 132.0, 128.0, 300.0, 126.0, 115.0, 169.0, 226.0, 126.0]
2025-08-07 08:23:19,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 45 minutes, 23 seconds)
2025-08-07 08:25:00,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:03,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 341.79633 ± 34.329
2025-08-07 08:25:03,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [299.8865, 293.1435, 312.6945, 354.3809, 326.07983, 370.809, 371.20578, 408.83905, 350.5615, 330.3626]
2025-08-07 08:25:03,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 153.0, 170.0, 206.0, 181.0, 201.0, 188.0, 234.0, 187.0, 174.0]
2025-08-07 08:25:03,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 43 minutes, 55 seconds)
2025-08-07 08:26:43,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:45,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 148.00572 ± 63.687
2025-08-07 08:26:45,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [104.15487, 176.11469, 158.47087, 128.17451, 135.69164, 92.47286, 323.99554, 132.1414, 96.67992, 132.16104]
2025-08-07 08:26:45,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 207.0, 193.0, 161.0, 177.0, 169.0, 185.0, 125.0, 211.0, 176.0]
2025-08-07 08:26:45,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 41 minutes, 49 seconds)
2025-08-07 08:28:29,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:28:31,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 235.19995 ± 82.746
2025-08-07 08:28:31,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [152.60016, 337.09543, 148.15231, 176.25566, 196.8181, 363.73053, 187.71611, 206.75403, 371.53134, 211.34595]
2025-08-07 08:28:31,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [120.0, 197.0, 120.0, 156.0, 170.0, 195.0, 228.0, 298.0, 205.0, 147.0]
2025-08-07 08:28:31,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 24 seconds)
2025-08-07 08:30:12,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:30:14,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 284.63074 ± 129.961
2025-08-07 08:30:14,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [362.68274, 435.38943, 371.93088, 135.74657, 149.1972, 104.55804, 144.7967, 469.37698, 352.0425, 320.58633]
2025-08-07 08:30:14,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 205.0, 201.0, 150.0, 144.0, 105.0, 161.0, 234.0, 193.0, 177.0]
2025-08-07 08:30:14,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 38 minutes, 43 seconds)
2025-08-07 08:31:55,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:58,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 391.34235 ± 46.641
2025-08-07 08:31:58,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [374.80066, 381.8048, 355.42642, 428.106, 370.26343, 502.1368, 363.50827, 369.01978, 335.94775, 432.40927]
2025-08-07 08:31:58,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 228.0, 204.0, 245.0, 216.0, 300.0, 209.0, 213.0, 191.0, 270.0]
2025-08-07 08:31:58,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (391.34) for latency ExtremeClogL1U23
2025-08-07 08:31:58,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 36 minutes, 50 seconds)
2025-08-07 08:33:40,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:44,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 499.14145 ± 137.163
2025-08-07 08:33:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [441.91235, 391.26172, 873.2433, 528.40295, 404.41953, 425.6038, 466.39093, 437.7308, 591.5318, 430.91742]
2025-08-07 08:33:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [230.0, 298.0, 471.0, 317.0, 214.0, 267.0, 273.0, 234.0, 374.0, 264.0]
2025-08-07 08:33:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (499.14) for latency ExtremeClogL1U23
2025-08-07 08:33:44,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 35 minutes, 29 seconds)
2025-08-07 08:35:27,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:30,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 421.43506 ± 119.127
2025-08-07 08:35:30,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [366.8912, 372.68665, 494.41583, 522.47266, 429.21606, 352.19656, 341.85498, 306.25696, 713.39105, 314.96857]
2025-08-07 08:35:30,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 223.0, 236.0, 305.0, 280.0, 252.0, 202.0, 181.0, 429.0, 190.0]
2025-08-07 08:35:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 34 minutes, 27 seconds)
2025-08-07 08:37:10,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:13,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 419.65805 ± 126.960
2025-08-07 08:37:13,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [446.20245, 333.58896, 392.07162, 265.62674, 465.03452, 336.95404, 475.7867, 548.06354, 685.30225, 247.94954]
2025-08-07 08:37:13,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 239.0, 215.0, 155.0, 277.0, 189.0, 249.0, 340.0, 401.0, 249.0]
2025-08-07 08:37:13,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2025-08-07 08:38:54,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:58,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 435.85294 ± 71.851
2025-08-07 08:38:58,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [431.06558, 451.65018, 504.00006, 575.3791, 286.44598, 456.84656, 402.39676, 402.90067, 455.45068, 392.3937]
2025-08-07 08:38:58,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [469.0, 259.0, 304.0, 309.0, 173.0, 283.0, 241.0, 191.0, 315.0, 246.0]
2025-08-07 08:38:58,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 30 minutes, 46 seconds)
2025-08-07 08:40:41,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:44,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 399.95026 ± 43.895
2025-08-07 08:40:44,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [379.80264, 371.0754, 478.66553, 424.42517, 386.85068, 372.6416, 449.0248, 388.85245, 317.3135, 430.8506]
2025-08-07 08:40:44,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 207.0, 256.0, 240.0, 182.0, 217.0, 252.0, 202.0, 190.0, 228.0]
2025-08-07 08:40:44,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 29 minutes, 27 seconds)
2025-08-07 08:42:25,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:42:27,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 216.52153 ± 140.513
2025-08-07 08:42:27,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [171.8528, 527.97253, 161.92957, 440.8112, 85.25468, 147.0818, 157.0024, 236.10461, 117.955536, 119.250305]
2025-08-07 08:42:27,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 280.0, 154.0, 263.0, 97.0, 130.0, 206.0, 144.0, 123.0, 152.0]
2025-08-07 08:42:27,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 27 minutes, 13 seconds)
2025-08-07 08:44:08,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:11,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 357.78317 ± 121.192
2025-08-07 08:44:11,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [171.71785, 314.80814, 517.73694, 328.67255, 359.92548, 498.9188, 160.49016, 479.8421, 302.95447, 442.7654]
2025-08-07 08:44:11,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [215.0, 188.0, 274.0, 181.0, 211.0, 307.0, 138.0, 240.0, 179.0, 245.0]
2025-08-07 08:44:11,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 6 seconds)
2025-08-07 08:45:53,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:55,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 294.41949 ± 34.320
2025-08-07 08:45:55,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [263.3192, 290.2743, 271.149, 372.49844, 290.08823, 271.11145, 288.76172, 269.65115, 280.4477, 346.89386]
2025-08-07 08:45:55,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 166.0, 157.0, 217.0, 162.0, 151.0, 168.0, 151.0, 160.0, 224.0]
2025-08-07 08:45:55,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 34 seconds)
2025-08-07 08:47:37,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:41,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 400.95462 ± 168.985
2025-08-07 08:47:41,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [399.20773, 458.2762, 270.85425, 244.02707, 668.8543, 220.90678, 710.2379, 205.58334, 449.45163, 382.14676]
2025-08-07 08:47:41,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 252.0, 226.0, 204.0, 335.0, 234.0, 444.0, 221.0, 225.0, 244.0]
2025-08-07 08:47:41,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 54 seconds)
2025-08-07 08:49:21,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:24,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 318.18851 ± 75.693
2025-08-07 08:49:24,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [326.69476, 431.998, 253.67174, 283.25104, 297.50568, 362.01123, 148.86227, 317.3144, 389.05756, 371.51825]
2025-08-07 08:49:24,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [192.0, 262.0, 240.0, 160.0, 161.0, 218.0, 152.0, 288.0, 209.0, 239.0]
2025-08-07 08:49:24,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 46 seconds)
2025-08-07 08:51:07,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:10,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 326.91022 ± 70.735
2025-08-07 08:51:10,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [311.01132, 315.66663, 255.74017, 329.06705, 472.20822, 406.09122, 393.65994, 255.42928, 264.39282, 265.83527]
2025-08-07 08:51:10,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 178.0, 148.0, 185.0, 229.0, 204.0, 215.0, 147.0, 160.0, 162.0]
2025-08-07 08:51:10,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 18 minutes, 21 seconds)
2025-08-07 08:52:50,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:52:52,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 354.63019 ± 67.092
2025-08-07 08:52:52,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [307.02563, 237.35806, 350.20355, 371.13635, 253.7751, 403.41873, 344.45248, 410.432, 445.53622, 422.96387]
2025-08-07 08:52:52,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [153.0, 130.0, 169.0, 193.0, 137.0, 205.0, 174.0, 208.0, 229.0, 208.0]
2025-08-07 08:52:52,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 28 seconds)
2025-08-07 08:54:34,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:54:37,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 341.72504 ± 54.306
2025-08-07 08:54:37,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [345.5054, 368.44604, 243.06696, 403.0147, 233.16249, 366.25687, 360.73044, 388.11783, 359.33585, 349.61356]
2025-08-07 08:54:37,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [187.0, 199.0, 135.0, 223.0, 132.0, 197.0, 195.0, 206.0, 196.0, 191.0]
2025-08-07 08:54:37,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 14 minutes, 42 seconds)
2025-08-07 08:56:20,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:22,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 301.06839 ± 40.581
2025-08-07 08:56:22,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [265.67264, 304.39584, 368.47205, 354.94052, 266.80612, 254.97475, 262.4977, 353.17682, 291.71158, 288.03574]
2025-08-07 08:56:22,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [142.0, 151.0, 178.0, 171.0, 144.0, 139.0, 137.0, 169.0, 148.0, 149.0]
2025-08-07 08:56:22,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 2 seconds)
2025-08-07 08:58:01,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:04,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 415.71222 ± 118.710
2025-08-07 08:58:04,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [321.91922, 544.227, 355.06223, 461.03662, 334.34042, 347.56787, 313.4889, 415.43683, 357.23, 706.8128]
2025-08-07 08:58:04,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 259.0, 170.0, 219.0, 164.0, 169.0, 166.0, 198.0, 197.0, 376.0]
2025-08-07 08:58:04,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 11 minutes, 3 seconds)
2025-08-07 08:59:46,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:59:47,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 280.56009 ± 17.566
2025-08-07 08:59:47,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [278.99384, 279.67798, 305.63242, 291.43893, 277.9548, 271.14746, 254.76242, 312.93698, 258.3857, 274.6701]
2025-08-07 08:59:47,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 144.0, 158.0, 154.0, 146.0, 143.0, 137.0, 165.0, 136.0, 145.0]
2025-08-07 08:59:47,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 3 seconds)
2025-08-07 09:01:30,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:33,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 426.52094 ± 94.657
2025-08-07 09:01:33,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [390.66776, 239.5415, 518.89325, 471.5354, 558.8524, 432.96515, 490.92273, 486.24048, 367.63947, 307.95132]
2025-08-07 09:01:33,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [190.0, 146.0, 266.0, 260.0, 335.0, 204.0, 334.0, 253.0, 218.0, 168.0]
2025-08-07 09:01:33,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 7 minutes, 40 seconds)
2025-08-07 09:03:14,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:16,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 368.76019 ± 70.111
2025-08-07 09:03:16,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [458.61115, 365.3099, 268.16336, 309.81897, 285.15866, 316.7575, 398.34073, 482.49954, 371.2149, 431.72745]
2025-08-07 09:03:16,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [241.0, 202.0, 151.0, 168.0, 161.0, 172.0, 202.0, 250.0, 191.0, 221.0]
2025-08-07 09:03:16,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 48 seconds)
2025-08-07 09:04:58,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:00,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 298.29645 ± 186.816
2025-08-07 09:05:00,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [530.6644, 157.06696, 111.37551, 440.66113, 276.44952, 156.68881, 106.37248, 437.85724, 129.24849, 636.5796]
2025-08-07 09:05:00,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [263.0, 137.0, 108.0, 228.0, 162.0, 131.0, 105.0, 229.0, 133.0, 320.0]
2025-08-07 09:05:00,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 50 seconds)
2025-08-07 09:06:44,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:47,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 337.19919 ± 66.813
2025-08-07 09:06:47,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [495.97812, 322.60355, 396.9814, 348.16916, 313.9128, 310.46738, 353.00845, 322.61026, 253.543, 254.71793]
2025-08-07 09:06:47,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [247.0, 188.0, 222.0, 201.0, 182.0, 176.0, 210.0, 188.0, 143.0, 142.0]
2025-08-07 09:06:47,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 2 minutes, 44 seconds)
2025-08-07 09:08:28,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:31,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 308.14603 ± 112.453
2025-08-07 09:08:31,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [351.19943, 395.18848, 497.9042, 356.4074, 367.96774, 180.30186, 394.64557, 173.68398, 155.44661, 208.71494]
2025-08-07 09:08:31,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [195.0, 282.0, 384.0, 206.0, 210.0, 189.0, 227.0, 195.0, 195.0, 206.0]
2025-08-07 09:08:31,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 1 minute, 1 second)
2025-08-07 09:10:12,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:15,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 373.89606 ± 110.494
2025-08-07 09:10:15,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [311.7511, 295.96555, 355.68686, 612.2647, 358.9029, 250.64967, 316.14752, 333.62198, 346.07095, 557.89923]
2025-08-07 09:10:15,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 161.0, 203.0, 302.0, 215.0, 189.0, 186.0, 194.0, 187.0, 259.0]
2025-08-07 09:10:15,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 59 minutes, 9 seconds)
2025-08-07 09:11:57,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:00,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 380.74384 ± 104.419
2025-08-07 09:12:00,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [320.57452, 502.9617, 324.97247, 405.9395, 316.64017, 366.20407, 255.41103, 631.5299, 360.8996, 322.3055]
2025-08-07 09:12:00,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 200.0, 161.0, 183.0, 158.0, 184.0, 129.0, 247.0, 167.0, 154.0]
2025-08-07 09:12:00,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 36 seconds)
2025-08-07 09:13:40,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:44,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 521.87512 ± 63.397
2025-08-07 09:13:44,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [468.5707, 557.8189, 516.8408, 682.3529, 452.55777, 531.86365, 518.1161, 527.8006, 449.38425, 513.44495]
2025-08-07 09:13:44,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 234.0, 228.0, 334.0, 186.0, 317.0, 238.0, 220.0, 187.0, 208.0]
2025-08-07 09:13:44,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (521.88) for latency ExtremeClogL1U23
2025-08-07 09:13:44,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 50 seconds)
2025-08-07 09:15:25,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:27,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 319.55667 ± 50.011
2025-08-07 09:15:27,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [310.1411, 307.98306, 267.2898, 307.92365, 425.7522, 376.677, 359.43457, 263.76248, 307.33286, 269.27008]
2025-08-07 09:15:27,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [158.0, 157.0, 133.0, 149.0, 204.0, 180.0, 160.0, 133.0, 151.0, 128.0]
2025-08-07 09:15:27,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 46 seconds)
2025-08-07 09:17:11,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:13,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 297.34854 ± 59.292
2025-08-07 09:17:13,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [424.1078, 245.90176, 246.79015, 230.87178, 353.62076, 244.80571, 300.53073, 319.44678, 265.2388, 342.1711]
2025-08-07 09:17:13,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [242.0, 130.0, 131.0, 120.0, 168.0, 128.0, 145.0, 152.0, 135.0, 157.0]
2025-08-07 09:17:13,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 11 seconds)
2025-08-07 09:18:54,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:56,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 373.82263 ± 120.098
2025-08-07 09:18:56,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [378.9452, 314.06088, 481.73346, 319.4057, 388.3985, 567.0669, 480.23526, 98.93199, 375.4338, 334.0146]
2025-08-07 09:18:56,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [223.0, 152.0, 224.0, 159.0, 175.0, 243.0, 227.0, 109.0, 174.0, 164.0]
2025-08-07 09:18:56,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 50 minutes, 24 seconds)
2025-08-07 09:20:38,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:43,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 600.32697 ± 188.199
2025-08-07 09:20:43,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [197.33354, 612.28186, 714.5669, 919.7419, 435.32697, 518.97797, 566.7143, 642.28217, 596.1819, 799.8621]
2025-08-07 09:20:43,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 394.0, 337.0, 428.0, 244.0, 235.0, 319.0, 310.0, 284.0, 422.0]
2025-08-07 09:20:43,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (600.33) for latency ExtremeClogL1U23
2025-08-07 09:20:43,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 48 seconds)
2025-08-07 09:22:26,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:22:29,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 463.97809 ± 177.072
2025-08-07 09:22:29,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [437.79413, 565.8807, 318.64963, 193.10117, 536.08624, 583.829, 540.19293, 740.1686, 568.8519, 155.22661]
2025-08-07 09:22:29,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [215.0, 222.0, 173.0, 128.0, 244.0, 262.0, 235.0, 326.0, 235.0, 124.0]
2025-08-07 09:22:29,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 17 seconds)
2025-08-07 09:24:10,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:24:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 391.81113 ± 38.826
2025-08-07 09:24:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [377.4923, 428.13522, 373.37268, 348.31808, 397.0423, 367.0705, 474.40756, 356.7128, 362.04675, 433.51312]
2025-08-07 09:24:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 202.0, 181.0, 171.0, 207.0, 175.0, 254.0, 180.0, 178.0, 209.0]
2025-08-07 09:24:12,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 29 seconds)
2025-08-07 09:25:54,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:57,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 586.17523 ± 92.544
2025-08-07 09:25:57,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [771.4843, 580.9008, 531.5571, 618.2884, 586.6924, 514.2826, 469.92264, 560.78125, 728.8003, 499.04276]
2025-08-07 09:25:57,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [339.0, 275.0, 256.0, 270.0, 269.0, 254.0, 206.0, 258.0, 334.0, 216.0]
2025-08-07 09:25:57,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 44 seconds)
2025-08-07 09:27:41,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:44,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 586.17584 ± 138.452
2025-08-07 09:27:44,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [723.17444, 417.5606, 443.72305, 825.664, 541.6398, 404.1752, 524.14075, 579.47003, 727.7926, 674.418]
2025-08-07 09:27:44,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [335.0, 207.0, 227.0, 391.0, 220.0, 191.0, 206.0, 236.0, 331.0, 306.0]
2025-08-07 09:27:44,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 12 seconds)
2025-08-07 09:29:26,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:29:29,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 556.40027 ± 254.529
2025-08-07 09:29:29,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [136.53969, 131.15633, 862.7075, 785.80554, 419.52835, 404.3319, 640.06464, 673.5805, 718.4231, 791.86536]
2025-08-07 09:29:29,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 123.0, 398.0, 329.0, 171.0, 260.0, 333.0, 306.0, 326.0, 373.0]
2025-08-07 09:29:29,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 22 seconds)
2025-08-07 09:31:11,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:31:15,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 634.95770 ± 199.415
2025-08-07 09:31:15,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [599.00494, 1003.9341, 615.1952, 149.5206, 611.42175, 761.384, 589.6153, 645.6264, 693.07355, 680.80133]
2025-08-07 09:31:15,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 427.0, 260.0, 117.0, 229.0, 362.0, 285.0, 287.0, 315.0, 257.0]
2025-08-07 09:31:15,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (634.96) for latency ExtremeClogL1U23
2025-08-07 09:31:15,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 33 seconds)
2025-08-07 09:32:58,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:33:02,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 722.53473 ± 319.482
2025-08-07 09:33:02,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [788.6771, 595.51874, 385.623, 588.4687, 881.4374, 385.6414, 913.32117, 828.04474, 382.99948, 1475.6151]
2025-08-07 09:33:02,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [311.0, 238.0, 170.0, 237.0, 379.0, 173.0, 372.0, 320.0, 172.0, 670.0]
2025-08-07 09:33:02,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (722.53) for latency ExtremeClogL1U23
2025-08-07 09:33:02,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 6 seconds)
2025-08-07 09:34:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:34:47,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 510.30875 ± 221.115
2025-08-07 09:34:47,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [575.19806, 360.04468, 1041.281, 309.89932, 682.0891, 494.5984, 323.63608, 638.72284, 346.40884, 331.20886]
2025-08-07 09:34:47,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [246.0, 188.0, 653.0, 179.0, 313.0, 198.0, 155.0, 284.0, 196.0, 188.0]
2025-08-07 09:34:47,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 19 seconds)
2025-08-07 09:36:28,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:36:32,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 618.55847 ± 271.272
2025-08-07 09:36:32,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [446.77386, 557.4079, 525.8716, 153.81665, 1219.6637, 508.51974, 913.6099, 687.4514, 512.98315, 659.48676]
2025-08-07 09:36:32,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 239.0, 226.0, 124.0, 554.0, 220.0, 363.0, 301.0, 219.0, 313.0]
2025-08-07 09:36:32,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 24 seconds)
2025-08-07 09:38:17,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:38:21,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 708.32977 ± 152.988
2025-08-07 09:38:21,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [487.38525, 917.6986, 831.08826, 675.87585, 520.089, 540.82904, 710.9651, 908.1355, 843.28595, 647.9454]
2025-08-07 09:38:21,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 355.0, 331.0, 282.0, 214.0, 240.0, 283.0, 343.0, 339.0, 266.0]
2025-08-07 09:38:21,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 53 seconds)
2025-08-07 09:40:00,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:40:05,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 820.77997 ± 222.635
2025-08-07 09:40:05,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [409.154, 923.2606, 1019.10095, 607.3514, 724.74255, 572.55286, 1064.1991, 977.12964, 1096.9147, 813.3936]
2025-08-07 09:40:05,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 401.0, 439.0, 278.0, 325.0, 261.0, 467.0, 412.0, 455.0, 350.0]
2025-08-07 09:40:05,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (820.78) for latency ExtremeClogL1U23
2025-08-07 09:40:05,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 1 second)
2025-08-07 09:41:47,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:41:52,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 948.73224 ± 327.410
2025-08-07 09:41:52,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [841.0047, 1173.7548, 633.59607, 616.28235, 1721.5504, 825.7815, 674.6996, 930.43286, 1255.5417, 814.6775]
2025-08-07 09:41:52,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 491.0, 303.0, 267.0, 795.0, 350.0, 320.0, 407.0, 531.0, 363.0]
2025-08-07 09:41:52,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (948.73) for latency ExtremeClogL1U23
2025-08-07 09:41:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 16 seconds)
2025-08-07 09:43:40,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:43:46,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 947.69531 ± 632.460
2025-08-07 09:43:46,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [535.5258, 572.9052, 1185.8414, 2151.4373, 416.02298, 436.62997, 2134.3584, 784.95056, 589.9089, 669.372]
2025-08-07 09:43:46,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [331.0, 313.0, 599.0, 937.0, 224.0, 269.0, 1000.0, 413.0, 242.0, 281.0]
2025-08-07 09:43:46,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 57 seconds)
2025-08-07 09:45:24,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:29,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 829.20282 ± 480.838
2025-08-07 09:45:29,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1812.1769, 478.30594, 412.63602, 383.34476, 664.02234, 1014.75934, 1344.1505, 1289.2668, 348.7557, 544.6098]
2025-08-07 09:45:29,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [821.0, 214.0, 196.0, 172.0, 286.0, 471.0, 591.0, 593.0, 176.0, 235.0]
2025-08-07 09:45:29,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 4 seconds)
2025-08-07 09:47:12,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:18,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 949.93182 ± 478.168
2025-08-07 09:47:18,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [845.33514, 468.9475, 638.45667, 704.1921, 824.9873, 742.8353, 887.45764, 2131.0781, 1555.6915, 700.337]
2025-08-07 09:47:18,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [355.0, 223.0, 347.0, 335.0, 369.0, 325.0, 390.0, 1000.0, 641.0, 332.0]
2025-08-07 09:47:18,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (949.93) for latency ExtremeClogL1U23
2025-08-07 09:47:18,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 16 seconds)
2025-08-07 09:49:00,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:04,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 725.29529 ± 312.336
2025-08-07 09:49:04,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [391.22247, 364.55255, 610.0496, 873.71313, 1468.569, 682.50836, 848.2385, 835.0464, 389.30438, 789.7492]
2025-08-07 09:49:04,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [165.0, 159.0, 269.0, 384.0, 643.0, 324.0, 383.0, 349.0, 199.0, 350.0]
2025-08-07 09:49:04,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 34 seconds)
2025-08-07 09:50:49,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:50:56,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1066.01831 ± 357.635
2025-08-07 09:50:56,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1824.575, 419.65637, 871.4992, 912.1153, 1115.8431, 1094.4679, 1132.1396, 1452.9047, 1026.7466, 810.2357]
2025-08-07 09:50:56,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [825.0, 276.0, 370.0, 390.0, 488.0, 464.0, 545.0, 647.0, 438.0, 363.0]
2025-08-07 09:50:56,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1066.02) for latency ExtremeClogL1U23
2025-08-07 09:50:56,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 55 seconds)
2025-08-07 09:52:40,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:52:43,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 558.52545 ± 161.022
2025-08-07 09:52:43,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [506.23212, 433.87778, 599.97504, 519.3874, 469.52054, 1018.2377, 504.78, 492.84607, 584.85895, 455.53897]
2025-08-07 09:52:43,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 214.0, 271.0, 219.0, 190.0, 455.0, 211.0, 207.0, 325.0, 219.0]
2025-08-07 09:52:43,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 53 seconds)
2025-08-07 09:54:24,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:54:29,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 951.55322 ± 476.924
2025-08-07 09:54:29,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [945.4579, 701.36816, 595.93134, 1664.1951, 589.5697, 2041.2194, 586.18884, 950.36615, 623.019, 818.21686]
2025-08-07 09:54:29,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [384.0, 308.0, 243.0, 651.0, 256.0, 838.0, 248.0, 389.0, 256.0, 346.0]
2025-08-07 09:54:29,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 11 seconds)
2025-08-07 09:56:10,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:56:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 998.54950 ± 385.042
2025-08-07 09:56:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [554.7101, 862.77014, 757.1799, 622.09064, 895.51324, 783.2347, 1530.8196, 1417.4221, 838.156, 1723.5989]
2025-08-07 09:56:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [232.0, 350.0, 299.0, 329.0, 391.0, 320.0, 712.0, 608.0, 336.0, 711.0]
2025-08-07 09:56:16,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 21 seconds)
2025-08-07 09:57:59,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:58:01,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 463.63135 ± 63.638
2025-08-07 09:58:01,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [581.19885, 495.7138, 454.5362, 462.33386, 531.0458, 483.89365, 388.74344, 479.43353, 397.89993, 361.51395]
2025-08-07 09:58:01,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [254.0, 220.0, 213.0, 233.0, 238.0, 209.0, 177.0, 213.0, 178.0, 185.0]
2025-08-07 09:58:01,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 32 seconds)
2025-08-07 09:59:46,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:50,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 786.92255 ± 160.252
2025-08-07 09:59:50,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [975.3399, 900.9761, 786.1422, 810.28644, 575.9827, 962.53815, 747.512, 517.62604, 625.8828, 966.9397]
2025-08-07 09:59:50,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [408.0, 358.0, 317.0, 347.0, 246.0, 424.0, 324.0, 231.0, 294.0, 406.0]
2025-08-07 09:59:50,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 41 seconds)
2025-08-07 10:01:34,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:01:38,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 862.31396 ± 143.707
2025-08-07 10:01:38,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [875.36035, 798.913, 708.8496, 590.58606, 774.3107, 896.89233, 857.69055, 1057.1216, 1049.5853, 1013.83026]
2025-08-07 10:01:38,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [370.0, 307.0, 283.0, 267.0, 338.0, 347.0, 320.0, 440.0, 436.0, 416.0]
2025-08-07 10:01:38,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 54 seconds)
2025-08-07 10:03:23,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:03:28,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 875.30725 ± 630.473
2025-08-07 10:03:28,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [48.34279, 867.04474, 2364.6353, 722.8223, 777.4454, 689.3792, 1408.9583, 789.67413, 46.53422, 1038.2368]
2025-08-07 10:03:28,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [47.0, 348.0, 1000.0, 292.0, 309.0, 280.0, 538.0, 310.0, 45.0, 426.0]
2025-08-07 10:03:28,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 11 seconds)
2025-08-07 10:05:06,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:10,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 786.62482 ± 535.915
2025-08-07 10:05:10,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [634.4434, 563.4435, 602.48193, 607.7049, 609.4693, 2392.1055, 586.9822, 605.383, 675.43365, 588.80066]
2025-08-07 10:05:10,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [230.0, 212.0, 228.0, 232.0, 230.0, 1000.0, 225.0, 227.0, 243.0, 220.0]
2025-08-07 10:05:10,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 20 seconds)
2025-08-07 10:06:58,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:06,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1287.70642 ± 754.839
2025-08-07 10:07:06,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [677.0905, 624.6286, 1100.5123, 2524.2048, 708.0616, 2315.154, 779.95886, 960.9541, 763.06805, 2423.4314]
2025-08-07 10:07:06,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [270.0, 257.0, 465.0, 1000.0, 294.0, 943.0, 308.0, 393.0, 315.0, 1000.0]
2025-08-07 10:07:06,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1287.71) for latency ExtremeClogL1U23
2025-08-07 10:07:06,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 37 seconds)
2025-08-07 10:08:43,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:08:49,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 979.89807 ± 686.177
2025-08-07 10:08:49,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [540.34125, 1098.2966, 2358.6462, 558.3067, 602.5088, 641.1204, 573.2415, 550.3704, 2273.407, 602.7408]
2025-08-07 10:08:49,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [219.0, 464.0, 1000.0, 225.0, 242.0, 299.0, 232.0, 220.0, 1000.0, 237.0]
2025-08-07 10:08:49,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 47 seconds)
2025-08-07 10:10:34,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:10:40,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1091.09216 ± 601.718
2025-08-07 10:10:40,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [600.4528, 734.7959, 660.2146, 792.6818, 1564.2306, 1120.6166, 2517.013, 698.724, 1642.8553, 579.3373]
2025-08-07 10:10:40,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [233.0, 304.0, 271.0, 324.0, 582.0, 437.0, 1000.0, 306.0, 659.0, 236.0]
2025-08-07 10:10:40,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1251 [DEBUG]: Training session finished
