2025-08-07 07:17:45,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc15-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:17:45,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc15-walker2d/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:17:45,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15304e4e7b50>}
2025-08-07 07:17:45,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 07:17:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 07:17:45,333 baseline-bpql-noiseperc15-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 07:17:45,333 baseline-bpql-noiseperc15-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:17:46,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 07:17:46,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 07:19:19,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:19,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 11.66092 ± 5.532
2025-08-07 07:19:19,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.074657, 12.800985, 18.376461, 9.258364, 22.718456, 4.9415197, 12.043841, 7.392727, 4.7173758, 15.284801]
2025-08-07 07:19:19,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 38.0, 39.0, 35.0, 41.0, 15.0, 22.0, 21.0, 17.0, 39.0]
2025-08-07 07:19:19,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (11.66) for latency ExtremeClogL1U23
2025-08-07 07:19:19,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 34 minutes, 36 seconds)
2025-08-07 07:21:03,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:05,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 104.58315 ± 73.683
2025-08-07 07:21:05,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [35.596558, 212.8401, 10.046221, 56.549915, 203.62651, 166.95447, 23.96782, 62.303085, 101.725426, 172.2213]
2025-08-07 07:21:05,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 162.0, 23.0, 62.0, 171.0, 140.0, 168.0, 86.0, 174.0, 138.0]
2025-08-07 07:21:05,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (104.58) for latency ExtremeClogL1U23
2025-08-07 07:21:05,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 42 minutes, 41 seconds)
2025-08-07 07:22:49,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:50,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 28.43931 ± 54.932
2025-08-07 07:22:50,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [11.103126, 4.9930873, -33.15098, 151.38518, 11.47328, 2.7587123, 3.5039828, 8.029122, 117.63762, 6.659968]
2025-08-07 07:22:50,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [34.0, 118.0, 109.0, 172.0, 24.0, 24.0, 15.0, 18.0, 196.0, 16.0]
2025-08-07 07:22:50,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 43 minutes, 53 seconds)
2025-08-07 07:24:31,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:33,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 29.72232 ± 48.596
2025-08-07 07:24:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [36.676697, 4.776513, 143.1753, 5.949538, 89.76282, 3.1026225, 7.0664854, 19.58402, -37.773167, 24.90233]
2025-08-07 07:24:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 18.0, 167.0, 19.0, 134.0, 16.0, 16.0, 47.0, 151.0, 155.0]
2025-08-07 07:24:33,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 42 minutes, 46 seconds)
2025-08-07 07:26:16,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:17,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 25.28687 ± 41.797
2025-08-07 07:26:17,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [28.632574, -14.463382, 5.2836237, -2.1621644, 5.5998034, 56.21068, 24.603376, 7.6499777, 137.59465, 3.9195802]
2025-08-07 07:26:17,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [64.0, 129.0, 21.0, 123.0, 19.0, 168.0, 87.0, 19.0, 147.0, 17.0]
2025-08-07 07:26:17,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 42 minutes)
2025-08-07 07:28:01,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:01,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 14.55714 ± 10.626
2025-08-07 07:28:01,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.7943225, 34.675728, 31.391325, 18.757034, 6.0458198, 5.003459, 20.979353, 6.897746, 7.2028646, 7.8237767]
2025-08-07 07:28:01,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 124.0, 46.0, 50.0, 18.0, 16.0, 106.0, 20.0, 18.0, 18.0]
2025-08-07 07:28:01,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 43 minutes, 30 seconds)
2025-08-07 07:29:42,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:43,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 19.56528 ± 23.028
2025-08-07 07:29:43,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [12.61964, 2.1969514, 15.599148, 32.21185, 75.58896, 18.797901, 38.290962, -9.56516, 2.4913394, 7.421273]
2025-08-07 07:29:43,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 16.0, 45.0, 120.0, 122.0, 97.0, 69.0, 145.0, 18.0, 18.0]
2025-08-07 07:29:43,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 40 minutes, 39 seconds)
2025-08-07 07:31:27,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:28,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 27.66291 ± 21.358
2025-08-07 07:31:28,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [52.456688, 68.0423, 49.04229, 3.8975186, 4.9127183, 14.216641, 9.078242, 27.357265, 13.298079, 34.32732]
2025-08-07 07:31:28,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 87.0, 61.0, 17.0, 22.0, 38.0, 19.0, 87.0, 23.0, 106.0]
2025-08-07 07:31:28,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 38 minutes, 54 seconds)
2025-08-07 07:33:09,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:10,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 25.21118 ± 21.551
2025-08-07 07:33:10,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [50.603333, 17.392654, 15.116776, 74.88037, 8.33846, 23.704884, 2.8762386, 1.7570887, 30.846455, 26.595552]
2025-08-07 07:33:10,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [66.0, 42.0, 111.0, 75.0, 21.0, 78.0, 15.0, 15.0, 43.0, 71.0]
2025-08-07 07:33:10,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 36 minutes, 52 seconds)
2025-08-07 07:34:53,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:54,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 25.32351 ± 27.975
2025-08-07 07:34:54,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [88.09475, 3.466814, 8.935509, 63.891964, 18.298023, 8.409725, 6.1680565, 41.90444, 5.37853, 8.687279]
2025-08-07 07:34:54,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [82.0, 20.0, 18.0, 95.0, 40.0, 21.0, 19.0, 61.0, 16.0, 21.0]
2025-08-07 07:34:54,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 34 minutes, 59 seconds)
2025-08-07 07:36:39,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:40,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 91.88991 ± 111.979
2025-08-07 07:36:40,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [4.967847, 1.850729, 309.91977, 6.376507, 57.81657, 200.42302, 16.037031, -1.0649872, 255.92863, 66.64396]
2025-08-07 07:36:40,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 16.0, 236.0, 17.0, 66.0, 123.0, 36.0, 16.0, 202.0, 108.0]
2025-08-07 07:36:40,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 33 minutes, 48 seconds)
2025-08-07 07:38:22,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:24,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 83.26789 ± 76.593
2025-08-07 07:38:24,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [48.939697, 75.73376, 9.038285, 38.799255, 2.6749098, 210.76431, 36.98744, 163.0928, 211.26779, 35.38068]
2025-08-07 07:38:24,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [61.0, 131.0, 23.0, 75.0, 16.0, 152.0, 66.0, 116.0, 140.0, 46.0]
2025-08-07 07:38:24,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 32 minutes, 40 seconds)
2025-08-07 07:40:06,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:07,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 84.97070 ± 91.229
2025-08-07 07:40:07,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [12.773028, 65.47102, 7.7315626, 272.3, 6.9792857, 79.48056, 247.5093, 44.278095, 36.87204, 76.3121]
2025-08-07 07:40:07,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 79.0, 21.0, 155.0, 22.0, 102.0, 127.0, 70.0, 94.0, 104.0]
2025-08-07 07:40:07,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 30 minutes, 29 seconds)
2025-08-07 07:41:49,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:50,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 117.61623 ± 99.338
2025-08-07 07:41:50,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [208.92494, 9.767232, 161.98332, 228.56325, -0.06299226, 192.03839, 55.10408, 2.9561195, 47.406055, 269.48178]
2025-08-07 07:41:50,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 24.0, 189.0, 122.0, 22.0, 104.0, 114.0, 14.0, 60.0, 232.0]
2025-08-07 07:41:50,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (117.62) for latency ExtremeClogL1U23
2025-08-07 07:41:50,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 29 minutes, 15 seconds)
2025-08-07 07:43:34,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:35,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 54.57629 ± 49.868
2025-08-07 07:43:35,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [148.02794, 5.8233438, 17.59033, 48.899014, 49.784504, 29.533127, 82.84994, 139.5049, 6.954297, 16.795528]
2025-08-07 07:43:35,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [118.0, 19.0, 39.0, 94.0, 87.0, 47.0, 81.0, 128.0, 23.0, 37.0]
2025-08-07 07:43:35,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 27 minutes, 36 seconds)
2025-08-07 07:45:19,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:20,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 97.92031 ± 117.012
2025-08-07 07:45:20,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.5076365, 2.8722668, 9.536484, 154.64862, 7.7754645, 197.67381, 280.19302, 10.481141, 7.974613, 301.54007]
2025-08-07 07:45:20,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 16.0, 21.0, 81.0, 19.0, 108.0, 147.0, 23.0, 22.0, 229.0]
2025-08-07 07:45:20,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 25 minutes, 37 seconds)
2025-08-07 07:47:05,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:06,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 64.33690 ± 79.539
2025-08-07 07:47:06,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [86.41892, 61.054226, 13.611816, 2.0649712, 6.0797625, 42.148228, 267.56473, 17.128819, 7.5979114, 139.69966]
2025-08-07 07:47:06,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [86.0, 187.0, 24.0, 23.0, 18.0, 122.0, 157.0, 36.0, 17.0, 79.0]
2025-08-07 07:47:06,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 24 minutes, 31 seconds)
2025-08-07 07:48:49,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:50,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 109.00053 ± 78.938
2025-08-07 07:48:50,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [290.08823, 119.84349, 12.22139, 177.36241, 78.583, 58.01094, 117.8407, 96.18545, 135.29056, 4.579216]
2025-08-07 07:48:50,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 87.0, 25.0, 103.0, 132.0, 140.0, 90.0, 99.0, 86.0, 22.0]
2025-08-07 07:48:50,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 23 minutes, 3 seconds)
2025-08-07 07:50:34,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:35,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 28.28929 ± 43.532
2025-08-07 07:50:35,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [82.49603, 139.93945, 6.5281544, 3.5166023, 10.003135, 5.1716976, 9.629096, 15.456366, 5.012937, 5.139412]
2025-08-07 07:50:35,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 106.0, 17.0, 21.0, 20.0, 23.0, 23.0, 24.0, 20.0, 18.0]
2025-08-07 07:50:35,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 21 minutes, 34 seconds)
2025-08-07 07:52:21,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:22,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 80.33900 ± 61.507
2025-08-07 07:52:22,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [4.772796, 8.911186, 183.09108, 6.676271, 86.30044, 167.05136, 103.020775, 96.11153, 109.990005, 37.46467]
2025-08-07 07:52:22,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 19.0, 130.0, 22.0, 101.0, 91.0, 93.0, 86.0, 93.0, 93.0]
2025-08-07 07:52:22,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 20 minutes, 28 seconds)
2025-08-07 07:54:05,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:06,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 133.89670 ± 110.927
2025-08-07 07:54:06,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [245.83414, 5.2823153, 4.212794, 232.6578, 271.24832, 133.1325, 9.996508, 257.77905, 6.099457, 172.7241]
2025-08-07 07:54:06,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 20.0, 13.0, 165.0, 151.0, 92.0, 19.0, 179.0, 24.0, 129.0]
2025-08-07 07:54:06,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (133.90) for latency ExtremeClogL1U23
2025-08-07 07:54:06,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 18 minutes, 39 seconds)
2025-08-07 07:55:49,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:49,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 52.29409 ± 92.852
2025-08-07 07:55:49,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.132944, 4.3791175, 9.01248, 26.862913, 9.228205, 6.8860984, 4.287876, 9.760017, 304.9164, 139.47484]
2025-08-07 07:55:49,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 16.0, 20.0, 56.0, 19.0, 20.0, 24.0, 22.0, 144.0, 81.0]
2025-08-07 07:55:49,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 16 minutes, 2 seconds)
2025-08-07 07:57:31,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:32,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 125.07727 ± 123.031
2025-08-07 07:57:32,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.879812, 7.68255, 13.655882, 335.97647, 233.72208, 170.33835, 6.707328, 5.994422, 264.2836, 202.53227]
2025-08-07 07:57:32,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 24.0, 161.0, 111.0, 102.0, 19.0, 21.0, 154.0, 103.0]
2025-08-07 07:57:32,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 14 minutes)
2025-08-07 07:59:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:18,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 174.40686 ± 84.181
2025-08-07 07:59:18,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.371837, 197.53145, 138.84363, 153.7215, 128.1308, 143.09659, 277.16858, 213.3762, 334.2052, 148.62277]
2025-08-07 07:59:18,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 104.0, 146.0, 115.0, 121.0, 88.0, 147.0, 115.0, 157.0, 137.0]
2025-08-07 07:59:18,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (174.41) for latency ExtremeClogL1U23
2025-08-07 07:59:18,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 12 minutes, 32 seconds)
2025-08-07 08:01:01,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:02,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 98.10708 ± 80.853
2025-08-07 08:01:02,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [4.7397556, 66.22595, 157.43994, 270.27603, 5.6700754, 137.53725, 10.084804, 68.10003, 101.95713, 159.03983]
2025-08-07 08:01:02,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 119.0, 78.0, 154.0, 17.0, 126.0, 23.0, 60.0, 118.0, 111.0]
2025-08-07 08:01:02,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 10 minutes, 6 seconds)
2025-08-07 08:02:47,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:48,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 131.95702 ± 128.826
2025-08-07 08:02:48,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [261.62125, 10.034694, 62.653645, 9.592761, 43.4092, 335.61728, 43.501183, 262.11435, 287.03223, 3.9935594]
2025-08-07 08:02:48,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 24.0, 102.0, 22.0, 104.0, 197.0, 113.0, 113.0, 143.0, 20.0]
2025-08-07 08:02:48,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 8 minutes, 48 seconds)
2025-08-07 08:04:32,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:33,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 50.66921 ± 61.414
2025-08-07 08:04:33,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.541942, 3.0397482, 126.30235, 10.093944, 32.894665, 164.21034, 137.90735, 8.464038, 7.0828986, 8.154881]
2025-08-07 08:04:33,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 15.0, 147.0, 37.0, 46.0, 112.0, 125.0, 18.0, 20.0, 17.0]
2025-08-07 08:04:33,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 7 minutes, 25 seconds)
2025-08-07 08:06:18,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:19,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 129.02931 ± 142.809
2025-08-07 08:06:19,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [431.88443, 79.60537, 10.895443, 4.189785, 242.20236, 3.362955, 33.341408, 283.75018, 192.50824, 8.552845]
2025-08-07 08:06:19,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [231.0, 143.0, 23.0, 20.0, 132.0, 17.0, 114.0, 148.0, 123.0, 21.0]
2025-08-07 08:06:19,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 6 minutes, 20 seconds)
2025-08-07 08:08:01,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:02,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 121.63091 ± 122.425
2025-08-07 08:08:02,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2.915663, 194.60567, 10.001081, 8.34883, 7.2049346, 266.74374, 235.7467, 328.08365, 159.17809, 3.4807124]
2025-08-07 08:08:02,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 175.0, 22.0, 21.0, 20.0, 139.0, 169.0, 155.0, 80.0, 16.0]
2025-08-07 08:08:02,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 4 minutes, 7 seconds)
2025-08-07 08:09:46,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:47,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 172.37708 ± 151.787
2025-08-07 08:09:47,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [392.6721, 2.9537897, 217.28748, 4.918588, 150.47435, 335.587, 326.1035, 290.65408, 0.0633329, 3.0565312]
2025-08-07 08:09:47,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [205.0, 13.0, 103.0, 22.0, 96.0, 159.0, 160.0, 154.0, 15.0, 17.0]
2025-08-07 08:09:47,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 2 minutes, 32 seconds)
2025-08-07 08:11:29,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 112.70679 ± 114.497
2025-08-07 08:11:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.676835, 5.0180407, 7.449433, 221.15143, 240.11285, 107.74758, 6.0235705, 261.90802, 2.2235422, 266.75662]
2025-08-07 08:11:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [35.0, 18.0, 22.0, 118.0, 114.0, 92.0, 18.0, 143.0, 12.0, 149.0]
2025-08-07 08:11:30,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 59 minutes, 55 seconds)
2025-08-07 08:13:14,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:15,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 94.65507 ± 110.684
2025-08-07 08:13:15,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [280.21625, 175.95412, 251.01668, 4.3758492, 9.772056, 11.94549, 6.8233156, 2.1663668, 6.117919, 198.16267]
2025-08-07 08:13:15,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 96.0, 158.0, 21.0, 23.0, 25.0, 22.0, 25.0, 21.0, 113.0]
2025-08-07 08:13:15,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 58 minutes, 22 seconds)
2025-08-07 08:14:58,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:59,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 123.92540 ± 181.267
2025-08-07 08:14:59,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [4.1034102, 389.91525, 13.988226, 474.2494, 3.6564121, 323.65405, 6.9174776, 9.124106, 5.331494, 8.31407]
2025-08-07 08:14:59,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 245.0, 25.0, 253.0, 18.0, 170.0, 23.0, 24.0, 17.0, 21.0]
2025-08-07 08:14:59,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 56 minutes, 9 seconds)
2025-08-07 08:16:42,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 146.41063 ± 133.467
2025-08-07 08:16:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [78.9029, 2.2344596, 143.91374, 309.85272, 2.567417, 22.18271, 320.92056, 307.7076, 8.816456, 267.00784]
2025-08-07 08:16:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 23.0, 96.0, 151.0, 20.0, 42.0, 160.0, 223.0, 20.0, 132.0]
2025-08-07 08:16:43,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 54 minutes, 32 seconds)
2025-08-07 08:18:27,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:28,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 127.67328 ± 132.196
2025-08-07 08:18:28,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3.6782331, 12.942291, 148.71072, 332.77396, 186.63683, 2.9405417, 290.91388, 291.9158, 6.372761, -0.15232189]
2025-08-07 08:18:28,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 25.0, 137.0, 247.0, 208.0, 18.0, 162.0, 162.0, 20.0, 19.0]
2025-08-07 08:18:28,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 52 minutes, 54 seconds)
2025-08-07 08:20:14,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:15,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 117.92920 ± 171.399
2025-08-07 08:20:15,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.3861504, 10.772887, 5.6675663, 7.52431, 318.24988, 405.9689, 8.393764, 4.379667, 407.92816, 4.0207896]
2025-08-07 08:20:15,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 17.0, 20.0, 156.0, 247.0, 25.0, 19.0, 214.0, 14.0]
2025-08-07 08:20:15,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 52 minutes, 6 seconds)
2025-08-07 08:21:55,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:56,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 71.00630 ± 98.630
2025-08-07 08:21:56,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [5.0485153, 8.583639, 207.56451, 224.12054, 7.423913, 7.9944587, 5.4612184, 5.6900506, 232.53456, 5.641627]
2025-08-07 08:21:56,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 131.0, 122.0, 21.0, 19.0, 22.0, 18.0, 119.0, 20.0]
2025-08-07 08:21:56,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 49 minutes, 24 seconds)
2025-08-07 08:23:41,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:43,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 242.03256 ± 192.953
2025-08-07 08:23:43,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [385.50262, 5.3750396, 248.10211, 568.2918, 1.3088784, 190.76305, 7.434631, 489.0254, 185.44508, 339.07697]
2025-08-07 08:23:43,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [171.0, 20.0, 131.0, 283.0, 23.0, 106.0, 24.0, 261.0, 121.0, 196.0]
2025-08-07 08:23:43,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (242.03) for latency ExtremeClogL1U23
2025-08-07 08:23:43,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 48 minutes, 15 seconds)
2025-08-07 08:25:29,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:30,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 105.93197 ± 144.276
2025-08-07 08:25:30,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [10.364173, 181.89642, 5.701783, -0.40616962, 105.07869, 4.2160196, 340.42383, 5.4072394, 6.479803, 400.158]
2025-08-07 08:25:30,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 106.0, 19.0, 11.0, 145.0, 17.0, 167.0, 17.0, 18.0, 208.0]
2025-08-07 08:25:30,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 47 minutes, 12 seconds)
2025-08-07 08:27:14,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 95.43214 ± 143.767
2025-08-07 08:27:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.9845366, 51.483063, 94.01653, 21.99446, 9.725195, 486.60855, 56.02549, 8.791207, 213.56622, 4.12619]
2025-08-07 08:27:15,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 62.0, 101.0, 48.0, 20.0, 302.0, 88.0, 18.0, 170.0, 18.0]
2025-08-07 08:27:15,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 45 minutes, 17 seconds)
2025-08-07 08:28:57,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:28:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 160.88757 ± 131.581
2025-08-07 08:28:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.0876064, 193.9228, 12.40437, 1.4276003, 6.288826, 291.73648, 313.59067, 274.71118, 308.38043, 200.32568]
2025-08-07 08:28:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 108.0, 25.0, 19.0, 17.0, 179.0, 179.0, 139.0, 144.0, 110.0]
2025-08-07 08:28:59,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 42 minutes, 56 seconds)
2025-08-07 08:30:43,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:30:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 117.16746 ± 123.784
2025-08-07 08:30:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [190.90344, 0.7880617, 2.316407, 190.50554, 2.9612281, 217.01244, 365.65408, 2.0836453, 192.09407, 7.355584]
2025-08-07 08:30:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [115.0, 14.0, 13.0, 114.0, 17.0, 154.0, 152.0, 14.0, 116.0, 20.0]
2025-08-07 08:30:44,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 42 minutes, 1 second)
2025-08-07 08:32:27,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:32:28,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 160.00638 ± 141.697
2025-08-07 08:32:28,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [210.16475, 193.04391, 256.53415, 10.545447, 205.67265, 456.37497, 7.9906707, 6.957264, 10.623879, 242.156]
2025-08-07 08:32:28,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 118.0, 128.0, 22.0, 117.0, 254.0, 23.0, 18.0, 24.0, 170.0]
2025-08-07 08:32:28,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 39 minutes, 46 seconds)
2025-08-07 08:34:13,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:15,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 181.62033 ± 145.668
2025-08-07 08:34:15,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [-2.546644, 188.66779, 93.30143, 293.04813, 8.87285, 249.50307, 191.3612, 2.9290059, 392.81805, 398.24844]
2025-08-07 08:34:15,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 101.0, 105.0, 178.0, 21.0, 148.0, 115.0, 23.0, 216.0, 260.0]
2025-08-07 08:34:15,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 37 minutes, 54 seconds)
2025-08-07 08:35:59,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:00,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 45.75243 ± 78.856
2025-08-07 08:36:00,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [4.5986867, 8.548368, 5.891767, 14.632203, 264.52228, 105.343506, 10.20277, 2.5990264, 39.881042, 1.3047084]
2025-08-07 08:36:00,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 24.0, 22.0, 23.0, 155.0, 130.0, 19.0, 16.0, 53.0, 18.0]
2025-08-07 08:36:00,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 36 minutes, 11 seconds)
2025-08-07 08:37:45,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:47,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 235.30183 ± 222.905
2025-08-07 08:37:47,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [293.66824, 9.750565, 6.1351514, 299.17142, 270.41907, 292.94034, 784.7122, 10.169833, 80.49996, 305.55142]
2025-08-07 08:37:47,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 23.0, 16.0, 154.0, 138.0, 134.0, 434.0, 22.0, 85.0, 156.0]
2025-08-07 08:37:47,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 35 minutes, 5 seconds)
2025-08-07 08:39:31,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:32,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 169.01089 ± 179.096
2025-08-07 08:39:32,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [4.8414383, 1.9415531, 467.29025, 8.987369, 363.62262, 53.69086, 8.443917, 402.95392, 286.16418, 92.172676]
2025-08-07 08:39:32,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 16.0, 327.0, 23.0, 214.0, 69.0, 19.0, 203.0, 146.0, 111.0]
2025-08-07 08:39:32,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 33 minutes, 22 seconds)
2025-08-07 08:41:16,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:17,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 208.88017 ± 152.622
2025-08-07 08:41:17,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.099223, 214.26797, 155.04475, 400.03348, 308.46436, 1.91622, 4.281304, 264.74094, 423.68903, 307.2643]
2025-08-07 08:41:17,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 101.0, 129.0, 210.0, 148.0, 16.0, 14.0, 146.0, 219.0, 157.0]
2025-08-07 08:41:17,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 31 minutes, 47 seconds)
2025-08-07 08:42:59,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:01,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 175.67593 ± 150.129
2025-08-07 08:43:01,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [335.30276, 151.00395, 239.49863, 487.68356, 43.852406, 7.772091, 51.9334, 270.5071, 9.061554, 160.14395]
2025-08-07 08:43:01,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 113.0, 143.0, 324.0, 53.0, 20.0, 109.0, 194.0, 21.0, 170.0]
2025-08-07 08:43:01,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 29 minutes, 25 seconds)
2025-08-07 08:44:44,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:46,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 261.05267 ± 98.450
2025-08-07 08:44:46,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [270.79297, 276.15298, 300.13773, 252.71129, 183.323, 317.832, 296.69424, 5.712196, 320.84192, 386.3286]
2025-08-07 08:44:46,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 137.0, 162.0, 155.0, 94.0, 196.0, 144.0, 21.0, 232.0, 179.0]
2025-08-07 08:44:46,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (261.05) for latency ExtremeClogL1U23
2025-08-07 08:44:46,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 27 minutes, 41 seconds)
2025-08-07 08:46:30,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:32,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 228.24805 ± 198.554
2025-08-07 08:46:32,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [290.27722, 0.20512359, 553.1918, 550.2768, 84.34223, 300.1732, 212.40878, 2.3433528, 6.8891816, 282.37296]
2025-08-07 08:46:32,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 16.0, 233.0, 340.0, 108.0, 151.0, 141.0, 17.0, 20.0, 132.0]
2025-08-07 08:46:32,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 45 seconds)
2025-08-07 08:48:15,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:17,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 131.73337 ± 138.828
2025-08-07 08:48:17,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [12.168045, 295.7429, 98.82829, 7.106167, 6.4272027, 4.7708845, 84.23761, 438.3993, 208.94537, 160.70789]
2025-08-07 08:48:17,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 157.0, 113.0, 21.0, 21.0, 19.0, 117.0, 279.0, 119.0, 106.0]
2025-08-07 08:48:17,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 23 minutes, 53 seconds)
2025-08-07 08:50:00,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:01,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 112.48093 ± 129.561
2025-08-07 08:50:01,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2.8092983, 299.46005, 13.851932, 2.608821, 261.66318, 10.273373, 8.25441, 5.802012, 243.06927, 277.01694]
2025-08-07 08:50:01,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 155.0, 23.0, 21.0, 145.0, 22.0, 18.0, 18.0, 129.0, 157.0]
2025-08-07 08:50:01,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 22 minutes, 2 seconds)
2025-08-07 08:51:44,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 165.98125 ± 140.046
2025-08-07 08:51:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [332.85342, 203.12357, 285.97842, 197.92433, 220.45222, 386.78442, 18.225777, 2.5165968, 4.234202, 7.719526]
2025-08-07 08:51:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [180.0, 120.0, 161.0, 107.0, 118.0, 206.0, 26.0, 22.0, 16.0, 17.0]
2025-08-07 08:51:45,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 20 minutes, 22 seconds)
2025-08-07 08:53:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:31,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 253.66130 ± 178.131
2025-08-07 08:53:31,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [225.07288, 237.1482, 363.41055, 3.7203228, 3.6978097, 663.6935, 193.83109, 305.9296, 223.98076, 316.1284]
2025-08-07 08:53:31,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [122.0, 173.0, 172.0, 23.0, 21.0, 398.0, 107.0, 121.0, 125.0, 179.0]
2025-08-07 08:53:31,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 18 minutes, 44 seconds)
2025-08-07 08:55:16,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:17,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 120.67757 ± 114.141
2025-08-07 08:55:17,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [239.4166, 10.400618, 202.96495, 103.78823, 212.46408, 3.0808053, 2.4601963, 5.7066617, 89.999825, 336.4937]
2025-08-07 08:55:17,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 21.0, 163.0, 109.0, 117.0, 13.0, 21.0, 17.0, 143.0, 205.0]
2025-08-07 08:55:17,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 59 seconds)
2025-08-07 08:57:01,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:03,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 196.48424 ± 164.588
2025-08-07 08:57:03,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [345.68225, 7.659139, 359.46173, 250.84422, 253.64105, 441.62018, 292.90366, 8.444709, 5.898767, -1.313411]
2025-08-07 08:57:03,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 19.0, 236.0, 121.0, 137.0, 240.0, 149.0, 18.0, 18.0, 20.0]
2025-08-07 08:57:03,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 15 minutes, 24 seconds)
2025-08-07 08:58:48,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:50,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 207.55000 ± 132.770
2025-08-07 08:58:50,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [311.27103, 321.7603, 6.3238673, 140.52902, 89.60395, 414.29288, 254.5849, 9.127118, 226.04639, 301.96048]
2025-08-07 08:58:50,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 185.0, 20.0, 134.0, 102.0, 238.0, 133.0, 21.0, 133.0, 148.0]
2025-08-07 08:58:50,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 59 seconds)
2025-08-07 09:00:37,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:38,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 238.72774 ± 162.491
2025-08-07 09:00:38,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [281.3293, 9.380852, 178.4409, 277.9552, 364.2493, 379.5211, 112.844536, 553.8682, 222.10791, 7.5800886]
2025-08-07 09:00:38,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 21.0, 99.0, 137.0, 154.0, 194.0, 118.0, 307.0, 142.0, 22.0]
2025-08-07 09:00:38,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 12 minutes, 54 seconds)
2025-08-07 09:02:23,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:25,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 169.66675 ± 168.175
2025-08-07 09:02:25,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [295.09927, 294.62146, 6.0421543, 1.2204788, 429.50116, 5.8039036, 8.544553, 311.87665, 4.983887, 338.974]
2025-08-07 09:02:25,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [145.0, 169.0, 19.0, 25.0, 221.0, 20.0, 23.0, 167.0, 19.0, 163.0]
2025-08-07 09:02:25,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 11 minutes, 13 seconds)
2025-08-07 09:04:11,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:12,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 181.69597 ± 152.085
2025-08-07 09:04:12,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [249.7329, 319.40094, 7.949993, 451.19034, 11.445059, 268.87152, 4.460112, 12.345187, 244.80304, 246.76074]
2025-08-07 09:04:12,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 168.0, 20.0, 211.0, 22.0, 130.0, 18.0, 24.0, 149.0, 122.0]
2025-08-07 09:04:12,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 9 minutes, 34 seconds)
2025-08-07 09:05:58,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:59,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 191.75723 ± 165.500
2025-08-07 09:05:59,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [226.28746, 326.8033, 348.98975, 218.18909, 8.247189, 459.75354, 0.4171014, 320.50192, 6.021411, 2.3616567]
2025-08-07 09:05:59,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 173.0, 171.0, 114.0, 20.0, 225.0, 19.0, 163.0, 21.0, 13.0]
2025-08-07 09:05:59,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 7 minutes, 58 seconds)
2025-08-07 09:07:41,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:42,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 82.92561 ± 149.087
2025-08-07 09:07:42,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.919925, 4.8353133, 8.099518, 11.743074, 4.3330474, 452.6899, 8.923795, 34.584488, 5.446083, 290.68097]
2025-08-07 09:07:42,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 17.0, 23.0, 24.0, 21.0, 213.0, 19.0, 49.0, 18.0, 144.0]
2025-08-07 09:07:42,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 5 minutes, 39 seconds)
2025-08-07 09:09:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:09:28,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 277.47546 ± 173.582
2025-08-07 09:09:28,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.2968755, 371.93604, 282.8507, 687.0607, 201.86482, 255.65543, 157.8414, 405.31857, 163.56331, 242.36697]
2025-08-07 09:09:28,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 183.0, 166.0, 403.0, 175.0, 146.0, 111.0, 194.0, 100.0, 116.0]
2025-08-07 09:09:28,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (277.48) for latency ExtremeClogL1U23
2025-08-07 09:09:28,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 3 minutes, 36 seconds)
2025-08-07 09:11:14,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 298.61816 ± 125.486
2025-08-07 09:11:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [389.29742, 401.85223, 444.21347, 298.8332, 283.61847, 219.89226, 434.48715, 280.4409, 2.665983, 230.8806]
2025-08-07 09:11:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [192.0, 199.0, 277.0, 160.0, 160.0, 117.0, 204.0, 154.0, 19.0, 112.0]
2025-08-07 09:11:17,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (298.62) for latency ExtremeClogL1U23
2025-08-07 09:11:17,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 2 minutes, 2 seconds)
2025-08-07 09:13:00,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:01,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 170.92896 ± 179.681
2025-08-07 09:13:01,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.872928, 6.974982, 363.17255, 498.50223, 3.9851475, 368.06763, 219.09789, 10.397744, 225.54414, 4.674274]
2025-08-07 09:13:01,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 18.0, 174.0, 324.0, 15.0, 167.0, 120.0, 21.0, 115.0, 16.0]
2025-08-07 09:13:01,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 59 minutes, 58 seconds)
2025-08-07 09:14:46,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:48,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 208.82747 ± 164.967
2025-08-07 09:14:48,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [197.47119, 142.3088, 11.944293, 5.5394335, 468.6421, 347.94818, 170.68141, 3.6854804, 319.37912, 420.67447]
2025-08-07 09:14:48,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 91.0, 22.0, 17.0, 223.0, 205.0, 94.0, 23.0, 159.0, 201.0]
2025-08-07 09:14:48,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 58 minutes, 10 seconds)
2025-08-07 09:16:32,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:33,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 83.21840 ± 126.870
2025-08-07 09:16:33,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.367184, 375.38098, -0.5640292, 2.7655277, 220.0118, 8.039599, 6.787359, 201.5696, 7.5782285, 1.2477342]
2025-08-07 09:16:33,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 201.0, 25.0, 18.0, 120.0, 24.0, 34.0, 117.0, 18.0, 16.0]
2025-08-07 09:16:33,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 56 minutes, 37 seconds)
2025-08-07 09:18:19,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:20,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 139.07375 ± 153.648
2025-08-07 09:18:20,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [157.99728, 4.676067, 9.484881, 7.5391755, 351.73618, 180.1526, 3.2878017, 439.88123, 233.82796, 2.1541564]
2025-08-07 09:18:20,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 17.0, 21.0, 22.0, 148.0, 94.0, 18.0, 248.0, 119.0, 20.0]
2025-08-07 09:18:20,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 54 minutes, 53 seconds)
2025-08-07 09:20:03,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:05,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 181.50871 ± 143.058
2025-08-07 09:20:05,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [339.05276, 191.27248, 3.7574492, 247.81398, 7.5810065, 291.97562, 437.22363, 175.92172, 3.3966665, 117.09198]
2025-08-07 09:20:05,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 112.0, 21.0, 137.0, 17.0, 145.0, 242.0, 89.0, 14.0, 68.0]
2025-08-07 09:20:05,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 52 minutes, 48 seconds)
2025-08-07 09:21:48,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:50,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 227.85115 ± 238.930
2025-08-07 09:21:50,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [13.980102, 206.11963, 9.492753, 718.68317, 317.94437, 7.9252253, 508.56717, 3.6054766, 403.93695, 88.25657]
2025-08-07 09:21:50,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 212.0, 23.0, 404.0, 162.0, 22.0, 226.0, 14.0, 216.0, 75.0]
2025-08-07 09:21:50,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 51 minutes, 5 seconds)
2025-08-07 09:23:34,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:36,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 274.37650 ± 156.507
2025-08-07 09:23:36,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [267.0321, 432.23492, 492.42825, 294.30377, 280.0673, 6.9101586, 8.9059105, 346.0631, 201.6209, 414.1984]
2025-08-07 09:23:36,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 188.0, 222.0, 163.0, 165.0, 16.0, 35.0, 165.0, 100.0, 217.0]
2025-08-07 09:23:36,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 49 minutes, 16 seconds)
2025-08-07 09:25:21,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:23,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 293.02051 ± 123.505
2025-08-07 09:25:23,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [10.306004, 198.62177, 287.7472, 444.98346, 208.54962, 358.93268, 364.9286, 269.7506, 352.4806, 433.90482]
2025-08-07 09:25:23,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 106.0, 145.0, 193.0, 157.0, 176.0, 188.0, 144.0, 192.0, 216.0]
2025-08-07 09:25:23,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 47 minutes, 43 seconds)
2025-08-07 09:27:10,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:12,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 316.01178 ± 216.533
2025-08-07 09:27:12,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [697.46716, 277.60602, 374.84125, 263.82745, 297.99057, 8.991316, 645.66766, 383.5053, 202.38817, 7.8330874]
2025-08-07 09:27:12,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [353.0, 151.0, 165.0, 154.0, 137.0, 23.0, 336.0, 218.0, 142.0, 19.0]
2025-08-07 09:27:12,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (316.01) for latency ExtremeClogL1U23
2025-08-07 09:27:12,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 46 minutes, 9 seconds)
2025-08-07 09:28:55,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:28:57,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 216.10970 ± 166.460
2025-08-07 09:28:57,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [261.64148, 316.25784, 384.4586, 358.7207, 425.07968, 64.86392, -3.547616, 6.7229156, 9.086908, 337.81256]
2025-08-07 09:28:57,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 158.0, 177.0, 224.0, 218.0, 106.0, 21.0, 17.0, 23.0, 218.0]
2025-08-07 09:28:57,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 44 minutes, 21 seconds)
2025-08-07 09:30:42,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:30:44,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 312.91797 ± 191.837
2025-08-07 09:30:44,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [461.71948, 274.37686, 313.56152, 6.2354455, 179.53123, 8.339553, 550.5463, 370.20795, 375.9775, 588.6839]
2025-08-07 09:30:44,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [247.0, 131.0, 144.0, 17.0, 153.0, 21.0, 332.0, 185.0, 232.0, 306.0]
2025-08-07 09:30:44,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 42 seconds)
2025-08-07 09:32:31,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:32:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 153.13089 ± 176.285
2025-08-07 09:32:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [296.7645, 2.753739, 457.10416, 103.66963, 205.42528, 6.8690896, 4.3067193, 1.4891411, 441.69293, 11.233587]
2025-08-07 09:32:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [173.0, 13.0, 239.0, 133.0, 91.0, 21.0, 15.0, 21.0, 187.0, 20.0]
2025-08-07 09:32:32,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 7 seconds)
2025-08-07 09:34:14,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:34:16,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 185.74916 ± 156.603
2025-08-07 09:34:16,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [216.2117, 257.4948, 8.571305, 362.8538, 6.618837, 3.2546127, 338.62073, 242.50826, 6.7350774, 414.62244]
2025-08-07 09:34:16,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [108.0, 132.0, 20.0, 195.0, 24.0, 13.0, 177.0, 121.0, 24.0, 207.0]
2025-08-07 09:34:16,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 4 seconds)
2025-08-07 09:35:59,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:36:00,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 110.18452 ± 171.918
2025-08-07 09:36:00,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.424851, 9.765658, 519.2937, 287.27145, 6.989911, 9.693757, 4.5401425, 6.947788, -3.8455825, 253.76361]
2025-08-07 09:36:00,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 234.0, 136.0, 19.0, 25.0, 15.0, 22.0, 23.0, 154.0]
2025-08-07 09:36:00,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 55 seconds)
2025-08-07 09:37:47,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:48,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 126.60030 ± 160.306
2025-08-07 09:37:48,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [284.9064, 5.8754277, 9.838416, 7.046675, 363.83368, 8.385024, 141.99855, 8.413417, 7.728018, 427.97733]
2025-08-07 09:37:48,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 15.0, 21.0, 19.0, 188.0, 24.0, 97.0, 21.0, 20.0, 193.0]
2025-08-07 09:37:48,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 23 seconds)
2025-08-07 09:39:30,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:39:32,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 301.24670 ± 193.170
2025-08-07 09:39:32,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [407.08304, 343.06146, 718.998, 6.5951505, 376.02786, 239.54088, 292.9652, 4.214283, 333.59888, 290.38235]
2025-08-07 09:39:32,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [234.0, 140.0, 466.0, 21.0, 199.0, 126.0, 156.0, 23.0, 182.0, 159.0]
2025-08-07 09:39:32,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 28 seconds)
2025-08-07 09:41:17,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:41:19,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 284.59256 ± 224.502
2025-08-07 09:41:19,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [255.18553, 278.56003, 439.2624, 6.2533746, 353.91202, 70.0023, 374.51978, 271.89362, 796.1712, 0.16544212]
2025-08-07 09:41:19,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [178.0, 156.0, 236.0, 22.0, 186.0, 116.0, 187.0, 152.0, 353.0, 18.0]
2025-08-07 09:41:19,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 36 seconds)
2025-08-07 09:43:03,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:43:05,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 222.39877 ± 203.412
2025-08-07 09:43:05,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.0698304, 13.218163, 554.4489, 8.934812, 515.01874, 179.88971, 289.94406, 5.538182, 387.2228, 262.70267]
2025-08-07 09:43:05,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 262.0, 19.0, 203.0, 152.0, 142.0, 21.0, 165.0, 159.0]
2025-08-07 09:43:05,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 57 seconds)
2025-08-07 09:44:46,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:44:47,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 152.60048 ± 169.641
2025-08-07 09:44:47,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [281.36154, 11.448944, 1.9039919, 10.984272, 4.73536, 447.1785, 98.95448, 405.76678, 258.40414, 5.266774]
2025-08-07 09:44:47,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [129.0, 25.0, 14.0, 23.0, 21.0, 238.0, 150.0, 215.0, 139.0, 21.0]
2025-08-07 09:44:47,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 6 seconds)
2025-08-07 09:46:30,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:46:32,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 199.23265 ± 170.313
2025-08-07 09:46:32,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.7423973, 144.17514, 320.8559, 103.1788, 3.5438914, 3.9502482, 210.21326, 296.68988, 531.0159, 371.9611]
2025-08-07 09:46:32,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 104.0, 199.0, 100.0, 20.0, 22.0, 110.0, 143.0, 264.0, 192.0]
2025-08-07 09:46:32,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 11 seconds)
2025-08-07 09:48:18,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:48:19,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 103.93767 ± 152.764
2025-08-07 09:48:19,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [5.2548127, 2.256121, 5.8364887, 275.0237, 262.43497, 15.353269, 443.03204, 9.01468, 9.577335, 11.593287]
2025-08-07 09:48:19,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 15.0, 21.0, 165.0, 141.0, 24.0, 199.0, 24.0, 22.0, 23.0]
2025-08-07 09:48:19,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 33 seconds)
2025-08-07 09:50:02,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:50:03,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 76.50968 ± 107.343
2025-08-07 09:50:03,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.33744, 4.326148, 237.28966, 225.72263, 257.2297, 4.1251373, 3.8714354, 6.778149, 12.2076645, 6.208884]
2025-08-07 09:50:03,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 126.0, 122.0, 157.0, 16.0, 14.0, 19.0, 23.0, 22.0]
2025-08-07 09:50:03,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 40 seconds)
2025-08-07 09:51:46,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:51:47,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 155.45674 ± 181.043
2025-08-07 09:51:47,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [0.29782873, 9.275253, 10.751692, 388.42584, 15.819053, 396.6655, 331.6783, 388.13724, 9.215535, 4.301176]
2025-08-07 09:51:47,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 18.0, 20.0, 217.0, 25.0, 219.0, 156.0, 157.0, 19.0, 21.0]
2025-08-07 09:51:47,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 53 seconds)
2025-08-07 09:53:32,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:53:34,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 251.76721 ± 252.277
2025-08-07 09:53:34,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [661.7011, 3.617702, 8.829438, 447.91998, 11.002149, 10.25962, 437.87012, 7.4878736, 400.4005, 528.5837]
2025-08-07 09:53:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [273.0, 17.0, 20.0, 177.0, 22.0, 25.0, 188.0, 17.0, 204.0, 285.0]
2025-08-07 09:53:34,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 19 seconds)
2025-08-07 09:55:18,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:55:21,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 300.58615 ± 290.975
2025-08-07 09:55:21,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [438.84967, 15.017396, 676.857, 201.71495, 388.2431, 8.430493, 6.1484523, 6.3329134, 399.2403, 865.0274]
2025-08-07 09:55:21,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [201.0, 23.0, 421.0, 106.0, 217.0, 18.0, 22.0, 20.0, 194.0, 441.0]
2025-08-07 09:55:21,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 37 seconds)
2025-08-07 09:57:06,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:57:08,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 284.30609 ± 291.511
2025-08-07 09:57:08,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.3940377, 8.890283, 691.175, 694.2683, 7.21727, 585.9604, 6.089217, 476.10684, 357.8266, 8.132482]
2025-08-07 09:57:08,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 19.0, 310.0, 246.0, 19.0, 386.0, 17.0, 201.0, 204.0, 25.0]
2025-08-07 09:57:08,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 52 seconds)
2025-08-07 09:58:54,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:58:56,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 298.11676 ± 263.885
2025-08-07 09:58:56,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [208.46564, 3.7696087, 277.97565, -0.9358747, 321.91443, 681.61694, 344.21637, 822.87714, 4.2770057, 316.9907]
2025-08-07 09:58:56,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 18.0, 142.0, 23.0, 151.0, 340.0, 151.0, 531.0, 14.0, 157.0]
2025-08-07 09:58:56,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 13 seconds)
2025-08-07 10:00:38,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:00:39,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 303.69751 ± 204.267
2025-08-07 10:00:39,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [341.59317, 359.64682, 486.0567, 489.00723, 194.76964, 9.753165, 6.38023, 402.24716, 110.23184, 637.289]
2025-08-07 10:00:39,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 159.0, 210.0, 231.0, 110.0, 24.0, 22.0, 171.0, 97.0, 304.0]
2025-08-07 10:00:39,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 25 seconds)
2025-08-07 10:02:22,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:02:24,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 279.04871 ± 175.154
2025-08-07 10:02:24,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [201.27115, 574.92993, 214.75287, 9.424325, 348.25116, 485.6221, 346.6664, 0.07771163, 355.65973, 253.83174]
2025-08-07 10:02:24,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [112.0, 266.0, 117.0, 21.0, 155.0, 215.0, 169.0, 24.0, 156.0, 129.0]
2025-08-07 10:02:24,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 35 seconds)
2025-08-07 10:04:07,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:04:09,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 211.42818 ± 199.693
2025-08-07 10:04:09,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.8067517, 115.981544, 487.49057, 1.6683906, 337.46692, 392.7291, 8.483231, 230.03448, 522.8978, 10.722845]
2025-08-07 10:04:09,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 122.0, 225.0, 15.0, 175.0, 218.0, 19.0, 143.0, 233.0, 23.0]
2025-08-07 10:04:09,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 48 seconds)
2025-08-07 10:05:55,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:56,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 152.87308 ± 179.565
2025-08-07 10:05:56,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [0.8666753, 418.2878, 372.3773, 12.307949, 7.3618445, 9.746902, 7.201745, 370.286, 6.2526026, 324.04202]
2025-08-07 10:05:56,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 247.0, 169.0, 24.0, 20.0, 20.0, 19.0, 225.0, 22.0, 164.0]
2025-08-07 10:05:56,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 2 seconds)
2025-08-07 10:07:38,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:40,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 309.44415 ± 167.558
2025-08-07 10:07:40,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [369.44073, 559.8715, 51.769516, 419.38336, 405.36026, 363.7926, 277.8762, 445.04703, 194.44298, 7.457467]
2025-08-07 10:07:40,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 239.0, 63.0, 198.0, 203.0, 187.0, 164.0, 234.0, 129.0, 18.0]
2025-08-07 10:07:40,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 14 seconds)
2025-08-07 10:09:24,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:09:25,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 181.85037 ± 188.164
2025-08-07 10:09:25,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [4.3724775, 506.78488, 3.4060843, 407.27628, 1.1005839, 6.569358, 277.05273, 316.16833, 3.8216643, 291.95132]
2025-08-07 10:09:25,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 236.0, 22.0, 196.0, 17.0, 23.0, 151.0, 151.0, 16.0, 142.0]
2025-08-07 10:09:25,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 30 seconds)
2025-08-07 10:11:07,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:11:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 213.32390 ± 223.741
2025-08-07 10:11:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.83354, 414.4187, 8.018595, 11.755785, 464.05475, 148.05832, 233.47029, 156.16461, 5.746821, 683.7176]
2025-08-07 10:11:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 180.0, 24.0, 22.0, 267.0, 80.0, 117.0, 106.0, 23.0, 277.0]
2025-08-07 10:11:09,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 45 seconds)
2025-08-07 10:12:51,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:12:52,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 183.95831 ± 197.467
2025-08-07 10:12:52,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [4.9194784, 1.7017459, 8.817723, 462.283, 313.13654, 3.6281183, 538.6756, 313.07288, 165.89435, 27.453701]
2025-08-07 10:12:52,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 19.0, 19.0, 284.0, 145.0, 15.0, 221.0, 141.0, 101.0, 57.0]
2025-08-07 10:12:52,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1251 [DEBUG]: Training session finished
