2025-08-07 04:01:45,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 04:01:45,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 04:01:45,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x150a26fa7c50>}
2025-08-07 04:01:45,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 04:01:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 04:01:45,381 baseline-bpql-noiseperc25-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:01:45,381 baseline-bpql-noiseperc25-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:01:46,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 04:01:46,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 04:03:19,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:19,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 9.63255 ± 11.651
2025-08-07 04:03:19,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [8.142918, 2.3693593, 2.537509, 23.277466, 2.0054994, 2.2995584, 7.840903, 39.00518, 0.31001967, 8.53704]
2025-08-07 04:03:19,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 30.0, 13.0, 44.0, 25.0, 34.0, 24.0, 57.0, 31.0, 21.0]
2025-08-07 04:03:19,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (9.63) for latency ExtremeSparseL4U32
2025-08-07 04:03:19,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 34 minutes, 16 seconds)
2025-08-07 04:04:58,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:59,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 11.92988 ± 18.806
2025-08-07 04:04:59,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.6526458, 3.182034, 26.067944, 3.9347503, -2.1296883, 3.392366, 0.92443556, 6.878974, 63.93476, 9.460537]
2025-08-07 04:04:59,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 23.0, 58.0, 19.0, 18.0, 14.0, 14.0, 24.0, 98.0, 23.0]
2025-08-07 04:04:59,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (11.93) for latency ExtremeSparseL4U32
2025-08-07 04:04:59,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 37 minutes, 37 seconds)
2025-08-07 04:06:40,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:41,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1.30411 ± 7.650
2025-08-07 04:06:41,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [5.5585866, 7.6423116, -0.8032615, -2.654218, -2.2307034, 11.647758, -16.758368, 8.732855, -0.7312809, 2.637391]
2025-08-07 04:06:41,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 13.0, 17.0, 11.0, 19.0, 196.0, 26.0, 27.0, 16.0]
2025-08-07 04:06:41,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 38 minutes, 58 seconds)
2025-08-07 04:08:22,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:22,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 0.62196 ± 7.627
2025-08-07 04:08:22,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.4440181, 7.477678, -20.65172, 3.3898532, 4.177596, 0.53349394, -2.7965825, 3.1552246, 6.589247, 0.9008313]
2025-08-07 04:08:22,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 19.0, 96.0, 110.0, 15.0, 16.0, 11.0, 21.0, 35.0, 16.0]
2025-08-07 04:08:22,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 38 minutes, 35 seconds)
2025-08-07 04:10:05,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.16491 ± 11.353
2025-08-07 04:10:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.19570328, 8.223144, 7.16878, 2.3593524, 37.690018, 3.8423052, -0.69041264, 15.228859, -2.9974675, 1.0201795]
2025-08-07 04:10:05,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 19.0, 20.0, 19.0, 59.0, 31.0, 10.0, 26.0, 30.0, 12.0]
2025-08-07 04:10:05,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 38 minutes, 6 seconds)
2025-08-07 04:11:45,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:11:46,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 20.57708 ± 21.801
2025-08-07 04:11:46,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [56.42039, 37.025974, -0.2953848, 5.9795766, 10.446812, 3.5934715, 57.59495, 2.4542143, 30.797394, 1.7533891]
2025-08-07 04:11:46,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 70.0, 15.0, 21.0, 24.0, 20.0, 213.0, 18.0, 47.0, 19.0]
2025-08-07 04:11:46,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (20.58) for latency ExtremeSparseL4U32
2025-08-07 04:11:46,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 38 minutes, 51 seconds)
2025-08-07 04:13:27,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:13:28,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 2.14571 ± 6.709
2025-08-07 04:13:28,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.4037, 12.3487, 0.20534259, -4.9243865, 2.5330417, 0.5256708, -11.913032, 11.106623, 2.513836, 4.6575685]
2025-08-07 04:13:28,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 29.0, 17.0, 11.0, 20.0, 14.0, 123.0, 21.0, 13.0, 15.0]
2025-08-07 04:13:28,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 37 minutes, 46 seconds)
2025-08-07 04:15:09,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:15:09,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.06768 ± 6.205
2025-08-07 04:15:09,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [10.769347, -1.0412179, 12.860106, 9.333823, 5.571381, 7.7733502, 1.3337866, 0.32502326, 20.229591, 3.5216534]
2025-08-07 04:15:09,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 17.0, 33.0, 30.0, 17.0, 24.0, 26.0, 20.0, 31.0, 18.0]
2025-08-07 04:15:09,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 35 minutes, 53 seconds)
2025-08-07 04:16:50,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:16:51,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 2.94349 ± 2.888
2025-08-07 04:16:51,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [6.533702, 1.3409272, -0.90398705, -0.97565806, 3.9986267, 7.8038797, 3.8704064, 1.0085257, 5.3898687, 1.3686377]
2025-08-07 04:16:51,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 13.0, 24.0, 10.0, 19.0, 19.0, 22.0, 14.0, 22.0, 13.0]
2025-08-07 04:16:51,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 34 minutes, 12 seconds)
2025-08-07 04:18:31,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:18:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 26.54935 ± 66.315
2025-08-07 04:18:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.1901348, 4.5229473, 0.09834803, 8.6202, 225.30487, 2.448589, 10.166313, 4.464143, 2.3695972, 5.3083224]
2025-08-07 04:18:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 16.0, 15.0, 77.0, 248.0, 21.0, 41.0, 14.0, 14.0, 24.0]
2025-08-07 04:18:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (26.55) for latency ExtremeSparseL4U32
2025-08-07 04:18:32,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 32 minutes, 9 seconds)
2025-08-07 04:20:13,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:20:13,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 16.44869 ± 42.743
2025-08-07 04:20:13,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.1541667, 144.42375, 3.1938195, 3.3553922, 3.3531258, -2.212948, 3.6019762, -2.8343956, 1.1699655, 6.282019]
2025-08-07 04:20:13,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 194.0, 15.0, 14.0, 23.0, 22.0, 13.0, 15.0, 15.0, 21.0]
2025-08-07 04:20:13,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 30 minutes, 26 seconds)
2025-08-07 04:21:55,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:21:56,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 5.20731 ± 6.416
2025-08-07 04:21:56,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.2410377, 7.452085, 2.6619618, -4.2179055, 11.3684435, 5.8221993, -1.2469134, -0.24550189, 12.536953, 16.700777]
2025-08-07 04:21:56,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 19.0, 13.0, 24.0, 24.0, 21.0, 14.0, 26.0, 26.0, 26.0]
2025-08-07 04:21:56,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 29 minutes, 2 seconds)
2025-08-07 04:23:37,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:23:37,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 6.74972 ± 3.767
2025-08-07 04:23:37,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [8.490058, 1.5370307, 13.887165, 2.9285777, 3.5827286, 9.208959, 4.856746, 4.686789, 11.467005, 6.8521175]
2025-08-07 04:23:37,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 27.0, 24.0, 33.0, 31.0, 40.0, 31.0, 25.0, 22.0]
2025-08-07 04:23:37,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 27 minutes, 17 seconds)
2025-08-07 04:25:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:25:19,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 4.41965 ± 4.251
2025-08-07 04:25:19,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.330975, 5.397001, -3.4810615, 1.2679121, 5.5678453, 4.1698914, 3.592963, 0.59251213, 13.240551, 6.517953]
2025-08-07 04:25:19,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 17.0, 22.0, 23.0, 16.0, 25.0, 26.0, 13.0, 26.0, 18.0]
2025-08-07 04:25:19,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 25 minutes, 40 seconds)
2025-08-07 04:27:00,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:27:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 2.65380 ± 3.757
2025-08-07 04:27:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.8553617, 4.48639, 5.7556105, 5.5377264, 0.3893762, 3.7399385, 7.46291, -5.9643216, 2.4533017, -1.1783144]
2025-08-07 04:27:00,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 22.0, 20.0, 17.0, 16.0, 28.0, 15.0, 18.0, 18.0]
2025-08-07 04:27:00,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 23 minutes, 48 seconds)
2025-08-07 04:28:40,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:28:41,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 13.95167 ± 19.216
2025-08-07 04:28:41,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.832498, 66.34159, -0.8353642, 5.3070817, 29.195023, 2.7864609, 6.3256416, 5.9707117, 14.809713, 6.7833233]
2025-08-07 04:28:41,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 129.0, 26.0, 26.0, 69.0, 15.0, 30.0, 28.0, 30.0, 26.0]
2025-08-07 04:28:41,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 22 minutes, 3 seconds)
2025-08-07 04:30:21,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:30:22,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 22.28493 ± 44.611
2025-08-07 04:30:22,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [10.623022, 4.3208857, 1.0536865, -4.38761, 5.270326, 22.046799, 0.94599843, 19.176361, 154.07361, 9.726178]
2025-08-07 04:30:22,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 16.0, 18.0, 18.0, 17.0, 54.0, 17.0, 32.0, 135.0, 23.0]
2025-08-07 04:30:22,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 20 minutes)
2025-08-07 04:32:03,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:32:03,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.18298 ± 15.028
2025-08-07 04:32:03,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.07472564, -1.9529339, -1.8471636, 14.081851, 49.391438, -2.3932984, 3.4455817, -1.9147445, 3.2050676, 9.888766]
2025-08-07 04:32:03,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 17.0, 11.0, 85.0, 120.0, 11.0, 19.0, 18.0, 26.0, 32.0]
2025-08-07 04:32:03,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 18 minutes, 20 seconds)
2025-08-07 04:33:44,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:33:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 23.79714 ± 50.187
2025-08-07 04:33:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.7616532, 21.000296, 41.461246, -2.9240425, 0.9746346, 0.11141831, 168.58202, 0.31297743, -5.0418086, 15.256267]
2025-08-07 04:33:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 46.0, 62.0, 11.0, 16.0, 22.0, 95.0, 25.0, 16.0, 117.0]
2025-08-07 04:33:45,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 16 minutes, 38 seconds)
2025-08-07 04:35:25,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:35:25,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.28546 ± 11.546
2025-08-07 04:35:25,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.7507787, 3.5754282, 7.4766912, -0.10042165, 21.27666, -3.1257458, 35.882572, 1.3211932, 2.4420018, 5.857012]
2025-08-07 04:35:25,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [11.0, 13.0, 18.0, 12.0, 32.0, 32.0, 50.0, 16.0, 19.0, 27.0]
2025-08-07 04:35:25,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 14 minutes, 46 seconds)
2025-08-07 04:37:06,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:37:06,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 4.26819 ± 6.454
2025-08-07 04:37:06,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-4.2778735, 1.4236294, 2.8466709, 2.288964, 5.126199, 7.3871136, 21.57959, 0.24418874, 2.9016697, 3.1617048]
2025-08-07 04:37:06,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 12.0, 15.0, 13.0, 23.0, 21.0, 43.0, 16.0, 17.0, 23.0]
2025-08-07 04:37:06,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 13 minutes, 8 seconds)
2025-08-07 04:38:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:38:48,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 29.02692 ± 76.601
2025-08-07 04:38:48,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.216587, 258.68057, 0.4435353, 2.6526728, 3.6403854, 9.811328, 2.4052665, 0.40338945, 2.3786354, 2.6368127]
2025-08-07 04:38:48,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 174.0, 22.0, 14.0, 27.0, 33.0, 12.0, 11.0, 21.0, 17.0]
2025-08-07 04:38:48,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (29.03) for latency ExtremeSparseL4U32
2025-08-07 04:38:48,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 11 minutes, 29 seconds)
2025-08-07 04:40:29,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:40:29,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 13.31273 ± 22.446
2025-08-07 04:40:29,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.0803027, 1.9241598, 5.064026, 1.7355454, 7.286755, 9.324337, 79.84456, 2.512408, 13.105833, 8.249401]
2025-08-07 04:40:29,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 32.0, 31.0, 16.0, 19.0, 18.0, 118.0, 29.0, 33.0, 34.0]
2025-08-07 04:40:29,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 9 minutes, 55 seconds)
2025-08-07 04:42:12,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:42:12,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 14.03186 ± 29.014
2025-08-07 04:42:12,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-5.8387, -0.37015995, 2.1783905, -0.32980168, 6.505454, 5.9159675, 98.92609, 19.816954, 6.8719563, 6.6424403]
2025-08-07 04:42:12,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 13.0, 16.0, 11.0, 25.0, 23.0, 140.0, 49.0, 25.0, 18.0]
2025-08-07 04:42:12,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 8 minutes, 29 seconds)
2025-08-07 04:43:51,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:43:51,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 3.00080 ± 4.248
2025-08-07 04:43:51,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.5574794, -1.2640786, 12.1072, 6.8148403, -2.13525, 0.3236828, 5.3743463, 2.2575302, 3.6836538, -1.7114186]
2025-08-07 04:43:51,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 25.0, 23.0, 24.0, 25.0, 23.0, 14.0, 24.0, 22.0]
2025-08-07 04:43:51,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 33 seconds)
2025-08-07 04:45:33,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:45:33,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 8.03282 ± 8.125
2025-08-07 04:45:33,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.9533434, 15.602763, -3.5613134, 9.846845, 0.38999167, 21.300474, 19.982647, 2.0481164, 2.1076293, 8.657679]
2025-08-07 04:45:33,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 28.0, 27.0, 26.0, 19.0, 57.0, 32.0, 31.0, 15.0, 20.0]
2025-08-07 04:45:33,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 58 seconds)
2025-08-07 04:47:14,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:47:15,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 27.14995 ± 54.693
2025-08-07 04:47:15,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [11.831714, 32.896275, 5.2863107, 5.44348, 3.6146364, 7.2356243, 22.032541, -5.073495, 188.22215, 0.010291076]
2025-08-07 04:47:15,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 58.0, 20.0, 32.0, 14.0, 20.0, 76.0, 20.0, 137.0, 17.0]
2025-08-07 04:47:15,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 20 seconds)
2025-08-07 04:48:55,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:48:56,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 14.72011 ± 37.776
2025-08-07 04:48:56,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.13769327, 7.5382643, -3.8522465, 6.551551, -4.273921, 4.579308, 127.46571, 3.695822, 4.5359006, 1.0984532]
2025-08-07 04:48:56,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 19.0, 19.0, 22.0, 19.0, 17.0, 170.0, 13.0, 24.0, 12.0]
2025-08-07 04:48:56,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 1 minute, 31 seconds)
2025-08-07 04:50:38,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:50:38,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 2.28838 ± 3.196
2025-08-07 04:50:38,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.0794467, 3.3785834, 7.750719, 4.439337, -0.55526304, 4.904669, -4.7804446, 2.4365845, 2.2068167, 2.0233083]
2025-08-07 04:50:38,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 22.0, 26.0, 24.0, 23.0, 19.0, 21.0, 16.0, 20.0, 16.0]
2025-08-07 04:50:38,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 59 minutes, 47 seconds)
2025-08-07 04:52:19,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:52:19,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 2.11615 ± 1.799
2025-08-07 04:52:19,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.7869235, 3.4454443, 1.0288066, 3.5291653, 2.3930416, -0.8609019, 3.220838, 5.083348, 2.1545434, 1.9541147]
2025-08-07 04:52:19,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 22.0, 11.0, 15.0, 26.0, 13.0, 26.0, 24.0, 18.0, 13.0]
2025-08-07 04:52:19,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 27 seconds)
2025-08-07 04:54:00,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:54:01,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 23.69879 ± 33.914
2025-08-07 04:54:01,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.7262983, 1.1298779, 2.0137708, 6.397112, 105.01819, 66.16456, 4.024218, 4.961317, -0.0034542452, 39.556]
2025-08-07 04:54:01,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 12.0, 19.0, 198.0, 106.0, 15.0, 32.0, 26.0, 77.0]
2025-08-07 04:54:01,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 56 minutes, 49 seconds)
2025-08-07 04:55:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:55:43,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 25.11839 ± 32.525
2025-08-07 04:55:43,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [80.48135, 11.659482, 28.594269, 2.1869898, 10.034723, 95.94321, 3.325355, 9.459158, 4.1250467, 5.3743167]
2025-08-07 04:55:43,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [80.0, 30.0, 89.0, 13.0, 25.0, 143.0, 13.0, 29.0, 16.0, 16.0]
2025-08-07 04:55:43,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 55 minutes, 16 seconds)
2025-08-07 04:57:24,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:57:25,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 35.71859 ± 50.131
2025-08-07 04:57:25,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.3005562, 0.903831, 9.436182, 36.99475, -5.2891483, 5.477907, 123.448166, 140.87892, 13.390809, 33.245075]
2025-08-07 04:57:25,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 21.0, 31.0, 83.0, 18.0, 76.0, 106.0, 134.0, 26.0, 119.0]
2025-08-07 04:57:25,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (35.72) for latency ExtremeSparseL4U32
2025-08-07 04:57:25,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 53 minutes, 45 seconds)
2025-08-07 04:59:06,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:59:06,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 8.18564 ± 16.056
2025-08-07 04:59:06,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [55.6338, 3.5762362, -1.5412961, 1.8954345, 2.313421, 0.81127465, 9.750749, 1.6673523, 3.599394, 4.1500416]
2025-08-07 04:59:06,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [76.0, 22.0, 18.0, 15.0, 21.0, 18.0, 30.0, 18.0, 30.0, 15.0]
2025-08-07 04:59:06,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 51 minutes, 48 seconds)
2025-08-07 05:00:48,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:00:48,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 15.34490 ± 29.616
2025-08-07 05:00:48,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.672978, 4.7760887, 102.08775, 6.334705, 19.648909, 9.790897, 9.416713, -0.4080948, 7.2737093, -3.7986655]
2025-08-07 05:00:48,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [14.0, 32.0, 95.0, 33.0, 33.0, 38.0, 30.0, 10.0, 21.0, 23.0]
2025-08-07 05:00:48,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes, 18 seconds)
2025-08-07 05:02:30,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:02:31,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 52.66582 ± 51.528
2025-08-07 05:02:31,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.3454466, 3.8932693, 68.98905, 102.203575, 5.3097653, 4.626692, 127.25565, 1.8930211, 85.892715, 124.24902]
2025-08-07 05:02:31,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 33.0, 96.0, 97.0, 16.0, 18.0, 131.0, 30.0, 94.0, 102.0]
2025-08-07 05:02:31,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (52.67) for latency ExtremeSparseL4U32
2025-08-07 05:02:31,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 48 minutes, 43 seconds)
2025-08-07 05:04:12,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:04:12,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 3.01603 ± 4.368
2025-08-07 05:04:12,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.0801787, 0.5520234, -6.543647, 5.8400474, 1.2352543, 5.89588, 1.8308678, 2.4470215, 3.3868392, 11.4358]
2025-08-07 05:04:12,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 18.0, 18.0, 15.0, 19.0, 20.0, 20.0, 22.0, 27.0]
2025-08-07 05:04:12,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 46 minutes, 52 seconds)
2025-08-07 05:05:52,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:53,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 2.66830 ± 6.522
2025-08-07 05:05:53,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.7820196, 4.8324957, 3.0895503, 1.2924837, 0.16314931, 1.9534335, -2.3833842, -0.9461881, 0.400267, 21.063185]
2025-08-07 05:05:53,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 16.0, 15.0, 16.0, 28.0, 16.0, 17.0, 15.0, 13.0, 122.0]
2025-08-07 05:05:53,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 44 minutes, 54 seconds)
2025-08-07 05:07:33,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:07:34,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 47.40038 ± 93.057
2025-08-07 05:07:34,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.8285258, 178.59158, 0.92437035, 3.4611242, -1.6943507, 13.611291, 277.38052, 0.09140595, -1.2993563, 4.765684]
2025-08-07 05:07:34,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [9.0, 140.0, 21.0, 14.0, 26.0, 25.0, 187.0, 10.0, 16.0, 18.0]
2025-08-07 05:07:34,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 9 seconds)
2025-08-07 05:09:15,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:09:15,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 9.98947 ± 22.778
2025-08-07 05:09:15,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-3.443713, -4.9583654, 3.091288, 76.70776, 3.4122214, 0.80624926, 12.431481, -0.38587165, 3.4210815, 8.81254]
2025-08-07 05:09:15,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 17.0, 19.0, 142.0, 22.0, 14.0, 25.0, 14.0, 16.0, 37.0]
2025-08-07 05:09:15,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 41 minutes, 21 seconds)
2025-08-07 05:10:55,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:10:55,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 5.70245 ± 4.379
2025-08-07 05:10:55,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.4503592, 2.4997368, 2.3244557, 4.958869, 1.9452482, 10.022461, 5.212151, 14.207606, 3.9190009, 11.484572]
2025-08-07 05:10:55,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 37.0, 20.0, 17.0, 13.0, 21.0, 18.0, 26.0, 18.0, 30.0]
2025-08-07 05:10:55,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 39 minutes, 17 seconds)
2025-08-07 05:12:35,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:12:36,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 3.75781 ± 7.512
2025-08-07 05:12:36,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-6.394345, 4.386872, 8.576561, -2.5772166, 1.3783507, 12.692619, 2.3845892, 19.587017, -3.4508069, 0.9944349]
2025-08-07 05:12:36,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 26.0, 21.0, 17.0, 29.0, 19.0, 29.0, 18.0, 23.0]
2025-08-07 05:12:36,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 37 minutes, 22 seconds)
2025-08-07 05:14:14,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:14:15,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 11.57407 ± 21.977
2025-08-07 05:14:15,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.3795962, 74.7714, 15.340224, 0.683638, 16.686956, 3.1414635, 7.2286263, -0.1764113, -1.4929699, -0.06262868]
2025-08-07 05:14:15,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 94.0, 38.0, 21.0, 43.0, 16.0, 20.0, 27.0, 17.0, 14.0]
2025-08-07 05:14:15,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 23 seconds)
2025-08-07 05:15:53,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:15:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 11.23788 ± 16.286
2025-08-07 05:15:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.8834267, 9.33057, 3.6431377, -2.2618718, 20.580746, 5.684878, 56.22755, 0.47620586, 0.8643215, 12.94986]
2025-08-07 05:15:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 33.0, 20.0, 9.0, 56.0, 18.0, 88.0, 18.0, 16.0, 25.0]
2025-08-07 05:15:54,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 33 minutes, 20 seconds)
2025-08-07 05:17:31,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:17:32,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 2.43659 ± 6.120
2025-08-07 05:17:32,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.8904128, -3.2492778, 6.1126423, 10.65077, -0.9753579, 15.342452, -0.9851233, -0.44584924, -1.9685935, -4.006172]
2025-08-07 05:17:32,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 27.0, 27.0, 34.0, 17.0, 36.0, 34.0, 17.0, 23.0, 11.0]
2025-08-07 05:17:32,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 31 minutes, 5 seconds)
2025-08-07 05:19:09,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:19:10,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.13175 ± 8.228
2025-08-07 05:19:10,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.46543938, 0.9457366, 7.36437, 4.2349296, 13.704473, 26.54704, 12.579233, 7.593234, -0.16148822, -1.9554443]
2025-08-07 05:19:10,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 11.0, 29.0, 16.0, 26.0, 38.0, 22.0, 17.0, 11.0, 24.0]
2025-08-07 05:19:10,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 28 minutes, 57 seconds)
2025-08-07 05:20:47,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:20:47,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 5.43122 ± 2.477
2025-08-07 05:20:47,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.5646465, 9.199888, 1.1706638, 5.3930798, 5.590448, 2.7740333, 9.493361, 4.9353094, 6.719958, 5.4707675]
2025-08-07 05:20:47,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 34.0, 34.0, 32.0, 29.0, 20.0, 24.0, 14.0, 15.0, 17.0]
2025-08-07 05:20:47,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 48 seconds)
2025-08-07 05:22:25,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:22:25,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 8.09090 ± 10.006
2025-08-07 05:22:25,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.558229, -2.5633564, 22.072874, 0.90980184, 5.1874113, 0.41473985, 15.037271, 29.080723, 1.3965975, 1.814691]
2025-08-07 05:22:25,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 17.0, 41.0, 17.0, 21.0, 18.0, 35.0, 39.0, 17.0, 32.0]
2025-08-07 05:22:25,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 59 seconds)
2025-08-07 05:24:03,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:24:03,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 6.22993 ± 3.129
2025-08-07 05:24:03,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.66798574, 8.393546, 7.48203, 9.472585, 4.9939113, 4.4594603, 5.8243985, 11.206419, 6.8131995, 4.321778]
2025-08-07 05:24:03,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 31.0, 31.0, 24.0, 17.0, 17.0, 24.0, 34.0, 28.0, 27.0]
2025-08-07 05:24:03,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 23 minutes, 8 seconds)
2025-08-07 05:25:40,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:25:41,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 5.40253 ± 2.940
2025-08-07 05:25:41,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [5.148318, 6.5886974, 0.3455682, 3.9764483, 5.695191, 7.7796283, 8.87079, 7.1628513, 0.12090255, 8.336924]
2025-08-07 05:25:41,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [35.0, 21.0, 24.0, 33.0, 19.0, 22.0, 29.0, 25.0, 21.0, 25.0]
2025-08-07 05:25:41,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 29 seconds)
2025-08-07 05:27:18,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:27:19,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 5.09863 ± 5.261
2025-08-07 05:27:19,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.89722025, 10.3283415, 5.6843495, 5.1412835, 1.4639094, -2.4362197, 6.605969, 10.725513, 0.09763258, 14.272744]
2025-08-07 05:27:19,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 33.0, 30.0, 23.0, 14.0, 14.0, 20.0, 33.0, 15.0, 24.0]
2025-08-07 05:27:19,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 54 seconds)
2025-08-07 05:28:57,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:57,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1.76267 ± 6.276
2025-08-07 05:28:57,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.3735042, -11.2322645, 2.7650745, 14.886482, 4.137176, 0.5419256, 1.5953348, -1.4230535, 5.7409596, 2.988543]
2025-08-07 05:28:57,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 32.0, 18.0, 31.0, 16.0, 24.0, 25.0, 9.0, 21.0, 21.0]
2025-08-07 05:28:57,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 27 seconds)
2025-08-07 05:30:37,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:38,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 30.52320 ± 61.176
2025-08-07 05:30:38,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-5.1133275, 32.02462, 1.1978998, -0.15483665, 8.300811, 8.912563, 9.413522, 208.57687, 44.07572, -2.0019069]
2025-08-07 05:30:38,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 71.0, 22.0, 12.0, 21.0, 17.0, 25.0, 139.0, 86.0, 17.0]
2025-08-07 05:30:38,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 10 seconds)
2025-08-07 05:32:17,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:32:18,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 48.07721 ± 138.008
2025-08-07 05:32:18,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-1.5213263, 462.00934, -0.8742642, 0.08565225, 7.552253, 4.5486617, 3.492269, 5.369555, 1.1035883, -0.9936573]
2025-08-07 05:32:18,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 357.0, 24.0, 25.0, 17.0, 19.0, 16.0, 16.0, 13.0, 20.0]
2025-08-07 05:32:18,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 52 seconds)
2025-08-07 05:33:58,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:33:58,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.71508 ± 18.973
2025-08-07 05:33:58,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [6.3736196, -3.9677463, 0.8734018, 2.8589156, 7.934082, -1.4691147, 4.583788, -5.249041, 63.34439, 1.8685315]
2025-08-07 05:33:58,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 27.0, 11.0, 22.0, 23.0, 9.0, 21.0, 13.0, 134.0, 28.0]
2025-08-07 05:33:58,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 36 seconds)
2025-08-07 05:35:39,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:35:39,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 29.71154 ± 54.333
2025-08-07 05:35:39,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [94.55811, 170.14096, 13.349282, 10.181314, 4.645531, -1.6442858, 4.5339804, 6.8610983, 3.165001, -8.675618]
2025-08-07 05:35:39,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 116.0, 23.0, 22.0, 21.0, 11.0, 14.0, 19.0, 17.0, 16.0]
2025-08-07 05:35:39,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 24 seconds)
2025-08-07 05:37:20,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:37:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 35.28178 ± 98.519
2025-08-07 05:37:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.035974, 5.020829, 3.126898, 2.8372257, -0.6340709, 330.63736, 10.34148, 1.1867986, -4.671236, 1.9364791]
2025-08-07 05:37:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 24.0, 14.0, 16.0, 12.0, 189.0, 28.0, 18.0, 13.0, 20.0]
2025-08-07 05:37:20,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 12 minutes, 3 seconds)
2025-08-07 05:38:59,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:39:00,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 6.19002 ± 12.342
2025-08-07 05:39:00,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.8723328, 41.815224, -0.620511, -1.9729484, 2.7435567, 3.4754903, 10.368027, 3.673496, -1.901569, 2.4471195]
2025-08-07 05:39:00,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 129.0, 20.0, 12.0, 17.0, 17.0, 26.0, 27.0, 11.0, 15.0]
2025-08-07 05:39:00,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 10 minutes, 15 seconds)
2025-08-07 05:40:39,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:40:40,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 32.99273 ± 62.655
2025-08-07 05:40:40,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [4.1845436, 3.377898, -1.9844241, 0.4157727, 158.47232, 1.1173226, 0.22385319, 157.98225, 5.9062667, 0.23143129]
2025-08-07 05:40:40,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 8.0, 25.0, 114.0, 14.0, 13.0, 100.0, 15.0, 13.0]
2025-08-07 05:40:40,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 8 minutes, 37 seconds)
2025-08-07 05:42:18,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:42:18,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 36.21314 ± 96.945
2025-08-07 05:42:18,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.3798424, 5.178047, 11.619622, 7.353974, 1.8715024, -5.1073365, 326.76312, 0.9945873, 5.0788155, 6.999231]
2025-08-07 05:42:18,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 20.0, 21.0, 32.0, 11.0, 20.0, 225.0, 23.0, 21.0, 17.0]
2025-08-07 05:42:18,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 6 minutes, 40 seconds)
2025-08-07 05:43:57,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:43:59,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 134.19635 ± 125.926
2025-08-07 05:43:59,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [177.52554, 7.0128026, 184.9397, 177.56233, 346.7026, 4.215451, 6.3238006, 330.23572, 109.74429, -2.2985232]
2025-08-07 05:43:59,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 17.0, 132.0, 238.0, 262.0, 16.0, 25.0, 213.0, 152.0, 21.0]
2025-08-07 05:43:59,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1226 [INFO]: New best (134.20) for latency ExtremeSparseL4U32
2025-08-07 05:43:59,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 59 seconds)
2025-08-07 05:45:38,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:45:40,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 56.74773 ± 150.011
2025-08-07 05:45:40,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [9.776786, 5.347935, 505.4565, 39.502693, -2.744052, -1.4753461, 6.1464987, 0.63911813, 3.940039, 0.88708186]
2025-08-07 05:45:40,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 17.0, 527.0, 133.0, 17.0, 16.0, 32.0, 11.0, 27.0, 11.0]
2025-08-07 05:45:40,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 3 minutes, 16 seconds)
2025-08-07 05:47:18,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:47:19,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 4.86114 ± 3.075
2025-08-07 05:47:19,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [2.9360282, 1.3910134, 7.4775934, 3.607329, 6.2924542, 6.885769, 9.238744, 8.313353, 3.0902214, -0.62113994]
2025-08-07 05:47:19,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 13.0, 20.0, 16.0, 32.0, 30.0, 28.0, 22.0, 14.0, 11.0]
2025-08-07 05:47:19,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 33 seconds)
2025-08-07 05:49:01,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:49:02,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 6.82753 ± 9.794
2025-08-07 05:49:02,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.888369, -2.7332397, -0.7417153, 10.220618, -0.6439771, 7.8098526, 32.761856, -1.1850473, 6.1260476, 8.772486]
2025-08-07 05:49:02,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 10.0, 13.0, 33.0, 15.0, 26.0, 77.0, 14.0, 21.0, 19.0]
2025-08-07 05:49:02,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 seconds)
2025-08-07 05:50:37,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:50:38,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 12.92846 ± 26.208
2025-08-07 05:50:38,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [90.60274, 2.419297, 8.512897, 8.298552, 6.2412047, 11.51972, -2.2539458, 0.080008656, 1.8922579, 1.9719135]
2025-08-07 05:50:38,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [116.0, 30.0, 24.0, 30.0, 26.0, 23.0, 27.0, 23.0, 21.0, 15.0]
2025-08-07 05:50:38,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 18 seconds)
2025-08-07 05:52:16,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:52:17,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 6.33911 ± 4.338
2025-08-07 05:52:17,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [5.6300116, 5.325006, 3.0796525, 15.211845, 7.0994883, -1.3206544, 3.8533232, 4.3683257, 9.794415, 10.349666]
2025-08-07 05:52:17,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 16.0, 20.0, 32.0, 18.0, 14.0, 14.0, 18.0, 25.0, 18.0]
2025-08-07 05:52:17,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 22 seconds)
2025-08-07 05:53:55,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:53:56,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 3.82257 ± 4.076
2025-08-07 05:53:56,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.16995022, -2.3666673, 9.019795, 3.2414258, 1.5324647, -0.06816657, 6.2453594, 2.8794866, 7.4210105, 10.490973]
2025-08-07 05:53:56,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 15.0, 25.0, 14.0, 17.0, 14.0, 22.0, 28.0, 25.0, 23.0]
2025-08-07 05:53:56,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 54 minutes, 33 seconds)
2025-08-07 05:55:35,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:55:35,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 35.90691 ± 82.120
2025-08-07 05:55:35,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [46.856846, -1.9829438, 11.96002, 2.3693345, 1.58888, 278.8354, -0.996378, 4.9440956, 1.2588483, 14.235007]
2025-08-07 05:55:35,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 17.0, 28.0, 14.0, 32.0, 168.0, 9.0, 16.0, 18.0, 30.0]
2025-08-07 05:55:36,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 59 seconds)
2025-08-07 05:57:14,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:57:15,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 13.78221 ± 12.252
2025-08-07 05:57:15,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.8727546, 8.332644, 5.4160776, 5.346531, 23.281473, 16.568914, 46.592464, 4.061254, 9.825094, 10.52485]
2025-08-07 05:57:15,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 22.0, 32.0, 30.0, 161.0, 142.0, 196.0, 18.0, 22.0, 28.0]
2025-08-07 05:57:15,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 1 second)
2025-08-07 05:58:55,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:58:55,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 3.27883 ± 3.440
2025-08-07 05:58:55,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.219789, 4.9994, -0.31596157, -1.293388, 4.2332277, 7.508483, -1.5213127, 6.453324, 0.15298775, 5.3517275]
2025-08-07 05:58:55,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 22.0, 19.0, 20.0, 21.0, 20.0, 11.0, 20.0, 16.0, 21.0]
2025-08-07 05:58:55,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 49 minutes, 44 seconds)
2025-08-07 06:00:34,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:00:36,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 108.26472 ± 242.638
2025-08-07 06:00:36,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.237796, 140.30672, 78.519684, 14.108824, 1.6530173, -0.68549687, 9.817168, -4.30605, 11.622628, 824.3729]
2025-08-07 06:00:36,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 80.0, 145.0, 24.0, 16.0, 16.0, 32.0, 21.0, 25.0, 693.0]
2025-08-07 06:00:36,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 48 minutes, 13 seconds)
2025-08-07 06:02:15,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:02:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 57.88417 ± 140.908
2025-08-07 06:02:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [23.758257, 40.086933, -0.20820825, 478.73413, 1.8377022, 4.1051235, -1.8147314, 22.80176, -2.6042855, 12.1451025]
2025-08-07 06:02:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [79.0, 103.0, 59.0, 378.0, 15.0, 17.0, 9.0, 34.0, 16.0, 28.0]
2025-08-07 06:02:16,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 44 seconds)
2025-08-07 06:03:56,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:03:56,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 6.36933 ± 6.312
2025-08-07 06:03:56,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.7475395, 4.822543, 7.1454835, 13.374074, -3.979641, 17.371386, 7.325044, 6.093912, 11.196597, 3.0914767]
2025-08-07 06:03:56,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 25.0, 23.0, 30.0, 25.0, 31.0, 21.0, 21.0, 27.0, 25.0]
2025-08-07 06:03:56,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 3 seconds)
2025-08-07 06:05:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:05:37,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 60.57980 ± 171.158
2025-08-07 06:05:37,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [7.237432, 10.521526, -1.0294799, 4.235818, 3.0374072, 8.173952, 0.7893537, 573.84784, 6.1458592, -7.1617575]
2025-08-07 06:05:37,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 30.0, 22.0, 14.0, 29.0, 20.0, 16.0, 303.0, 17.0, 20.0]
2025-08-07 06:05:37,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 29 seconds)
2025-08-07 06:07:16,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:07:16,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 40.44038 ± 114.822
2025-08-07 06:07:16,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [9.925614, 4.854619, -8.714203, 1.6503069, 8.047652, -0.34612733, 1.5488617, 384.573, 5.1364493, -2.2724402]
2025-08-07 06:07:16,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 32.0, 22.0, 22.0, 21.0, 14.0, 17.0, 192.0, 18.0, 19.0]
2025-08-07 06:07:16,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 43 seconds)
2025-08-07 06:08:55,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:08:55,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 1.95896 ± 4.764
2025-08-07 06:08:55,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.362282, -2.1891775, 12.195045, 5.7802534, -0.27766484, 0.10247324, 1.7909969, 4.1572733, -6.112703, 4.5053983]
2025-08-07 06:08:55,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 12.0, 29.0, 20.0, 24.0, 11.0, 24.0, 18.0, 13.0, 15.0]
2025-08-07 06:08:56,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 59 seconds)
2025-08-07 06:10:34,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:10:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.46187 ± 5.185
2025-08-07 06:10:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [10.226748, 14.372543, 5.879186, 16.410973, 8.756635, -0.036517706, 2.189191, 1.6433547, 5.212394, 9.9641485]
2025-08-07 06:10:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 28.0, 21.0, 30.0, 24.0, 24.0, 14.0, 24.0, 17.0, 28.0]
2025-08-07 06:10:35,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 12 seconds)
2025-08-07 06:12:14,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:12:15,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 42.48952 ± 119.081
2025-08-07 06:12:15,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [6.8463316, -1.8273442, -5.487998, 6.903664, 10.568761, 5.722056, 3.5588114, 0.6782455, 399.46082, -1.5280713]
2025-08-07 06:12:15,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 29.0, 22.0, 25.0, 34.0, 17.0, 17.0, 16.0, 206.0, 14.0]
2025-08-07 06:12:15,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 33 seconds)
2025-08-07 06:13:55,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:13:56,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 51.65734 ± 105.853
2025-08-07 06:13:56,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.9953983, 181.55371, -1.2990903, 1.1043171, -5.1402307, 325.0252, -3.0246522, 3.6763916, 3.5919843, 7.0904226]
2025-08-07 06:13:56,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 120.0, 13.0, 15.0, 12.0, 380.0, 22.0, 23.0, 14.0, 26.0]
2025-08-07 06:13:56,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 56 seconds)
2025-08-07 06:15:35,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:15:36,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 23.69196 ± 70.912
2025-08-07 06:15:36,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.3173829, -3.1156511, -1.683829, 11.67428, -5.1755605, 235.90402, 5.611585, -5.2824335, -3.1820006, 0.8517765]
2025-08-07 06:15:36,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 9.0, 9.0, 23.0, 19.0, 351.0, 17.0, 21.0, 11.0, 13.0]
2025-08-07 06:15:36,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 19 seconds)
2025-08-07 06:17:17,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:17:18,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 80.36425 ± 223.301
2025-08-07 06:17:18,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.24138564, 2.0820427, 4.2475853, 12.023424, 5.410435, 750.1832, 4.0761313, 6.6450105, 10.162102, 9.053883]
2025-08-07 06:17:18,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 12.0, 16.0, 23.0, 24.0, 418.0, 15.0, 23.0, 21.0, 19.0]
2025-08-07 06:17:18,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 47 seconds)
2025-08-07 06:18:56,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:18:56,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 14.74595 ± 43.939
2025-08-07 06:18:56,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.17454343, 146.19534, -4.5821185, -4.429244, 3.1597042, 5.3231797, 1.1330594, 4.401447, -2.2469125, -1.6694878]
2025-08-07 06:18:56,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 103.0, 20.0, 20.0, 25.0, 23.0, 14.0, 21.0, 13.0, 20.0]
2025-08-07 06:18:56,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 5 seconds)
2025-08-07 06:20:37,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:20:38,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 58.37714 ± 155.784
2025-08-07 06:20:38,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [6.0589733, 5.5260916, 9.154193, 3.2501135, 7.0493484, 1.9752991, 5.064858, 17.419807, 2.711239, 525.56146]
2025-08-07 06:20:38,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 29.0, 17.0, 21.0, 24.0, 21.0, 32.0, 31.0, 300.0]
2025-08-07 06:20:38,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 31 seconds)
2025-08-07 06:22:15,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:22:16,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 7.59147 ± 12.705
2025-08-07 06:22:16,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.2296956, -1.8073214, 10.604923, 2.9600098, 2.0706549, 43.914326, 5.7183466, 8.063589, 5.3941913, 1.2257087]
2025-08-07 06:22:16,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 9.0, 24.0, 23.0, 29.0, 85.0, 33.0, 28.0, 20.0, 26.0]
2025-08-07 06:22:16,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 37 seconds)
2025-08-07 06:23:55,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:23:56,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 8.30734 ± 16.711
2025-08-07 06:23:56,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [1.8929645, 57.450928, 2.1734672, -1.029016, -0.1065594, 1.6132581, -0.9619924, 5.147758, 8.4358635, 8.456706]
2025-08-07 06:23:56,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 129.0, 15.0, 18.0, 17.0, 16.0, 19.0, 30.0, 34.0, 31.0]
2025-08-07 06:23:56,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 58 seconds)
2025-08-07 06:25:35,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:25:36,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 42.19623 ± 109.398
2025-08-07 06:25:36,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.13666995, 2.6434655, 3.085275, 6.098648, 10.19755, 369.9616, 3.9120328, 15.418598, -3.0994747, 13.881296]
2025-08-07 06:25:36,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 17.0, 18.0, 20.0, 31.0, 240.0, 20.0, 33.0, 10.0, 24.0]
2025-08-07 06:25:36,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 15 seconds)
2025-08-07 06:27:16,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:27:17,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 98.96655 ± 246.374
2025-08-07 06:27:17,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.46678367, 827.90967, 10.679564, 1.6127768, -0.2966365, -1.7954584, 2.4937305, 6.307216, 3.2416174, 139.04617]
2025-08-07 06:27:17,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 501.0, 23.0, 26.0, 28.0, 12.0, 16.0, 29.0, 14.0, 238.0]
2025-08-07 06:27:17,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 42 seconds)
2025-08-07 06:28:56,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:28:57,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 13.36307 ± 23.498
2025-08-07 06:28:57,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [5.8932757, 6.9794703, -0.76143384, 3.060506, 3.6410594, 83.16267, 5.639819, 12.085693, 9.157768, 4.7719116]
2025-08-07 06:28:57,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 22.0, 13.0, 20.0, 165.0, 19.0, 28.0, 28.0, 17.0]
2025-08-07 06:28:57,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 56 seconds)
2025-08-07 06:30:36,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:30:37,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 2.53438 ± 5.559
2025-08-07 06:30:37,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.79147786, 0.98214537, -5.216501, 3.4814107, 7.5930624, 14.887235, -3.0766, 1.0406812, 6.346568, -1.4856467]
2025-08-07 06:30:37,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 15.0, 15.0, 20.0, 32.0, 26.0, 16.0, 26.0, 18.0, 26.0]
2025-08-07 06:30:37,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 22 seconds)
2025-08-07 06:32:15,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:32:16,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 9.38964 ± 7.770
2025-08-07 06:32:16,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.3549094, 4.9541855, 10.415997, 3.589576, 6.459329, 7.758069, -1.5635524, 15.790507, 16.869524, 26.267832]
2025-08-07 06:32:16,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 18.0, 29.0, 27.0, 25.0, 33.0, 18.0, 30.0, 38.0, 45.0]
2025-08-07 06:32:16,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 39 seconds)
2025-08-07 06:33:56,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:33:57,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 5.98481 ± 13.423
2025-08-07 06:33:57,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-0.83116436, 5.0050206, 44.677345, 6.8363004, -2.1454854, 7.7976933, -3.7824724, 2.043276, -1.8120841, 2.0596952]
2025-08-07 06:33:57,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [17.0, 16.0, 114.0, 17.0, 24.0, 27.0, 11.0, 16.0, 12.0, 14.0]
2025-08-07 06:33:57,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 1 second)
2025-08-07 06:35:36,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:35:37,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 22.02607 ± 44.564
2025-08-07 06:35:37,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [11.876709, 8.04199, 0.5269691, -0.93901235, 3.6542046, 154.13412, 4.6108174, 12.631253, 23.388618, 2.3350358]
2025-08-07 06:35:37,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 24.0, 22.0, 32.0, 33.0, 103.0, 20.0, 23.0, 77.0, 13.0]
2025-08-07 06:35:37,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 19 seconds)
2025-08-07 06:37:16,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:37:17,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 17.51721 ± 39.292
2025-08-07 06:37:17,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.80949384, 3.1647193, -2.1005728, 9.83254, 7.2311726, 3.356663, 134.3043, 17.101488, 0.16298711, 1.309323]
2025-08-07 06:37:17,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [12.0, 12.0, 11.0, 23.0, 20.0, 15.0, 211.0, 32.0, 28.0, 19.0]
2025-08-07 06:37:17,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 39 seconds)
2025-08-07 06:38:58,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:38:59,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 28.73277 ± 74.376
2025-08-07 06:38:59,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.7625828, 4.6837225, 3.0761316, 0.7360361, 5.6444683, 2.167907, 14.195033, 251.5778, 1.4862046, 2.9978192]
2025-08-07 06:38:59,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 16.0, 16.0, 25.0, 17.0, 21.0, 25.0, 127.0, 29.0, 14.0]
2025-08-07 06:38:59,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 2 seconds)
2025-08-07 06:40:38,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:40:38,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 3.90115 ± 4.245
2025-08-07 06:40:38,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [-2.9018078, 4.0867205, 4.528962, 2.4797494, 6.4956274, 5.085091, 2.285439, -2.947901, 9.383517, 10.516069]
2025-08-07 06:40:38,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [11.0, 17.0, 16.0, 28.0, 17.0, 21.0, 18.0, 17.0, 31.0, 20.0]
2025-08-07 06:40:38,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 22 seconds)
2025-08-07 06:42:18,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:42:18,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 42.13565 ± 112.103
2025-08-07 06:42:18,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [9.452316, 378.2452, -1.3522757, 4.236196, 2.9756777, 4.536068, -0.19321385, 9.339127, 10.972533, 3.1449423]
2025-08-07 06:42:18,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 223.0, 10.0, 22.0, 19.0, 20.0, 13.0, 28.0, 32.0, 30.0]
2025-08-07 06:42:18,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 41 seconds)
2025-08-07 06:43:56,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:43:57,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 40.17027 ± 111.353
2025-08-07 06:43:57,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [8.688613, 15.601458, -1.5732055, -2.42254, 373.80493, 2.8739047, 3.1522458, 0.2689283, 5.8937893, -4.5854454]
2025-08-07 06:43:57,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 27.0, 14.0, 16.0, 171.0, 27.0, 19.0, 17.0, 19.0, 11.0]
2025-08-07 06:43:57,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes)
2025-08-07 06:45:37,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:45:38,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 83.47929 ± 149.637
2025-08-07 06:45:38,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [0.5046336, -4.1927457, 420.13818, 7.279507, 5.4100566, 2.7133682, 54.98669, 10.265143, 0.27366358, 337.41446]
2025-08-07 06:45:38,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 23.0, 197.0, 26.0, 18.0, 16.0, 110.0, 20.0, 12.0, 189.0]
2025-08-07 06:45:38,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 20 seconds)
2025-08-07 06:47:16,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:47:16,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 29.67345 ± 56.296
2025-08-07 06:47:16,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [3.7029464, 3.0025475, 138.51627, 145.78568, 2.2312956, -3.3162205, -0.52926785, 2.5473251, 2.0164049, 2.7774832]
2025-08-07 06:47:16,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 14.0, 88.0, 110.0, 13.0, 19.0, 19.0, 18.0, 16.0, 29.0]
2025-08-07 06:47:17,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 39 seconds)
2025-08-07 06:48:56,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:48:57,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1221 [DEBUG]: Total Reward: 76.39236 ± 173.284
2025-08-07 06:48:57,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1222 [DEBUG]: All rewards: [6.1335983, 6.1685944, 70.02482, 6.065014, 11.601218, 2.0921397, 46.247536, 13.198141, 9.885204, 592.5074]
2025-08-07 06:48:57,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 19.0, 152.0, 24.0, 28.0, 12.0, 90.0, 22.0, 24.0, 295.0]
2025-08-07 06:48:57,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-walker2d):1251 [DEBUG]: Training session finished
