2025-08-07 03:57:03,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:57:03,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:57:03,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x150037d0b590>}
2025-08-07 03:57:03,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 03:57:03,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 03:57:03,472 baseline-bpql-noiseperc15-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 03:57:03,472 baseline-bpql-noiseperc15-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 03:57:05,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 03:57:05,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 03:58:34,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:34,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 14.39452 ± 10.750
2025-08-07 03:58:34,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.6251826, 13.041771, 4.1507363, 19.206059, 10.824352, 30.851496, 7.7115936, 10.3190975, 4.1245193, 37.090393]
2025-08-07 03:58:34,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 18.0, 43.0, 29.0, 47.0, 23.0, 22.0, 27.0, 50.0]
2025-08-07 03:58:34,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (14.39) for latency ExtremeSparseL4U32
2025-08-07 03:58:34,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 27 minutes, 49 seconds)
2025-08-07 04:00:12,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:13,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 33.83912 ± 93.606
2025-08-07 04:00:13,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [312.12326, 4.3786035, 10.128717, 13.695936, -33.568623, 5.1271086, 6.59798, 9.286185, 6.4050546, 4.2170186]
2025-08-07 04:00:13,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [253.0, 32.0, 23.0, 209.0, 163.0, 23.0, 17.0, 33.0, 28.0, 18.0]
2025-08-07 04:00:13,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (33.84) for latency ExtremeSparseL4U32
2025-08-07 04:00:13,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 33 minutes, 54 seconds)
2025-08-07 04:01:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:01:51,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 12.21444 ± 11.919
2025-08-07 04:01:51,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [11.8320055, 6.552308, 14.492659, 3.9998915, 2.6167758, 46.265564, 10.6516285, 12.13301, 5.5481586, 8.0524]
2025-08-07 04:01:51,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 16.0, 27.0, 17.0, 15.0, 275.0, 27.0, 28.0, 17.0, 31.0]
2025-08-07 04:01:51,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 34 minutes, 9 seconds)
2025-08-07 04:03:28,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:29,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 23.13964 ± 38.196
2025-08-07 04:03:29,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [17.779799, 57.707706, 121.09335, 24.944511, 6.9541035, -26.774668, 10.292962, 6.628391, 9.155089, 3.6152067]
2025-08-07 04:03:29,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [43.0, 209.0, 145.0, 170.0, 23.0, 118.0, 22.0, 18.0, 23.0, 30.0]
2025-08-07 04:03:29,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 33 minutes, 53 seconds)
2025-08-07 04:05:07,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:05:07,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 17.88440 ± 13.634
2025-08-07 04:05:07,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.852097, 46.777454, 11.483528, 8.896978, 21.938992, 13.935748, 7.979185, 40.954945, 9.659653, 8.365421]
2025-08-07 04:05:07,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 79.0, 29.0, 20.0, 68.0, 28.0, 26.0, 104.0, 30.0, 36.0]
2025-08-07 04:05:07,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 32 minutes, 52 seconds)
2025-08-07 04:06:45,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:46,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 5.83736 ± 15.690
2025-08-07 04:06:46,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3.3679678, 8.911205, 2.1359043, 6.2024846, 18.345905, 8.013212, 13.629266, 29.153936, 3.539964, -34.92625]
2025-08-07 04:06:46,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 15.0, 18.0, 32.0, 20.0, 26.0, 81.0, 17.0, 172.0]
2025-08-07 04:06:46,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 34 minutes, 7 seconds)
2025-08-07 04:08:23,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:24,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 20.93903 ± 23.708
2025-08-07 04:08:24,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [31.927053, 79.33473, -5.890831, 6.576119, 44.081017, 14.75097, 2.4387028, 11.335789, 14.68547, 10.151284]
2025-08-07 04:08:24,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [109.0, 85.0, 138.0, 17.0, 80.0, 28.0, 16.0, 30.0, 24.0, 28.0]
2025-08-07 04:08:24,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 32 minutes, 7 seconds)
2025-08-07 04:10:02,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:03,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 29.37179 ± 33.139
2025-08-07 04:10:03,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.534877, 77.67222, 12.81494, 12.163705, 2.7285542, 8.836183, 97.36333, 58.928127, 12.125466, 2.550504]
2025-08-07 04:10:03,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 188.0, 32.0, 123.0, 16.0, 24.0, 81.0, 132.0, 24.0, 17.0]
2025-08-07 04:10:03,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 30 minutes, 51 seconds)
2025-08-07 04:11:39,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:11:39,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 17.74405 ± 14.895
2025-08-07 04:11:39,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.74295, 22.230673, 5.896245, 5.602471, 37.419865, 13.155234, 8.006979, 14.987133, 52.84524, 7.553745]
2025-08-07 04:11:39,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 76.0, 18.0, 22.0, 45.0, 27.0, 27.0, 24.0, 56.0, 22.0]
2025-08-07 04:11:40,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 28 minutes, 42 seconds)
2025-08-07 04:13:16,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:13:16,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 13.20246 ± 19.929
2025-08-07 04:13:16,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [11.395146, 7.498466, 6.214556, 1.91264, 2.9333918, 12.597467, 8.878986, 72.06826, 6.5235105, 2.0021236]
2025-08-07 04:13:16,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 18.0, 18.0, 17.0, 27.0, 26.0, 20.0, 90.0, 19.0, 20.0]
2025-08-07 04:13:16,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 26 minutes, 36 seconds)
2025-08-07 04:14:52,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:14:53,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 32.09827 ± 34.030
2025-08-07 04:14:53,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [90.95665, 6.1530356, 67.89793, 3.0547738, 89.91255, 7.95946, 9.685232, 14.440298, 11.916021, 19.006712]
2025-08-07 04:14:53,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 19.0, 161.0, 15.0, 101.0, 21.0, 20.0, 28.0, 33.0, 50.0]
2025-08-07 04:14:53,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 24 minutes, 34 seconds)
2025-08-07 04:16:29,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:16:30,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 17.69591 ± 18.201
2025-08-07 04:16:30,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [56.748108, 7.8488545, -0.90081096, 14.090813, 9.9095545, 12.016363, 3.1589663, 48.273613, 17.743319, 8.070344]
2025-08-07 04:16:30,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [153.0, 19.0, 26.0, 24.0, 20.0, 27.0, 29.0, 123.0, 28.0, 21.0]
2025-08-07 04:16:30,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 22 minutes, 40 seconds)
2025-08-07 04:18:06,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:18:07,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 18.83020 ± 14.379
2025-08-07 04:18:07,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [42.39863, 5.0321636, 6.246564, 14.652139, 10.694029, 43.496277, 5.272533, 9.866207, 17.486214, 33.157234]
2025-08-07 04:18:07,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [56.0, 18.0, 19.0, 30.0, 26.0, 72.0, 18.0, 43.0, 43.0, 111.0]
2025-08-07 04:18:07,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 20 minutes, 27 seconds)
2025-08-07 04:19:42,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:19:43,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 17.72960 ± 18.602
2025-08-07 04:19:43,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [16.426582, 4.4960136, 18.116705, 64.54769, 4.365956, 5.1192956, 9.656742, 4.510187, 39.297447, 10.759354]
2025-08-07 04:19:43,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 19.0, 32.0, 120.0, 21.0, 16.0, 24.0, 16.0, 127.0, 34.0]
2025-08-07 04:19:43,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 18 minutes, 36 seconds)
2025-08-07 04:21:19,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:21:19,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 13.48299 ± 9.262
2025-08-07 04:21:19,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.581193, 7.108905, 3.0321617, 35.52993, 15.971371, 3.4658482, 17.8777, 14.220414, 20.4128, 8.629567]
2025-08-07 04:21:19,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 17.0, 17.0, 55.0, 28.0, 16.0, 31.0, 41.0, 44.0, 24.0]
2025-08-07 04:21:19,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 16 minutes, 52 seconds)
2025-08-07 04:22:55,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:22:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 9.69899 ± 3.699
2025-08-07 04:22:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [10.066669, 8.140261, 15.610346, 14.753808, 8.497019, 10.819687, 2.7945063, 5.709514, 12.134863, 8.4632]
2025-08-07 04:22:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 43.0, 27.0, 23.0, 28.0, 18.0, 24.0, 31.0, 25.0]
2025-08-07 04:22:55,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 14 minutes, 50 seconds)
2025-08-07 04:24:31,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:24:32,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 65.21793 ± 90.735
2025-08-07 04:24:32,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2.3063285, 13.107775, 4.266324, 246.82645, 9.770354, 182.15999, 8.660901, 3.490953, 7.454886, 174.13536]
2025-08-07 04:24:32,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 28.0, 16.0, 156.0, 31.0, 123.0, 31.0, 14.0, 20.0, 123.0]
2025-08-07 04:24:32,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (65.22) for latency ExtremeSparseL4U32
2025-08-07 04:24:32,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 13 minutes, 12 seconds)
2025-08-07 04:26:07,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:26:08,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 76.15372 ± 93.316
2025-08-07 04:26:08,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [181.99532, 55.08577, 281.90143, 5.6775255, 6.9498296, 165.89513, -0.42081434, 38.86164, 10.825689, 14.765677]
2025-08-07 04:26:08,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [97.0, 85.0, 175.0, 19.0, 23.0, 126.0, 23.0, 76.0, 27.0, 31.0]
2025-08-07 04:26:08,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (76.15) for latency ExtremeSparseL4U32
2025-08-07 04:26:08,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 11 minutes, 38 seconds)
2025-08-07 04:27:45,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:27:46,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 31.12066 ± 38.092
2025-08-07 04:27:46,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [14.592699, 4.1426945, 122.55273, 9.861931, 33.82549, 11.420171, 11.564398, 85.29523, 7.6525893, 10.29866]
2025-08-07 04:27:46,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 15.0, 92.0, 27.0, 167.0, 27.0, 24.0, 100.0, 25.0, 20.0]
2025-08-07 04:27:46,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 10 minutes, 23 seconds)
2025-08-07 04:29:21,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:29:22,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 23.68415 ± 39.674
2025-08-07 04:29:22,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [19.637106, 4.8900204, 14.594281, 12.267456, 141.9231, 10.197308, 8.058947, 3.9851863, 14.043738, 7.2443585]
2025-08-07 04:29:22,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 22.0, 26.0, 33.0, 132.0, 20.0, 18.0, 21.0, 25.0, 16.0]
2025-08-07 04:29:22,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 8 minutes, 45 seconds)
2025-08-07 04:30:59,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:31:00,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 83.04647 ± 88.868
2025-08-07 04:31:00,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [18.131193, 4.242742, 175.11595, 10.226074, 12.693492, 212.67702, 11.8355665, 8.961962, 200.665, 175.91571]
2025-08-07 04:31:00,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 17.0, 128.0, 27.0, 23.0, 147.0, 30.0, 27.0, 146.0, 111.0]
2025-08-07 04:31:00,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (83.05) for latency ExtremeSparseL4U32
2025-08-07 04:31:00,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 7 minutes, 42 seconds)
2025-08-07 04:32:36,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:32:36,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 7.91685 ± 3.862
2025-08-07 04:32:36,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [16.35728, 7.9256763, 2.5380876, 3.438144, 6.597834, 10.918524, 9.458893, 9.337936, 4.2287655, 8.36737]
2025-08-07 04:32:36,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 26.0, 14.0, 26.0, 18.0, 26.0, 20.0, 22.0, 16.0, 23.0]
2025-08-07 04:32:36,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 6 minutes, 4 seconds)
2025-08-07 04:34:12,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:34:13,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 28.89917 ± 48.592
2025-08-07 04:34:13,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.430399, 2.691658, 4.3945947, 16.28302, 169.59914, 7.7956667, 46.543438, 23.098078, 7.4089713, 1.7467319]
2025-08-07 04:34:13,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 15.0, 25.0, 30.0, 101.0, 25.0, 127.0, 31.0, 19.0, 14.0]
2025-08-07 04:34:13,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 4 minutes, 21 seconds)
2025-08-07 04:35:50,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:35:51,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 35.42484 ± 47.365
2025-08-07 04:35:51,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [42.006035, 5.0583386, 150.16798, 4.051024, 10.906653, 17.251425, 5.062322, 13.908278, 99.76939, 6.0669737]
2025-08-07 04:35:51,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [115.0, 25.0, 113.0, 20.0, 30.0, 30.0, 15.0, 28.0, 117.0, 19.0]
2025-08-07 04:35:51,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 2 minutes, 53 seconds)
2025-08-07 04:37:27,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:37:28,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 42.76910 ± 72.262
2025-08-07 04:37:28,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [10.8079195, 6.921252, 164.71948, 12.007921, 2.4747047, 8.452217, 207.04839, 6.192897, -0.087711744, 9.153957]
2025-08-07 04:37:28,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 24.0, 105.0, 26.0, 18.0, 25.0, 100.0, 19.0, 19.0, 29.0]
2025-08-07 04:37:28,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 1 minute, 32 seconds)
2025-08-07 04:39:04,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:39:05,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 73.39996 ± 132.521
2025-08-07 04:39:05,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [13.061288, 411.1563, 10.641078, 11.6159935, 12.3862295, 4.729866, 16.567474, 243.94762, 2.24764, 7.646199]
2025-08-07 04:39:05,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 218.0, 30.0, 23.0, 27.0, 27.0, 31.0, 138.0, 13.0, 27.0]
2025-08-07 04:39:05,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 59 minutes, 43 seconds)
2025-08-07 04:40:41,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:40:42,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 65.81050 ± 130.777
2025-08-07 04:40:42,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [428.05484, 11.825305, 6.1025076, 174.97818, 3.4581294, 3.2051275, 11.283156, 10.972873, 5.9495554, 2.2752833]
2025-08-07 04:40:42,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [222.0, 32.0, 17.0, 109.0, 19.0, 21.0, 24.0, 26.0, 18.0, 12.0]
2025-08-07 04:40:42,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 58 minutes, 10 seconds)
2025-08-07 04:42:20,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:42:21,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 93.74136 ± 90.527
2025-08-07 04:42:21,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [202.34593, 5.050952, 3.2103093, 7.48213, 6.5479674, 180.1824, 164.34903, 127.29241, 10.84196, 230.11047]
2025-08-07 04:42:21,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [115.0, 27.0, 25.0, 25.0, 23.0, 99.0, 202.0, 175.0, 21.0, 164.0]
2025-08-07 04:42:21,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (93.74) for latency ExtremeSparseL4U32
2025-08-07 04:42:21,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 57 minutes, 4 seconds)
2025-08-07 04:43:59,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:44:00,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 50.98476 ± 124.497
2025-08-07 04:44:00,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [14.613366, 7.133063, 9.227885, 424.26788, 3.4613485, 4.8426995, 12.632987, 17.66643, 7.142618, 8.859397]
2025-08-07 04:44:00,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 18.0, 20.0, 190.0, 14.0, 23.0, 29.0, 25.0, 19.0, 21.0]
2025-08-07 04:44:00,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 55 minutes, 39 seconds)
2025-08-07 04:45:34,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:45:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 51.07206 ± 82.972
2025-08-07 04:45:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [10.86575, 12.197765, 12.344358, 201.63414, 14.260751, 11.731184, 231.06882, 8.868384, 1.7202785, 6.0292077]
2025-08-07 04:45:35,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 34.0, 104.0, 33.0, 29.0, 132.0, 24.0, 20.0, 23.0]
2025-08-07 04:45:35,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 53 minutes, 37 seconds)
2025-08-07 04:47:13,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:47:14,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 96.45671 ± 135.981
2025-08-07 04:47:14,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.4029255, 4.6146374, 18.334007, 373.85486, 278.07938, 244.95184, 6.6651225, 12.506917, 1.9938209, 14.163553]
2025-08-07 04:47:14,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 17.0, 30.0, 310.0, 166.0, 212.0, 17.0, 27.0, 15.0, 28.0]
2025-08-07 04:47:14,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (96.46) for latency ExtremeSparseL4U32
2025-08-07 04:47:14,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 52 minutes, 24 seconds)
2025-08-07 04:48:50,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:48:51,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 64.59521 ± 95.371
2025-08-07 04:48:51,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [-1.0180689, 7.626869, 9.902296, 7.3191633, 5.2317567, 8.000046, 6.19365, 130.48499, 192.38782, 279.82367]
2025-08-07 04:48:51,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [11.0, 28.0, 27.0, 17.0, 16.0, 21.0, 19.0, 103.0, 117.0, 138.0]
2025-08-07 04:48:51,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 50 minutes, 44 seconds)
2025-08-07 04:50:29,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:50:30,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 100.84196 ± 102.087
2025-08-07 04:50:30,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1.037626, 7.7905226, 188.42352, 109.46144, 2.4415023, 241.87088, 180.79466, 11.616991, 6.150849, 258.83157]
2025-08-07 04:50:30,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 23.0, 109.0, 185.0, 17.0, 140.0, 201.0, 22.0, 22.0, 133.0]
2025-08-07 04:50:30,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (100.84) for latency ExtremeSparseL4U32
2025-08-07 04:50:30,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 15 seconds)
2025-08-07 04:52:06,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:52:07,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 43.78497 ± 74.930
2025-08-07 04:52:07,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [78.8801, 253.33647, 6.6759953, 1.7602713, 11.515768, 1.2691997, 4.84923, 2.72959, 8.827118, 68.00599]
2025-08-07 04:52:07,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [128.0, 146.0, 17.0, 14.0, 25.0, 15.0, 30.0, 16.0, 28.0, 94.0]
2025-08-07 04:52:07,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 47 minutes, 14 seconds)
2025-08-07 04:53:44,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:53:45,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 60.84044 ± 113.397
2025-08-07 04:53:45,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.817381, 172.76665, 15.765114, 6.0557704, 14.032385, 6.336053, 367.50323, 6.6828594, 7.152633, 4.2922287]
2025-08-07 04:53:45,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 104.0, 28.0, 24.0, 30.0, 31.0, 216.0, 19.0, 20.0, 19.0]
2025-08-07 04:53:45,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 7 seconds)
2025-08-07 04:55:20,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:55:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 45.34601 ± 76.309
2025-08-07 04:55:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [19.189701, 8.601888, 165.22374, 2.4825292, 5.2008343, 3.277197, 4.85879, 225.36292, 7.332139, 11.930341]
2025-08-07 04:55:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 103.0, 15.0, 18.0, 25.0, 22.0, 127.0, 20.0, 24.0]
2025-08-07 04:55:21,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 50 seconds)
2025-08-07 04:56:57,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:56:58,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 69.79283 ± 84.147
2025-08-07 04:56:58,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.484928, 68.21694, 12.671621, 5.1341825, 3.9173534, 4.6584573, 193.76996, 163.59508, 223.78206, 12.697782]
2025-08-07 04:56:58,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 128.0, 26.0, 17.0, 15.0, 17.0, 113.0, 101.0, 149.0, 30.0]
2025-08-07 04:56:58,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 24 seconds)
2025-08-07 04:58:35,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:58:36,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 77.29156 ± 153.435
2025-08-07 04:58:36,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.561088, 209.78404, 6.06021, 1.6760716, 5.2190824, 12.284742, 5.7114305, 500.54266, 8.344124, 14.732127]
2025-08-07 04:58:36,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 109.0, 18.0, 17.0, 18.0, 25.0, 19.0, 272.0, 32.0, 26.0]
2025-08-07 04:58:36,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 21 seconds)
2025-08-07 05:00:12,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:00:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 112.63281 ± 189.862
2025-08-07 05:00:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2.827938, 5.6669297, 112.19826, 16.761131, 3.5849297, 8.733674, 4.285718, 539.0749, 4.5969105, 428.5978]
2025-08-07 05:00:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 18.0, 91.0, 31.0, 16.0, 28.0, 20.0, 265.0, 19.0, 281.0]
2025-08-07 05:00:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (112.63) for latency ExtremeSparseL4U32
2025-08-07 05:00:13,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 38 minutes, 50 seconds)
2025-08-07 05:01:50,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:01:51,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 95.63706 ± 182.617
2025-08-07 05:01:51,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [367.9312, 10.8368225, 4.318709, 7.8782325, 7.7454247, 3.406025, 537.7893, 7.0472693, 5.224906, 4.19273]
2025-08-07 05:01:51,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [314.0, 22.0, 18.0, 23.0, 21.0, 15.0, 274.0, 30.0, 21.0, 15.0]
2025-08-07 05:01:51,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 18 seconds)
2025-08-07 05:03:29,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:03:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 55.14614 ± 80.317
2025-08-07 05:03:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [133.73216, 1.8775454, 5.856451, 9.495265, 12.343143, 50.395782, 53.044872, 266.91226, 9.614211, 8.18971]
2025-08-07 05:03:30,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [89.0, 14.0, 20.0, 20.0, 28.0, 99.0, 96.0, 145.0, 25.0, 28.0]
2025-08-07 05:03:30,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 36 minutes, 12 seconds)
2025-08-07 05:05:06,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:07,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 145.88651 ± 173.492
2025-08-07 05:05:07,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [10.588358, 375.98526, 332.21814, 13.196531, 7.5168123, 9.212671, 456.09357, 14.337674, 8.716497, 230.99942]
2025-08-07 05:05:07,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 234.0, 178.0, 32.0, 17.0, 29.0, 254.0, 25.0, 26.0, 203.0]
2025-08-07 05:05:07,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (145.89) for latency ExtremeSparseL4U32
2025-08-07 05:05:07,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 34 minutes, 29 seconds)
2025-08-07 05:06:46,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:06:47,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 175.22510 ± 182.274
2025-08-07 05:06:47,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2.3676064, 385.26315, 12.454751, 217.35612, 11.816635, 521.4833, 261.0176, 320.99615, 9.951801, 9.543874]
2025-08-07 05:06:47,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [16.0, 223.0, 29.0, 106.0, 24.0, 362.0, 135.0, 154.0, 27.0, 21.0]
2025-08-07 05:06:47,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (175.23) for latency ExtremeSparseL4U32
2025-08-07 05:06:47,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 33 minutes, 25 seconds)
2025-08-07 05:08:23,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:08:24,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 96.93607 ± 182.758
2025-08-07 05:08:24,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.8938046, 10.959717, 9.052181, 8.608855, 3.23566, 529.5198, 5.940237, 7.617408, 2.9599392, 383.5731]
2025-08-07 05:08:24,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 32.0, 31.0, 32.0, 340.0, 18.0, 25.0, 18.0, 187.0]
2025-08-07 05:08:24,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 31 minutes, 32 seconds)
2025-08-07 05:10:00,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:10:00,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 73.38423 ± 133.663
2025-08-07 05:10:00,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.1323576, 284.0295, 6.287966, 388.7791, 14.104158, 7.2571516, 9.163247, -0.6853523, 16.421036, 2.35317]
2025-08-07 05:10:00,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 146.0, 19.0, 189.0, 26.0, 20.0, 19.0, 23.0, 28.0, 22.0]
2025-08-07 05:10:01,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 40 seconds)
2025-08-07 05:11:39,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:11:40,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 138.07523 ± 147.205
2025-08-07 05:11:40,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [237.76285, 5.674447, 7.7967615, 191.59955, 6.862158, 259.39026, -3.0378602, 225.15103, 442.6365, 6.916538]
2025-08-07 05:11:40,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 20.0, 18.0, 197.0, 19.0, 130.0, 17.0, 134.0, 249.0, 20.0]
2025-08-07 05:11:40,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 28 minutes, 13 seconds)
2025-08-07 05:13:21,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:13:22,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 99.24313 ± 151.021
2025-08-07 05:13:22,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [5.762517, 3.1797352, 7.4337106, 10.183763, 452.23105, 4.84101, 284.91095, 200.48395, 2.05243, 21.352213]
2025-08-07 05:13:22,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 15.0, 18.0, 35.0, 272.0, 27.0, 164.0, 153.0, 23.0, 31.0]
2025-08-07 05:13:22,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 22 seconds)
2025-08-07 05:14:53,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:14:54,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 55.81968 ± 119.846
2025-08-07 05:14:54,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.604212, 63.66646, 14.116832, 412.0167, 8.14811, 3.5996873, 15.298047, 13.648179, 12.476746, 5.621897]
2025-08-07 05:14:54,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 91.0, 25.0, 361.0, 21.0, 24.0, 31.0, 24.0, 31.0, 31.0]
2025-08-07 05:14:54,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 24 seconds)
2025-08-07 05:16:32,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:16:34,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 107.53082 ± 139.503
2025-08-07 05:16:34,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [431.7533, 211.36696, 3.9556007, 247.4285, 131.73082, 17.472672, 8.617322, 6.0598826, 9.626353, 7.2969046]
2025-08-07 05:16:34,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [281.0, 197.0, 17.0, 135.0, 95.0, 29.0, 18.0, 30.0, 21.0, 18.0]
2025-08-07 05:16:34,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 23 minutes, 17 seconds)
2025-08-07 05:18:13,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:18:14,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 113.87056 ± 127.785
2025-08-07 05:18:14,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [264.92618, 248.13875, 13.103679, 217.22469, 336.1945, 6.828118, 8.22158, 14.544715, 9.709633, 19.813766]
2025-08-07 05:18:14,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 129.0, 24.0, 154.0, 180.0, 18.0, 20.0, 28.0, 20.0, 31.0]
2025-08-07 05:18:14,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 22 minutes, 18 seconds)
2025-08-07 05:19:53,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:19:55,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 184.39565 ± 200.274
2025-08-07 05:19:55,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [337.4131, 10.652716, 7.8238974, 10.76945, 9.938996, 227.13014, 541.39166, 207.25163, 2.4790251, 489.10583]
2025-08-07 05:19:55,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 27.0, 22.0, 30.0, 25.0, 122.0, 284.0, 114.0, 13.0, 232.0]
2025-08-07 05:19:55,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (184.40) for latency ExtremeSparseL4U32
2025-08-07 05:19:55,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 20 minutes, 46 seconds)
2025-08-07 05:21:32,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:21:33,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 128.16405 ± 148.426
2025-08-07 05:21:33,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [274.8164, 282.4837, 15.483216, 3.464348, 15.351355, 1.6166117, 285.95203, 9.098294, 10.234555, 383.13992]
2025-08-07 05:21:33,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 135.0, 25.0, 21.0, 27.0, 29.0, 152.0, 27.0, 25.0, 197.0]
2025-08-07 05:21:33,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 38 seconds)
2025-08-07 05:23:12,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:23:13,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 83.90388 ± 155.645
2025-08-07 05:23:13,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [19.65874, 449.46927, 3.7361834, 331.69318, 6.3879733, 3.6390815, 4.772886, 7.420345, 5.349316, 6.9118476]
2025-08-07 05:23:13,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 252.0, 16.0, 147.0, 24.0, 17.0, 15.0, 24.0, 16.0, 18.0]
2025-08-07 05:23:13,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 9 seconds)
2025-08-07 05:24:53,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:24:54,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 122.38418 ± 123.013
2025-08-07 05:24:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.987966, 243.43202, 4.852724, 266.39224, 20.21959, 5.100754, 11.651624, 312.59595, 101.067375, 249.54167]
2025-08-07 05:24:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 141.0, 17.0, 137.0, 32.0, 21.0, 25.0, 198.0, 147.0, 130.0]
2025-08-07 05:24:54,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 42 seconds)
2025-08-07 05:26:34,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:26:35,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 16.20803 ± 28.626
2025-08-07 05:26:35,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.136038, 2.8226824, 1.497491, 7.299041, 7.306098, 101.39399, 5.8961496, 15.8972, 5.643028, 6.1886334]
2025-08-07 05:26:35,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 21.0, 13.0, 18.0, 29.0, 192.0, 23.0, 27.0, 18.0, 22.0]
2025-08-07 05:26:35,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 15 minutes, 1 second)
2025-08-07 05:28:13,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 89.70221 ± 159.378
2025-08-07 05:28:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [20.70966, 6.9975333, 455.6384, 354.52957, 4.491378, 9.524646, 18.944094, 7.0153785, 9.46029, 9.711145]
2025-08-07 05:28:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 19.0, 236.0, 213.0, 26.0, 20.0, 32.0, 18.0, 33.0, 24.0]
2025-08-07 05:28:14,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 17 seconds)
2025-08-07 05:29:53,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:29:55,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 96.08552 ± 197.952
2025-08-07 05:29:55,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.972389, 33.41572, 8.528539, 9.372645, 669.2691, 12.793431, 185.30641, 10.727918, 14.891617, 8.577326]
2025-08-07 05:29:55,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 118.0, 20.0, 24.0, 393.0, 31.0, 110.0, 23.0, 32.0, 31.0]
2025-08-07 05:29:55,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 52 seconds)
2025-08-07 05:31:35,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:31:36,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 126.13971 ± 181.744
2025-08-07 05:31:36,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [351.21585, 11.7606945, 14.536923, 8.607024, 320.5701, 1.3450128, 17.341852, 12.045084, 512.14575, 11.828809]
2025-08-07 05:31:36,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 26.0, 28.0, 30.0, 140.0, 12.0, 28.0, 33.0, 261.0, 22.0]
2025-08-07 05:31:36,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 10 minutes, 21 seconds)
2025-08-07 05:33:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:33:14,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 46.10758 ± 85.651
2025-08-07 05:33:14,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.607651, 12.334815, 5.559285, 290.66385, 10.673974, 95.93273, 19.992117, 9.106567, -0.46235907, 7.6671724]
2025-08-07 05:33:14,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [20.0, 25.0, 24.0, 138.0, 27.0, 74.0, 30.0, 23.0, 13.0, 20.0]
2025-08-07 05:33:15,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 8 minutes, 24 seconds)
2025-08-07 05:34:54,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:56,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 123.08752 ± 199.002
2025-08-07 05:34:56,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [10.751479, 9.504528, 6.92018, 10.143323, 11.309805, 164.28545, 424.2187, 583.0994, 8.829222, 1.8132522]
2025-08-07 05:34:56,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 22.0, 27.0, 21.0, 98.0, 249.0, 271.0, 26.0, 14.0]
2025-08-07 05:34:56,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 6 minutes, 49 seconds)
2025-08-07 05:36:34,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:36:35,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 96.84499 ± 177.839
2025-08-07 05:36:35,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [0.7436101, 2.8693693, 21.147577, 494.2596, 8.506493, 6.233232, 9.712836, 406.10925, 8.80184, 10.066189]
2025-08-07 05:36:35,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 18.0, 33.0, 257.0, 21.0, 21.0, 32.0, 213.0, 29.0, 25.0]
2025-08-07 05:36:35,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 2 seconds)
2025-08-07 05:38:15,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:38:16,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 148.25648 ± 174.959
2025-08-07 05:38:16,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [483.7624, 12.881728, 13.732356, 290.0893, 4.556816, 19.60494, 363.66064, 272.90515, 13.506486, 7.8649526]
2025-08-07 05:38:16,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 31.0, 33.0, 148.0, 19.0, 31.0, 156.0, 128.0, 26.0, 33.0]
2025-08-07 05:38:16,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 3 minutes, 31 seconds)
2025-08-07 05:39:56,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:39:57,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 9.51934 ± 4.120
2025-08-07 05:39:57,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [14.251387, 12.893011, 8.810326, 4.1288514, 9.086629, 10.2371855, 16.034939, 6.278249, 11.054623, 2.4181993]
2025-08-07 05:39:57,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 24.0, 29.0, 15.0, 33.0, 23.0, 24.0, 23.0, 29.0, 29.0]
2025-08-07 05:39:57,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 47 seconds)
2025-08-07 05:41:34,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:41:34,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 35.64990 ± 73.770
2025-08-07 05:41:34,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [14.508379, 12.098732, 5.362366, 6.43919, 11.351188, 7.918873, 18.664898, 10.771388, 256.68155, 12.702442]
2025-08-07 05:41:34,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 22.0, 21.0, 24.0, 27.0, 21.0, 28.0, 24.0, 148.0, 33.0]
2025-08-07 05:41:34,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes, 59 seconds)
2025-08-07 05:43:14,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:43:16,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 227.29944 ± 282.210
2025-08-07 05:43:16,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [13.763797, 12.738565, 15.425022, 5.8228483, 681.2993, 650.5787, 584.08417, 8.773937, 289.25677, 11.251272]
2025-08-07 05:43:16,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 31.0, 27.0, 17.0, 359.0, 347.0, 290.0, 27.0, 154.0, 27.0]
2025-08-07 05:43:16,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (227.30) for latency ExtremeSparseL4U32
2025-08-07 05:43:16,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 20 seconds)
2025-08-07 05:44:55,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:44:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 117.39565 ± 139.908
2025-08-07 05:44:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [362.57172, 16.721941, 7.6047783, 311.8767, 1.9932212, 180.08043, 9.886057, 10.439382, 268.02155, 4.7607865]
2025-08-07 05:44:56,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 26.0, 37.0, 151.0, 14.0, 125.0, 21.0, 28.0, 143.0, 16.0]
2025-08-07 05:44:56,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 50 seconds)
2025-08-07 05:46:36,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:46:37,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 23.55333 ± 44.347
2025-08-07 05:46:37,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.4807076, 155.22823, -0.50050986, 12.418104, 12.49134, 2.006795, 3.6753478, 16.212532, 6.467086, 21.053661]
2025-08-07 05:46:37,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 222.0, 30.0, 31.0, 25.0, 16.0, 16.0, 28.0, 29.0, 32.0]
2025-08-07 05:46:37,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 5 seconds)
2025-08-07 05:48:16,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:48:17,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 128.57625 ± 236.976
2025-08-07 05:48:17,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [11.565552, 777.32104, 2.6786335, 4.2346315, 138.89207, 8.572806, 320.8533, 5.188669, 12.2898655, 4.165841]
2025-08-07 05:48:17,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 353.0, 17.0, 24.0, 155.0, 18.0, 183.0, 17.0, 24.0, 32.0]
2025-08-07 05:48:17,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 53 minutes, 23 seconds)
2025-08-07 05:49:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:49:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 26.13575 ± 59.614
2025-08-07 05:49:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1.4673622, 204.83263, 7.2544684, 6.7202816, 4.495598, 9.202816, 4.3590784, 10.082657, 7.7279496, 5.214644]
2025-08-07 05:49:57,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 120.0, 32.0, 19.0, 23.0, 23.0, 22.0, 21.0, 19.0, 21.0]
2025-08-07 05:49:57,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 58 seconds)
2025-08-07 05:51:39,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:51:40,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 181.65709 ± 175.273
2025-08-07 05:51:40,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [5.1126504, 292.97046, 13.008184, 8.263739, 363.17184, 352.4585, 349.91226, 4.899096, 11.579464, 415.1947]
2025-08-07 05:51:40,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 144.0, 25.0, 29.0, 162.0, 192.0, 199.0, 18.0, 28.0, 209.0]
2025-08-07 05:51:40,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 26 seconds)
2025-08-07 05:53:18,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:53:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 99.07593 ± 183.672
2025-08-07 05:53:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [5.2662196, 14.736996, 477.12793, 455.29523, 1.8051518, 6.0711746, 7.406096, 3.177159, 6.6242495, 13.249035]
2025-08-07 05:53:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 32.0, 231.0, 229.0, 13.0, 20.0, 20.0, 17.0, 20.0, 25.0]
2025-08-07 05:53:19,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 48 minutes, 34 seconds)
2025-08-07 05:54:59,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:55:01,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 170.50650 ± 252.511
2025-08-07 05:55:01,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [411.18658, 183.61902, 8.453861, 232.54779, 18.936043, 13.528366, 818.97845, 9.191867, 1.8690901, 6.7537794]
2025-08-07 05:55:01,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [186.0, 327.0, 31.0, 268.0, 30.0, 27.0, 378.0, 29.0, 18.0, 32.0]
2025-08-07 05:55:01,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 2 seconds)
2025-08-07 05:56:39,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:56:41,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 124.11401 ± 163.048
2025-08-07 05:56:41,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [292.3141, 397.06207, 92.6348, 5.4606495, 7.5470233, 9.275085, 11.966606, 0.37169993, 409.46466, 15.043415]
2025-08-07 05:56:41,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [160.0, 185.0, 240.0, 19.0, 26.0, 18.0, 23.0, 18.0, 176.0, 30.0]
2025-08-07 05:56:41,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 18 seconds)
2025-08-07 05:58:23,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:58:24,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 100.78479 ± 204.942
2025-08-07 05:58:24,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.142598, 14.403528, 9.316266, 17.993504, 38.195984, 11.122743, 3.3583946, 690.9563, 204.42798, 9.930584]
2025-08-07 05:58:24,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 26.0, 23.0, 33.0, 80.0, 26.0, 24.0, 365.0, 177.0, 22.0]
2025-08-07 05:58:24,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 54 seconds)
2025-08-07 06:00:00,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:00:01,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 43.82512 ± 106.314
2025-08-07 06:00:01,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [15.058809, 12.181845, 2.727495, 362.3618, 15.0894575, 15.928806, 4.655987, 5.3718877, 2.9647424, 1.9103562]
2025-08-07 06:00:01,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 31.0, 22.0, 177.0, 31.0, 29.0, 25.0, 16.0, 18.0, 24.0]
2025-08-07 06:00:01,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 43 seconds)
2025-08-07 06:01:40,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:01:43,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 241.85376 ± 358.475
2025-08-07 06:01:43,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.79399, 13.200161, 749.7208, 3.8137121, 1074.8108, 278.31216, 4.401452, 5.1027594, 5.7901154, 273.5918]
2025-08-07 06:01:43,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 382.0, 28.0, 557.0, 210.0, 16.0, 21.0, 21.0, 133.0]
2025-08-07 06:01:43,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (241.85) for latency ExtremeSparseL4U32
2025-08-07 06:01:43,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 17 seconds)
2025-08-07 06:03:22,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:03:23,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 160.90738 ± 212.476
2025-08-07 06:03:23,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [7.886117, 11.188274, 10.377331, 8.237204, 2.3998892, 599.113, 406.35248, 5.2858663, 401.8059, 156.42775]
2025-08-07 06:03:23,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 36.0, 30.0, 19.0, 26.0, 355.0, 223.0, 16.0, 187.0, 89.0]
2025-08-07 06:03:23,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 30 seconds)
2025-08-07 06:05:03,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:05:05,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 168.08862 ± 181.076
2025-08-07 06:05:05,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [467.2842, 15.000071, 7.193617, 438.3305, 9.983091, 207.41414, 9.548909, 146.7357, 12.295747, 367.10028]
2025-08-07 06:05:05,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [216.0, 28.0, 24.0, 193.0, 28.0, 114.0, 22.0, 94.0, 23.0, 157.0]
2025-08-07 06:05:05,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 58 seconds)
2025-08-07 06:06:43,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:06:44,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 131.76099 ± 190.362
2025-08-07 06:06:44,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.799889, 3.4515443, 360.72754, 8.303999, 8.441291, 475.59808, 6.8995843, 10.702263, 10.423663, 423.2621]
2025-08-07 06:06:44,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 17.0, 169.0, 24.0, 21.0, 209.0, 30.0, 28.0, 30.0, 175.0]
2025-08-07 06:06:44,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 1 second)
2025-08-07 06:08:24,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:08:27,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 257.37009 ± 298.440
2025-08-07 06:08:27,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.102538, 0.329285, 4.2157993, 550.84314, 304.63193, 3.150713, 850.43384, 221.33022, 619.99945, 9.664001]
2025-08-07 06:08:27,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 28.0, 269.0, 155.0, 25.0, 436.0, 127.0, 316.0, 24.0]
2025-08-07 06:08:27,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (257.37) for latency ExtremeSparseL4U32
2025-08-07 06:08:27,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 43 seconds)
2025-08-07 06:10:06,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:10:07,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 29.48750 ± 68.535
2025-08-07 06:10:07,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [3.7474687, 1.0706452, 234.85394, 11.82082, 8.266239, 4.881002, 6.576506, 6.091722, 5.0534573, 12.513144]
2025-08-07 06:10:07,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 21.0, 124.0, 27.0, 23.0, 15.0, 16.0, 28.0, 25.0, 26.0]
2025-08-07 06:10:07,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 56 seconds)
2025-08-07 06:11:44,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:11:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 46.15943 ± 116.905
2025-08-07 06:11:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [5.048674, 396.73273, 2.137265, 13.486048, 5.121884, 9.979252, 5.6334963, 11.869069, 6.2472334, 5.338662]
2025-08-07 06:11:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 175.0, 15.0, 26.0, 15.0, 23.0, 17.0, 24.0, 16.0, 21.0]
2025-08-07 06:11:45,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 5 seconds)
2025-08-07 06:13:24,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:13:24,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 44.66298 ± 108.559
2025-08-07 06:13:24,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [12.646867, 370.06656, 18.09872, 0.8321332, 7.8997827, 7.60175, 9.843905, 3.3646, 8.589263, 7.6861863]
2025-08-07 06:13:24,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 142.0, 32.0, 32.0, 20.0, 18.0, 22.0, 32.0, 23.0, 29.0]
2025-08-07 06:13:24,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 18 seconds)
2025-08-07 06:15:05,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:15:06,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 101.19897 ± 273.529
2025-08-07 06:15:06,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.630446, 7.753082, 17.950306, 921.6654, 2.713838, 12.985274, 6.726233, 5.554777, 10.279474, 17.730913]
2025-08-07 06:15:06,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [19.0, 27.0, 28.0, 433.0, 17.0, 31.0, 18.0, 23.0, 24.0, 27.0]
2025-08-07 06:15:06,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 44 seconds)
2025-08-07 06:16:43,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:16:44,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 52.15112 ± 131.632
2025-08-07 06:16:44,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [17.949957, 8.83198, 4.6385636, 8.7890625, 4.612542, 4.329746, 9.287028, 446.88995, 8.127139, 8.055243]
2025-08-07 06:16:44,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 22.0, 30.0, 28.0, 28.0, 26.0, 24.0, 223.0, 23.0, 26.0]
2025-08-07 06:16:44,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 51 seconds)
2025-08-07 06:18:25,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:18:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 101.22283 ± 184.682
2025-08-07 06:18:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [15.250123, 3.2056036, 7.69899, 411.5227, 522.62396, 5.5915194, 6.4087796, 5.356873, 12.363898, 22.205992]
2025-08-07 06:18:26,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 15.0, 21.0, 175.0, 234.0, 20.0, 21.0, 19.0, 26.0, 28.0]
2025-08-07 06:18:26,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 16 seconds)
2025-08-07 06:20:04,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:20:05,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 114.51804 ± 213.821
2025-08-07 06:20:05,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [12.834631, 12.285168, 14.425128, 10.8925, 3.9731011, 14.580914, 3.7544076, 9.912526, 638.211, 424.3111]
2025-08-07 06:20:05,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 25.0, 31.0, 25.0, 16.0, 25.0, 23.0, 26.0, 314.0, 184.0]
2025-08-07 06:20:05,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 40 seconds)
2025-08-07 06:21:46,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:21:48,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 275.07269 ± 207.168
2025-08-07 06:21:48,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [16.046942, 447.33533, 238.49867, 354.42145, 515.0442, 188.76428, 4.4192243, 376.37152, 600.25275, 9.572749]
2025-08-07 06:21:48,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 185.0, 128.0, 150.0, 235.0, 101.0, 30.0, 160.0, 275.0, 23.0]
2025-08-07 06:21:48,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1226 [INFO]: New best (275.07) for latency ExtremeSparseL4U32
2025-08-07 06:21:48,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 8 seconds)
2025-08-07 06:23:29,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:23:29,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 85.53912 ± 164.728
2025-08-07 06:23:29,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [13.943104, 16.510897, 246.26202, 5.342357, 531.831, 0.04942833, 18.90275, 5.3434086, 7.282241, 9.924096]
2025-08-07 06:23:29,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 26.0, 131.0, 15.0, 235.0, 13.0, 28.0, 19.0, 33.0, 25.0]
2025-08-07 06:23:29,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 28 seconds)
2025-08-07 06:25:08,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:25:09,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 59.16443 ± 103.329
2025-08-07 06:25:09,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [278.10602, 4.311219, 4.6774178, 4.214576, 5.2504725, 7.2048397, 11.339971, 252.71362, 9.158993, 14.667145]
2025-08-07 06:25:09,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 19.0, 28.0, 24.0, 24.0, 27.0, 22.0, 135.0, 28.0, 31.0]
2025-08-07 06:25:09,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 50 seconds)
2025-08-07 06:26:47,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:26:49,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 118.36250 ± 195.104
2025-08-07 06:26:49,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [13.610423, 4.5208316, 15.587664, 54.470043, 72.7211, 3.0474958, 567.8703, 435.22903, 7.98515, 8.58298]
2025-08-07 06:26:49,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 18.0, 34.0, 91.0, 115.0, 27.0, 275.0, 185.0, 28.0, 19.0]
2025-08-07 06:26:49,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 5 seconds)
2025-08-07 06:28:29,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:28:30,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 169.76877 ± 220.825
2025-08-07 06:28:30,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1.4378428, 3.228984, 224.74709, 0.89813364, 6.531155, 634.29315, 12.691069, 424.72784, 377.1688, 11.963801]
2025-08-07 06:28:30,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 18.0, 337.0, 17.0, 19.0, 289.0, 27.0, 185.0, 294.0, 22.0]
2025-08-07 06:28:30,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 28 seconds)
2025-08-07 06:30:09,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:30:10,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 7.51402 ± 3.538
2025-08-07 06:30:10,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [6.838902, 2.7524712, 7.858523, 14.990964, 8.702561, 7.9290905, 9.097649, 1.8940071, 9.810358, 5.2656565]
2025-08-07 06:30:10,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 19.0, 32.0, 21.0, 18.0, 27.0, 23.0, 20.0, 29.0]
2025-08-07 06:30:10,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 42 seconds)
2025-08-07 06:31:50,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:31:51,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 159.31921 ± 259.025
2025-08-07 06:31:51,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [8.519714, 4.6904116, 13.426938, 14.955246, 9.092312, 6.990318, 414.47046, 7.5200024, 299.54584, 813.9809]
2025-08-07 06:31:51,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 17.0, 27.0, 29.0, 21.0, 20.0, 180.0, 19.0, 133.0, 582.0]
2025-08-07 06:31:51,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 2 seconds)
2025-08-07 06:33:31,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:33:32,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 149.31725 ± 164.018
2025-08-07 06:33:32,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [13.181343, 150.55461, 217.65958, 274.18112, 356.93802, 468.1058, 1.3370723, 1.7206224, 2.990355, 6.503938]
2025-08-07 06:33:32,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 172.0, 120.0, 140.0, 146.0, 215.0, 27.0, 16.0, 19.0, 32.0]
2025-08-07 06:33:32,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 23 seconds)
2025-08-07 06:35:11,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:35:12,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 131.28676 ± 211.937
2025-08-07 06:35:12,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [2.820653, 13.6847925, 0.7494015, 487.3151, 5.670276, 8.074896, 589.93884, 188.67746, 10.388967, 5.54728]
2025-08-07 06:35:12,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 31.0, 15.0, 319.0, 17.0, 18.0, 335.0, 101.0, 25.0, 28.0]
2025-08-07 06:35:12,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 42 seconds)
2025-08-07 06:36:52,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:36:54,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 222.02168 ± 261.215
2025-08-07 06:36:54,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [586.4576, 8.714094, 4.258695, 21.399126, 15.966945, 13.403841, 406.69415, 605.53613, 546.0214, 11.764647]
2025-08-07 06:36:54,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [282.0, 20.0, 19.0, 29.0, 31.0, 27.0, 178.0, 264.0, 242.0, 22.0]
2025-08-07 06:36:54,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 2 seconds)
2025-08-07 06:38:32,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:38:33,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 80.57235 ± 143.281
2025-08-07 06:38:33,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [10.177792, 4.9466543, 16.750477, 12.954982, 10.293957, 281.3387, 16.782028, 435.8335, 6.2366934, 10.408724]
2025-08-07 06:38:33,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 15.0, 27.0, 26.0, 31.0, 141.0, 29.0, 186.0, 18.0, 22.0]
2025-08-07 06:38:33,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 21 seconds)
2025-08-07 06:40:15,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:40:17,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 204.39534 ± 189.866
2025-08-07 06:40:17,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [9.614589, 176.46277, 427.0034, 7.154009, 502.5567, 11.790529, 393.18216, 376.132, 125.12746, 14.929824]
2025-08-07 06:40:17,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 100.0, 173.0, 28.0, 223.0, 30.0, 160.0, 152.0, 95.0, 31.0]
2025-08-07 06:40:17,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2025-08-07 06:41:54,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:41:55,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1221 [DEBUG]: Total Reward: 100.60974 ± 185.590
2025-08-07 06:41:55,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1222 [DEBUG]: All rewards: [1.7311269, 12.441715, 15.40539, 5.724123, 481.5724, 12.895981, 3.7492688, 461.6052, 5.6644893, 5.3077016]
2025-08-07 06:41:55,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1223 [DEBUG]: All trajectory lengths: [13.0, 25.0, 28.0, 20.0, 214.0, 24.0, 17.0, 223.0, 19.0, 20.0]
2025-08-07 06:41:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-walker2d):1251 [DEBUG]: Training session finished
