2025-08-07 03:51:24,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:51:24,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:51:24,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x145b55d8e9d0>}
2025-08-07 03:51:24,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 03:51:24,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 03:51:25,011 baseline-bpql-noiseperc10-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 03:51:25,011 baseline-bpql-noiseperc10-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 03:51:25,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 03:51:25,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 03:52:59,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:00,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 9.12898 ± 22.499
2025-08-07 03:53:00,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [35.11313, 19.088734, 15.620212, 18.189615, 11.179826, 1.5433605, -54.05048, 12.422937, 17.191444, 14.991023]
2025-08-07 03:53:00,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [104.0, 98.0, 31.0, 30.0, 21.0, 99.0, 143.0, 24.0, 31.0, 28.0]
2025-08-07 03:53:00,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (9.13) for latency ExtremeSparseL4U32
2025-08-07 03:53:00,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 36 minutes, 7 seconds)
2025-08-07 03:54:42,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:43,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 37.76400 ± 62.936
2025-08-07 03:54:43,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [21.177835, 77.006996, 10.785782, 18.180204, -17.355524, 15.976407, 11.846439, 10.602587, 14.911999, 214.5072]
2025-08-07 03:54:43,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 163.0, 126.0, 29.0, 162.0, 44.0, 29.0, 23.0, 29.0, 181.0]
2025-08-07 03:54:43,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (37.76) for latency ExtremeSparseL4U32
2025-08-07 03:54:43,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 41 minutes, 13 seconds)
2025-08-07 03:56:25,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:27,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 24.78795 ± 29.278
2025-08-07 03:56:27,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [51.394726, 3.5576508, 13.904056, 16.961267, 71.69063, 72.55304, 14.664189, 14.28719, -24.234211, 13.100988]
2025-08-07 03:56:27,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [68.0, 16.0, 31.0, 40.0, 295.0, 469.0, 170.0, 23.0, 161.0, 30.0]
2025-08-07 03:56:27,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 42 minutes, 25 seconds)
2025-08-07 03:58:10,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:12,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 46.57571 ± 61.388
2025-08-07 03:58:12,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [58.483555, -2.1294446, 18.039648, 219.94925, 29.223808, 14.323176, 9.056091, 12.172363, 65.03261, 41.605995]
2025-08-07 03:58:12,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [72.0, 137.0, 31.0, 166.0, 124.0, 28.0, 21.0, 84.0, 208.0, 75.0]
2025-08-07 03:58:12,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (46.58) for latency ExtremeSparseL4U32
2025-08-07 03:58:12,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 42 minutes, 27 seconds)
2025-08-07 03:59:53,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:54,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 27.78173 ± 23.100
2025-08-07 03:59:54,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.554538, 11.012513, 89.58327, 15.753125, 32.993286, 28.76841, 11.117191, 37.99349, 6.1389875, 30.902494]
2025-08-07 03:59:54,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 22.0, 116.0, 113.0, 61.0, 48.0, 40.0, 71.0, 18.0, 86.0]
2025-08-07 03:59:54,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 41 minutes, 5 seconds)
2025-08-07 04:01:34,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:01:36,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 53.42041 ± 87.154
2025-08-07 04:01:36,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.076402, -11.174222, 9.453671, 32.69645, 147.52315, 8.967904, 9.349757, 283.2249, 15.568527, 25.517527]
2025-08-07 04:01:36,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 167.0, 25.0, 44.0, 158.0, 24.0, 22.0, 255.0, 159.0, 145.0]
2025-08-07 04:01:36,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (53.42) for latency ExtremeSparseL4U32
2025-08-07 04:01:36,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 41 minutes, 37 seconds)
2025-08-07 04:03:17,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:19,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 48.38815 ± 52.035
2025-08-07 04:03:19,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [66.67306, 6.1349564, 91.250404, 92.67221, 16.485836, 8.723942, 4.8196597, 13.468094, 15.418544, 168.23474]
2025-08-07 04:03:19,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [93.0, 137.0, 260.0, 149.0, 27.0, 20.0, 15.0, 26.0, 32.0, 162.0]
2025-08-07 04:03:19,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 39 minutes, 56 seconds)
2025-08-07 04:05:01,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:05:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 28.25754 ± 22.972
2025-08-07 04:05:02,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [22.862556, 40.475014, 78.97908, 10.336966, 23.414728, 5.171907, 44.87798, -0.28792834, 11.675572, 45.06958]
2025-08-07 04:05:02,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [44.0, 56.0, 178.0, 32.0, 31.0, 18.0, 107.0, 196.0, 27.0, 91.0]
2025-08-07 04:05:02,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 38 minutes)
2025-08-07 04:06:43,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 24.80852 ± 17.346
2025-08-07 04:06:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.128075, 30.88677, 6.7476835, 30.330637, 15.457282, 15.345289, 10.546633, 66.787224, 39.771103, 23.084517]
2025-08-07 04:06:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 101.0, 32.0, 46.0, 26.0, 28.0, 23.0, 81.0, 53.0, 115.0]
2025-08-07 04:06:44,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 35 minutes, 30 seconds)
2025-08-07 04:08:25,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:26,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 29.69462 ± 20.354
2025-08-07 04:08:26,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [20.224657, 41.284912, 54.537926, 14.656069, 25.549532, 12.500325, 10.106162, 75.98439, 27.631706, 14.470554]
2025-08-07 04:08:26,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 49.0, 61.0, 24.0, 44.0, 145.0, 23.0, 104.0, 57.0, 33.0]
2025-08-07 04:08:26,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 33 minutes, 31 seconds)
2025-08-07 04:10:08,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:08,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 26.01575 ± 21.565
2025-08-07 04:10:08,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [16.493868, 10.925713, 66.28046, 9.73659, 10.651216, 55.917652, 7.9115424, 17.266163, 52.827038, 12.147261]
2025-08-07 04:10:08,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 21.0, 92.0, 31.0, 29.0, 79.0, 17.0, 30.0, 57.0, 33.0]
2025-08-07 04:10:08,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 32 minutes)
2025-08-07 04:11:50,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:11:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 34.09891 ± 33.046
2025-08-07 04:11:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [120.408104, 14.565569, 65.53113, 12.316786, 19.627909, 19.67558, 33.79254, 10.491274, 8.624783, 35.95539]
2025-08-07 04:11:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 29.0, 67.0, 22.0, 41.0, 41.0, 50.0, 23.0, 21.0, 56.0]
2025-08-07 04:11:50,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 30 minutes, 2 seconds)
2025-08-07 04:13:31,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:13:32,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 83.48258 ± 80.757
2025-08-07 04:13:32,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [20.921356, 53.699642, 67.60079, 88.246864, 28.965258, 16.114483, 9.501583, 266.8727, 83.12277, 199.78029]
2025-08-07 04:13:32,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 78.0, 66.0, 95.0, 55.0, 28.0, 24.0, 229.0, 94.0, 139.0]
2025-08-07 04:13:32,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (83.48) for latency ExtremeSparseL4U32
2025-08-07 04:13:32,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 27 minutes, 57 seconds)
2025-08-07 04:15:15,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:15:16,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 35.01994 ± 46.281
2025-08-07 04:15:16,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [89.954056, 6.1879997, 12.148764, 11.516158, 24.49303, 9.734011, 13.905584, 10.297202, 154.87608, 17.086533]
2025-08-07 04:15:16,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [87.0, 18.0, 22.0, 28.0, 44.0, 20.0, 25.0, 29.0, 151.0, 32.0]
2025-08-07 04:15:16,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 26 minutes, 35 seconds)
2025-08-07 04:16:57,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:16:58,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 62.64229 ± 52.285
2025-08-07 04:16:58,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [87.79711, 114.812775, 140.08798, 133.09335, 15.84643, 13.429394, 15.678885, 86.536644, 9.608685, 9.531622]
2025-08-07 04:16:58,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [82.0, 124.0, 97.0, 114.0, 28.0, 24.0, 26.0, 87.0, 21.0, 23.0]
2025-08-07 04:16:58,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 25 minutes, 3 seconds)
2025-08-07 04:18:39,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:18:41,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 63.14970 ± 52.939
2025-08-07 04:18:41,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.356517, 32.790257, 131.97261, 43.337513, 11.269336, 9.340922, 94.39702, 51.63278, 176.21039, 67.18965]
2025-08-07 04:18:41,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [40.0, 52.0, 105.0, 55.0, 28.0, 22.0, 111.0, 57.0, 139.0, 71.0]
2025-08-07 04:18:41,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 23 minutes, 25 seconds)
2025-08-07 04:20:22,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:20:24,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 60.88348 ± 50.180
2025-08-07 04:20:24,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [43.946114, 153.60133, 17.087202, 13.985976, 17.489582, 10.8222885, 73.36385, 128.64215, 38.563034, 111.333206]
2025-08-07 04:20:24,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 104.0, 28.0, 24.0, 31.0, 23.0, 83.0, 107.0, 57.0, 150.0]
2025-08-07 04:20:24,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 22 minutes)
2025-08-07 04:22:04,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:22:05,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 23.19778 ± 17.344
2025-08-07 04:22:05,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.669405, 12.539066, 14.923292, 9.77367, 8.636714, 16.575691, 13.687109, 35.593624, 47.97607, 60.603207]
2025-08-07 04:22:05,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 40.0, 27.0, 24.0, 21.0, 30.0, 25.0, 84.0, 95.0, 116.0]
2025-08-07 04:22:05,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 20 minutes, 11 seconds)
2025-08-07 04:23:47,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:23:48,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 56.10622 ± 74.774
2025-08-07 04:23:48,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [23.790167, 254.44875, 43.36745, 130.19507, 11.658813, 14.790285, 12.002356, 9.081987, 11.337809, 50.389545]
2025-08-07 04:23:48,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [50.0, 191.0, 61.0, 120.0, 43.0, 26.0, 24.0, 21.0, 24.0, 96.0]
2025-08-07 04:23:48,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 18 minutes, 24 seconds)
2025-08-07 04:25:30,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:25:31,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 28.36390 ± 38.522
2025-08-07 04:25:31,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.686369, 26.9343, 10.815644, 7.7599497, 38.934258, 13.2818365, 9.376786, 12.184827, 12.0550165, 140.61]
2025-08-07 04:25:31,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 98.0, 28.0, 22.0, 166.0, 24.0, 20.0, 28.0, 24.0, 144.0]
2025-08-07 04:25:31,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 16 minutes, 43 seconds)
2025-08-07 04:27:12,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:27:14,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 118.98684 ± 111.191
2025-08-07 04:27:14,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [55.734516, 233.93661, 8.558155, 9.899463, 5.8729863, 33.920986, 103.37739, 312.71707, 154.31076, 271.54047]
2025-08-07 04:27:14,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [93.0, 211.0, 33.0, 23.0, 17.0, 55.0, 157.0, 193.0, 256.0, 137.0]
2025-08-07 04:27:14,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (118.99) for latency ExtremeSparseL4U32
2025-08-07 04:27:14,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 9 seconds)
2025-08-07 04:28:56,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:28:58,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 91.12132 ± 125.554
2025-08-07 04:28:58,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [38.303455, 386.6641, 9.464555, 91.14498, 34.358448, 29.656965, 12.762226, 10.685417, 17.738642, 280.4343]
2025-08-07 04:28:58,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 195.0, 19.0, 128.0, 54.0, 113.0, 29.0, 27.0, 39.0, 144.0]
2025-08-07 04:28:58,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 13 minutes, 42 seconds)
2025-08-07 04:30:38,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:30:40,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 121.28004 ± 173.109
2025-08-07 04:30:40,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.381557, 13.757885, 42.76209, 12.380121, 3.434441, 294.25464, 576.99365, 116.2273, 47.746693, 95.862045]
2025-08-07 04:30:40,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 28.0, 62.0, 24.0, 19.0, 206.0, 366.0, 187.0, 93.0, 154.0]
2025-08-07 04:30:40,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (121.28) for latency ExtremeSparseL4U32
2025-08-07 04:30:40,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 12 minutes, 13 seconds)
2025-08-07 04:32:22,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:32:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 91.58580 ± 112.430
2025-08-07 04:32:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.772234, 130.49084, 53.57925, 7.212659, 62.541348, 276.7314, 8.553906, 12.708099, 330.1194, 22.148928]
2025-08-07 04:32:24,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 223.0, 63.0, 17.0, 82.0, 199.0, 21.0, 22.0, 457.0, 29.0]
2025-08-07 04:32:24,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 10 minutes, 39 seconds)
2025-08-07 04:34:08,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:34:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 105.91292 ± 148.451
2025-08-07 04:34:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [253.94493, 4.879334, 450.3868, 11.833369, 46.46985, 5.4804764, 9.481936, 8.344823, 249.71245, 18.595268]
2025-08-07 04:34:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [153.0, 26.0, 345.0, 24.0, 126.0, 22.0, 25.0, 20.0, 153.0, 29.0]
2025-08-07 04:34:10,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 9 minutes, 47 seconds)
2025-08-07 04:35:50,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:35:52,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 180.14249 ± 186.068
2025-08-07 04:35:52,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [352.59445, 554.59845, 12.382092, 14.721401, 350.40796, 22.073336, 13.575298, 278.6322, 193.0602, 9.379614]
2025-08-07 04:35:52,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [238.0, 410.0, 24.0, 31.0, 191.0, 31.0, 42.0, 166.0, 103.0, 33.0]
2025-08-07 04:35:52,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (180.14) for latency ExtremeSparseL4U32
2025-08-07 04:35:52,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 7 minutes, 54 seconds)
2025-08-07 04:37:35,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:37:37,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 204.79149 ± 184.502
2025-08-07 04:37:37,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [367.95688, 345.11023, 12.440528, 117.523544, 299.681, 340.48114, 10.497916, 12.722591, 7.382897, 534.11816]
2025-08-07 04:37:37,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [188.0, 180.0, 27.0, 79.0, 176.0, 169.0, 21.0, 24.0, 19.0, 320.0]
2025-08-07 04:37:37,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (204.79) for latency ExtremeSparseL4U32
2025-08-07 04:37:37,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 6 minutes, 16 seconds)
2025-08-07 04:39:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:39:20,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 102.61774 ± 94.401
2025-08-07 04:39:20,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.707085, 11.583214, 9.993855, 175.73326, 12.033233, 240.24054, 225.44124, 14.724466, 194.07738, 128.6431]
2025-08-07 04:39:20,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 30.0, 22.0, 253.0, 27.0, 164.0, 123.0, 32.0, 100.0, 83.0]
2025-08-07 04:39:20,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 4 minutes, 42 seconds)
2025-08-07 04:41:02,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:41:03,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 74.34200 ± 95.320
2025-08-07 04:41:03,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [12.280234, 5.9341793, 10.558094, 13.508395, 7.8838544, 21.450369, 246.416, 15.371909, 212.45428, 197.56277]
2025-08-07 04:41:03,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 16.0, 25.0, 25.0, 19.0, 32.0, 147.0, 29.0, 117.0, 143.0]
2025-08-07 04:41:03,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 2 minutes, 49 seconds)
2025-08-07 04:42:46,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:42:48,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 173.99625 ± 235.277
2025-08-07 04:42:48,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.84723, 10.509951, 186.22739, 412.95038, 738.1936, 16.61032, 14.782992, 15.989873, 10.673415, 324.1774]
2025-08-07 04:42:48,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 261.0, 217.0, 514.0, 32.0, 26.0, 28.0, 22.0, 191.0]
2025-08-07 04:42:48,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 1 minute)
2025-08-07 04:44:29,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:44:31,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 138.96223 ± 169.413
2025-08-07 04:44:31,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [254.52498, 11.365367, 10.154072, 8.224683, 502.6935, 15.297926, 229.68631, 330.07208, 11.655481, 15.947749]
2025-08-07 04:44:31,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 21.0, 26.0, 20.0, 473.0, 28.0, 141.0, 175.0, 29.0, 29.0]
2025-08-07 04:44:31,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 59 minutes, 19 seconds)
2025-08-07 04:46:12,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:46:13,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 155.52104 ± 136.213
2025-08-07 04:46:13,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [4.6101475, 265.78192, 331.11984, 290.67505, 8.1811905, 13.987218, 231.92256, 8.090257, 81.76035, 319.08185]
2025-08-07 04:46:13,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [15.0, 155.0, 179.0, 160.0, 24.0, 25.0, 131.0, 20.0, 174.0, 167.0]
2025-08-07 04:46:13,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 57 minutes, 5 seconds)
2025-08-07 04:47:54,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:47:56,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 136.13371 ± 164.354
2025-08-07 04:47:56,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [257.3166, 16.514639, 4.246621, 19.964743, 8.642113, 16.369564, 224.59694, 12.984154, 308.0781, 492.6236]
2025-08-07 04:47:56,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 41.0, 16.0, 28.0, 22.0, 32.0, 125.0, 23.0, 163.0, 296.0]
2025-08-07 04:47:56,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 11 seconds)
2025-08-07 04:49:36,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:49:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 181.01079 ± 143.648
2025-08-07 04:49:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [7.8688188, 201.83437, 266.29074, 329.7296, 15.463821, 328.54047, 12.987726, 14.1451, 269.10068, 364.14658]
2025-08-07 04:49:38,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 113.0, 145.0, 188.0, 31.0, 206.0, 28.0, 27.0, 159.0, 209.0]
2025-08-07 04:49:38,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 53 minutes, 21 seconds)
2025-08-07 04:51:20,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:51:22,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 159.77322 ± 127.772
2025-08-07 04:51:22,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [281.58887, 271.93256, 250.61978, 15.121418, 293.07285, 14.512652, 11.130577, 13.807488, 313.53464, 132.41142]
2025-08-07 04:51:22,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 143.0, 129.0, 28.0, 163.0, 26.0, 28.0, 33.0, 148.0, 96.0]
2025-08-07 04:51:22,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 51 minutes, 14 seconds)
2025-08-07 04:53:03,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:53:04,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 104.60813 ± 117.303
2025-08-07 04:53:04,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [55.220425, 272.38068, 10.043384, 6.3353963, 290.30173, 97.20284, 13.96711, 18.209494, 7.948436, 274.47174]
2025-08-07 04:53:04,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [104.0, 149.0, 22.0, 21.0, 165.0, 161.0, 29.0, 29.0, 28.0, 145.0]
2025-08-07 04:53:04,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 49 minutes, 25 seconds)
2025-08-07 04:54:44,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:54:46,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 161.76474 ± 193.190
2025-08-07 04:54:46,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.131521, 302.02725, 11.728014, 523.342, 304.3508, 13.273439, 11.989353, 12.326185, 419.97208, 7.5068603]
2025-08-07 04:54:46,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [27.0, 173.0, 21.0, 336.0, 169.0, 26.0, 27.0, 30.0, 307.0, 21.0]
2025-08-07 04:54:46,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 47 minutes, 40 seconds)
2025-08-07 04:56:28,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:56:30,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 200.62373 ± 187.214
2025-08-07 04:56:30,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [513.60065, 11.628992, 340.16577, 6.4413247, 3.122642, 468.17297, 119.31186, 15.359282, 260.60132, 267.83258]
2025-08-07 04:56:30,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 27.0, 171.0, 18.0, 21.0, 377.0, 154.0, 25.0, 156.0, 152.0]
2025-08-07 04:56:30,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 46 minutes, 12 seconds)
2025-08-07 04:58:11,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:58:13,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 192.69618 ± 242.172
2025-08-07 04:58:13,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [715.1393, 10.116774, 477.22018, 13.4021435, 364.90964, 20.456121, 11.973954, 15.379625, 7.977571, 290.3865]
2025-08-07 04:58:13,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [538.0, 26.0, 211.0, 29.0, 338.0, 32.0, 23.0, 32.0, 19.0, 161.0]
2025-08-07 04:58:13,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 44 minutes, 45 seconds)
2025-08-07 04:59:53,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:59:55,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 137.13718 ± 157.735
2025-08-07 04:59:55,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [18.219929, 19.069965, 13.418873, 354.33994, 214.95674, 10.752564, 310.69525, 9.3129635, 409.10056, 11.504872]
2025-08-07 04:59:55,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 31.0, 23.0, 199.0, 290.0, 29.0, 180.0, 19.0, 235.0, 21.0]
2025-08-07 04:59:55,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 34 seconds)
2025-08-07 05:01:36,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:01:37,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 119.70065 ± 184.927
2025-08-07 05:01:37,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [243.41502, 11.858358, 7.3176074, 289.47363, 9.914138, 8.509448, 18.197767, 13.119914, 584.1451, 11.0555725]
2025-08-07 05:01:37,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [127.0, 27.0, 18.0, 135.0, 28.0, 25.0, 33.0, 26.0, 334.0, 20.0]
2025-08-07 05:01:37,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 52 seconds)
2025-08-07 05:03:19,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:03:20,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 167.81012 ± 150.618
2025-08-07 05:03:20,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [16.101131, 7.1645637, 247.7749, 229.2884, 17.602728, 15.135109, 255.08235, 112.79356, 471.8532, 305.30533]
2025-08-07 05:03:20,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 20.0, 141.0, 133.0, 29.0, 29.0, 145.0, 78.0, 222.0, 148.0]
2025-08-07 05:03:20,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 39 minutes, 24 seconds)
2025-08-07 05:05:01,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:03,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 229.56548 ± 188.000
2025-08-07 05:05:03,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [265.99118, 292.1836, 12.755417, 477.68173, 328.1782, 460.23682, 12.043475, 17.999641, 9.172185, 419.41248]
2025-08-07 05:05:03,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 160.0, 25.0, 249.0, 191.0, 212.0, 24.0, 28.0, 28.0, 213.0]
2025-08-07 05:05:03,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (229.57) for latency ExtremeSparseL4U32
2025-08-07 05:05:03,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 37 minutes, 34 seconds)
2025-08-07 05:06:44,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:06:45,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 162.42447 ± 197.689
2025-08-07 05:06:45,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.699403, 13.432788, 465.29486, 13.416489, 4.6878138, 215.5552, 8.992886, 10.57165, 468.7647, 413.82892]
2025-08-07 05:06:45,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 222.0, 27.0, 22.0, 124.0, 32.0, 23.0, 239.0, 231.0]
2025-08-07 05:06:45,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 35 minutes, 35 seconds)
2025-08-07 05:08:31,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:08:32,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 120.20809 ± 120.524
2025-08-07 05:08:32,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.197784, 19.267242, 99.251755, 12.401031, 260.31256, 223.37367, 8.373729, 208.28535, 16.279001, 343.33878]
2025-08-07 05:08:32,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 137.0, 28.0, 158.0, 133.0, 25.0, 144.0, 31.0, 161.0]
2025-08-07 05:08:32,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 34 minutes, 55 seconds)
2025-08-07 05:10:10,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:10:12,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 171.40030 ± 190.578
2025-08-07 05:10:12,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [229.54471, 582.285, 343.61423, 358.99835, 18.431532, 9.759297, 131.45354, 8.489802, 16.405174, 15.021492]
2025-08-07 05:10:12,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [181.0, 326.0, 168.0, 215.0, 31.0, 26.0, 154.0, 25.0, 28.0, 24.0]
2025-08-07 05:10:12,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 32 minutes, 42 seconds)
2025-08-07 05:11:50,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:11:52,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 138.31953 ± 150.776
2025-08-07 05:11:52,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.067741, 10.138602, 49.76924, 367.60553, 345.4296, 17.622576, 9.882033, 229.13177, 13.286204, 329.26193]
2025-08-07 05:11:52,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 32.0, 98.0, 206.0, 177.0, 32.0, 27.0, 206.0, 26.0, 176.0]
2025-08-07 05:11:52,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 25 seconds)
2025-08-07 05:13:34,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:13:35,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 187.12883 ± 213.861
2025-08-07 05:13:35,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [300.62488, 332.6856, 220.6072, 12.226729, 17.278921, 707.89484, 10.163811, 17.643364, 235.7243, 16.438717]
2025-08-07 05:13:35,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [144.0, 163.0, 145.0, 25.0, 28.0, 392.0, 24.0, 32.0, 134.0, 27.0]
2025-08-07 05:13:35,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 47 seconds)
2025-08-07 05:15:16,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:15:17,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 48.13523 ± 109.052
2025-08-07 05:15:17,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [20.049437, 375.09335, 14.94437, 7.9842024, 10.021541, 14.241471, 13.219005, 7.957517, 11.071729, 6.7697153]
2025-08-07 05:15:17,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 199.0, 28.0, 18.0, 22.0, 32.0, 25.0, 21.0, 22.0, 29.0]
2025-08-07 05:15:17,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 54 seconds)
2025-08-07 05:16:56,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:16:58,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 253.06790 ± 175.656
2025-08-07 05:16:58,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [556.4587, 395.34613, 370.3887, 317.0501, 248.88368, 11.817922, 18.506582, 296.94885, 7.593649, 307.68454]
2025-08-07 05:16:58,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [293.0, 220.0, 192.0, 160.0, 141.0, 27.0, 30.0, 159.0, 33.0, 181.0]
2025-08-07 05:16:58,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (253.07) for latency ExtremeSparseL4U32
2025-08-07 05:16:58,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 24 minutes, 17 seconds)
2025-08-07 05:18:39,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:18:42,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 247.10791 ± 235.180
2025-08-07 05:18:42,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [12.835215, 98.128876, 15.347334, 589.08344, 2.6887136, 429.1032, 4.348671, 597.3289, 311.5393, 410.67532]
2025-08-07 05:18:42,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 265.0, 29.0, 302.0, 23.0, 234.0, 19.0, 403.0, 277.0, 239.0]
2025-08-07 05:18:42,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 16 seconds)
2025-08-07 05:20:23,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:20:25,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 207.79370 ± 187.058
2025-08-07 05:20:25,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [263.0105, 387.4364, 206.57645, 326.00717, 3.1029005, 278.82248, 15.358627, 10.151184, 8.426263, 579.04504]
2025-08-07 05:20:25,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 197.0, 303.0, 178.0, 17.0, 139.0, 31.0, 24.0, 19.0, 348.0]
2025-08-07 05:20:25,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 3 seconds)
2025-08-07 05:22:06,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:22:07,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 151.91228 ± 140.521
2025-08-07 05:22:07,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [161.01837, 16.545898, 319.35284, 58.151203, 340.49756, 345.88025, 7.8541813, 13.390894, 9.947978, 246.4836]
2025-08-07 05:22:07,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [162.0, 28.0, 182.0, 96.0, 175.0, 179.0, 22.0, 26.0, 32.0, 144.0]
2025-08-07 05:22:07,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 20 minutes, 10 seconds)
2025-08-07 05:23:49,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:23:51,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 263.74747 ± 220.136
2025-08-07 05:23:51,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [284.74945, 10.377564, 340.20517, 195.755, 649.74677, 248.08475, 263.9692, 625.6053, 6.3902764, 12.590989]
2025-08-07 05:23:51,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [170.0, 30.0, 183.0, 111.0, 371.0, 185.0, 157.0, 397.0, 20.0, 30.0]
2025-08-07 05:23:51,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (263.75) for latency ExtremeSparseL4U32
2025-08-07 05:23:51,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 52 seconds)
2025-08-07 05:25:31,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:25:32,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 116.98753 ± 112.745
2025-08-07 05:25:32,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.386163, 13.825936, 214.3358, 152.03279, 7.871332, 12.229193, 14.040408, 291.14053, 168.11682, 284.8962]
2025-08-07 05:25:32,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 116.0, 95.0, 18.0, 25.0, 28.0, 156.0, 99.0, 159.0]
2025-08-07 05:25:32,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 17 minutes, 5 seconds)
2025-08-07 05:27:12,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:27:15,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 239.29524 ± 227.634
2025-08-07 05:27:15,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [623.5878, 142.6121, 19.393557, 411.39163, 185.44234, 522.8571, 15.251105, 451.34775, 6.3277636, 14.741283]
2025-08-07 05:27:15,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [415.0, 88.0, 32.0, 220.0, 170.0, 317.0, 31.0, 329.0, 33.0, 24.0]
2025-08-07 05:27:15,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 15 minutes, 17 seconds)
2025-08-07 05:28:56,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:57,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 127.56917 ± 240.822
2025-08-07 05:28:57,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [6.022322, 6.066667, 19.921946, 11.464422, 225.61507, 137.46613, 13.827775, 17.583105, 18.29994, 819.42426]
2025-08-07 05:28:57,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 16.0, 31.0, 25.0, 117.0, 181.0, 27.0, 28.0, 32.0, 555.0]
2025-08-07 05:28:57,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 25 seconds)
2025-08-07 05:30:37,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:38,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 116.85683 ± 128.970
2025-08-07 05:30:38,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [15.824829, 292.65656, 12.492241, 9.062066, 8.166078, 11.0810175, 315.62735, 222.37856, 258.81274, 22.466936]
2025-08-07 05:30:38,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [28.0, 152.0, 25.0, 20.0, 21.0, 24.0, 153.0, 120.0, 144.0, 31.0]
2025-08-07 05:30:38,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 32 seconds)
2025-08-07 05:32:17,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:32:19,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 135.56192 ± 129.653
2025-08-07 05:32:19,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.743217, 300.86252, 12.59636, 184.80774, 341.18192, 10.021659, 16.073195, 7.4138865, 241.07971, 227.83893]
2025-08-07 05:32:19,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [33.0, 139.0, 23.0, 106.0, 190.0, 21.0, 28.0, 21.0, 139.0, 117.0]
2025-08-07 05:32:19,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 21 seconds)
2025-08-07 05:33:59,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:01,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 203.26511 ± 139.461
2025-08-07 05:34:01,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [409.3973, 328.72275, 10.817599, 252.9486, 219.7105, 11.975199, 335.65817, 271.42416, 10.236439, 181.76036]
2025-08-07 05:34:01,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [261.0, 203.0, 23.0, 143.0, 119.0, 27.0, 163.0, 141.0, 24.0, 241.0]
2025-08-07 05:34:01,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 49 seconds)
2025-08-07 05:35:41,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:35:43,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 134.46799 ± 126.788
2025-08-07 05:35:43,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [281.46188, 16.139341, 13.329928, 184.6126, 315.4025, 9.218813, 12.466204, 6.26513, 243.90193, 261.88153]
2025-08-07 05:35:43,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 33.0, 27.0, 103.0, 183.0, 23.0, 28.0, 18.0, 149.0, 220.0]
2025-08-07 05:35:43,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 58 seconds)
2025-08-07 05:37:24,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:37:26,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 228.80685 ± 176.218
2025-08-07 05:37:26,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [-1.6368207, 552.63165, 165.79924, 320.02835, 429.11804, 8.141385, 14.113205, 243.17964, 289.36127, 267.3328]
2025-08-07 05:37:26,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 586.0, 99.0, 174.0, 201.0, 20.0, 32.0, 154.0, 156.0, 151.0]
2025-08-07 05:37:26,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 28 seconds)
2025-08-07 05:39:07,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:39:09,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 235.97075 ± 262.385
2025-08-07 05:39:09,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [877.4729, 10.343622, 17.26775, 14.068058, 198.50404, 216.5333, 294.24908, 505.52264, 214.88802, 10.858048]
2025-08-07 05:39:09,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [542.0, 26.0, 32.0, 23.0, 108.0, 118.0, 266.0, 296.0, 231.0, 26.0]
2025-08-07 05:39:09,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 1 second)
2025-08-07 05:40:49,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:40:51,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 209.32361 ± 256.368
2025-08-07 05:40:51,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [19.597971, 7.2356334, 631.58655, 740.12335, 255.41762, 17.909399, 154.39052, 14.901198, 242.00824, 10.0653105]
2025-08-07 05:40:51,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 418.0, 497.0, 131.0, 31.0, 81.0, 30.0, 145.0, 21.0]
2025-08-07 05:40:51,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 31 seconds)
2025-08-07 05:42:32,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:42:33,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 112.56403 ± 184.096
2025-08-07 05:42:33,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.653511, 16.7803, 9.232757, 14.819466, 607.63293, 14.368174, 13.551879, 8.59131, 174.5514, 254.45857]
2025-08-07 05:42:33,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [26.0, 30.0, 22.0, 31.0, 349.0, 30.0, 24.0, 24.0, 99.0, 141.0]
2025-08-07 05:42:33,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 43 seconds)
2025-08-07 05:44:13,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:44:15,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 155.73416 ± 154.077
2025-08-07 05:44:15,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [5.8963184, 439.0182, 8.332465, 11.86505, 44.261833, 295.08002, 17.18133, 324.00854, 268.6625, 143.03542]
2025-08-07 05:44:15,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 211.0, 19.0, 24.0, 102.0, 180.0, 30.0, 173.0, 153.0, 95.0]
2025-08-07 05:44:15,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 2 seconds)
2025-08-07 05:45:58,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:46:00,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 185.97466 ± 158.183
2025-08-07 05:46:00,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [14.076485, 513.03076, 78.82661, 10.4819355, 256.6666, 311.80814, 301.6694, 130.78871, 7.6800003, 234.71794]
2025-08-07 05:46:00,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [24.0, 315.0, 143.0, 30.0, 141.0, 143.0, 173.0, 78.0, 20.0, 148.0]
2025-08-07 05:46:00,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 28 seconds)
2025-08-07 05:47:37,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:47:39,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 186.91216 ± 151.931
2025-08-07 05:47:39,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [5.750356, 220.06897, 88.094345, 271.13867, 262.95865, 406.7277, 14.829151, 11.993289, 147.32248, 440.2379]
2025-08-07 05:47:39,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 132.0, 135.0, 153.0, 152.0, 267.0, 31.0, 25.0, 106.0, 216.0]
2025-08-07 05:47:39,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 22 seconds)
2025-08-07 05:49:21,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:49:23,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 165.09103 ± 152.236
2025-08-07 05:49:23,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [136.87453, 11.055042, 13.125746, 11.615328, 294.9689, 179.73283, 283.40646, 482.77774, 228.83824, 8.515327]
2025-08-07 05:49:23,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [259.0, 28.0, 24.0, 23.0, 162.0, 248.0, 170.0, 267.0, 130.0, 28.0]
2025-08-07 05:49:23,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 50 seconds)
2025-08-07 05:51:02,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:51:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 189.64230 ± 207.857
2025-08-07 05:51:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [238.40187, 543.6604, 248.41705, 15.638453, 563.3575, 18.55726, 8.3023615, 243.73738, 9.612201, 6.738653]
2025-08-07 05:51:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 234.0, 133.0, 24.0, 301.0, 32.0, 24.0, 134.0, 21.0, 18.0]
2025-08-07 05:51:03,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 2 seconds)
2025-08-07 05:52:45,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:52:47,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 264.28278 ± 273.710
2025-08-07 05:52:47,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [612.50916, 225.77138, 366.48605, 5.5011835, 861.751, 10.248703, 294.68134, 239.16708, 9.703997, 17.007576]
2025-08-07 05:52:47,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [301.0, 125.0, 216.0, 16.0, 509.0, 23.0, 170.0, 136.0, 21.0, 30.0]
2025-08-07 05:52:47,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (264.28) for latency ExtremeSparseL4U32
2025-08-07 05:52:47,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 31 seconds)
2025-08-07 05:54:28,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:54:29,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 110.35398 ± 129.733
2025-08-07 05:54:29,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [10.533422, 7.822322, 15.421968, 11.274357, 15.30173, 7.8080277, 368.78955, 280.70157, 193.83418, 192.05276]
2025-08-07 05:54:29,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 29.0, 24.0, 31.0, 21.0, 201.0, 161.0, 108.0, 265.0]
2025-08-07 05:54:29,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 34 seconds)
2025-08-07 05:56:08,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:56:10,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 209.32913 ± 98.048
2025-08-07 05:56:10,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [260.0111, 204.07004, 250.94644, 231.74916, 228.24023, 375.8639, 291.03345, 149.45737, 13.611238, 88.30833]
2025-08-07 05:56:10,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 132.0, 166.0, 303.0, 138.0, 192.0, 168.0, 99.0, 27.0, 123.0]
2025-08-07 05:56:10,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 1 second)
2025-08-07 05:57:53,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:57:54,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 109.59889 ± 127.120
2025-08-07 05:57:54,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.315887, 6.6392837, 251.81715, 255.46443, 15.907412, 7.1510153, 6.8500156, 6.488668, 202.30542, 334.0496]
2025-08-07 05:57:54,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 17.0, 140.0, 145.0, 26.0, 19.0, 21.0, 31.0, 117.0, 177.0]
2025-08-07 05:57:54,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 17 seconds)
2025-08-07 05:59:34,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:59:37,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 343.61777 ± 314.864
2025-08-07 05:59:37,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [598.01575, 336.40018, 365.0996, 137.9965, 17.65038, 552.54767, 1078.1162, 64.042984, 277.2334, 9.075212]
2025-08-07 05:59:37,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 182.0, 209.0, 82.0, 31.0, 305.0, 583.0, 94.0, 151.0, 20.0]
2025-08-07 05:59:37,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1226 [INFO]: New best (343.62) for latency ExtremeSparseL4U32
2025-08-07 05:59:37,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 51 seconds)
2025-08-07 06:01:16,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:01:18,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 177.44914 ± 120.393
2025-08-07 06:01:18,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.541936, 161.02959, 193.37105, 274.19675, 340.93234, 264.4119, 283.75876, 233.30435, 11.116224, 2.8286355]
2025-08-07 06:01:18,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [22.0, 91.0, 118.0, 155.0, 172.0, 147.0, 139.0, 137.0, 24.0, 25.0]
2025-08-07 06:01:18,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 51 seconds)
2025-08-07 06:02:58,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:03:00,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 139.29715 ± 191.025
2025-08-07 06:03:00,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [480.0253, 252.20715, 12.016239, 11.769482, 13.7626505, 8.601393, 505.03403, 5.8727107, 7.7046885, 95.97802]
2025-08-07 06:03:00,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 137.0, 32.0, 28.0, 28.0, 21.0, 297.0, 16.0, 22.0, 205.0]
2025-08-07 06:03:00,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 9 seconds)
2025-08-07 06:04:40,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:04:43,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 291.38757 ± 261.576
2025-08-07 06:04:43,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [193.54967, 777.476, 9.621606, 19.032425, 16.442745, 466.9361, 408.18036, 17.123856, 512.0163, 493.49664]
2025-08-07 06:04:43,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [105.0, 372.0, 26.0, 29.0, 32.0, 195.0, 193.0, 32.0, 261.0, 205.0]
2025-08-07 06:04:43,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 33 seconds)
2025-08-07 06:06:23,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:06:25,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 180.61963 ± 237.900
2025-08-07 06:06:25,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [357.65997, 15.595827, 415.36563, 12.054758, 11.537048, 11.411653, 732.0941, 234.5341, 9.250717, 6.692495]
2025-08-07 06:06:25,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [207.0, 27.0, 353.0, 22.0, 21.0, 23.0, 510.0, 123.0, 23.0, 18.0]
2025-08-07 06:06:25,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 46 seconds)
2025-08-07 06:08:06,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:08:07,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 139.57202 ± 149.018
2025-08-07 06:08:07,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [283.64902, 99.88248, 11.96711, 18.136919, 197.12463, 448.64493, 9.029238, 11.445123, 24.260988, 291.5798]
2025-08-07 06:08:07,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 167.0, 30.0, 32.0, 113.0, 251.0, 23.0, 31.0, 33.0, 181.0]
2025-08-07 06:08:07,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 59 seconds)
2025-08-07 06:09:48,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:09:49,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 72.50553 ± 124.124
2025-08-07 06:09:49,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [23.219212, 15.662199, 177.94211, 11.021193, 16.721983, 15.990983, 14.2453985, 21.717278, 12.975717, 415.55927]
2025-08-07 06:09:49,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 103.0, 21.0, 32.0, 26.0, 28.0, 33.0, 23.0, 180.0]
2025-08-07 06:09:49,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 21 seconds)
2025-08-07 06:11:29,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:11:30,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 189.88835 ± 199.326
2025-08-07 06:11:30,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [389.37695, 20.246447, 278.47675, 13.896086, 8.636518, 324.70273, 6.384919, 243.7805, 10.86827, 602.51447]
2025-08-07 06:11:30,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [197.0, 32.0, 158.0, 24.0, 25.0, 160.0, 26.0, 138.0, 22.0, 303.0]
2025-08-07 06:11:30,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 36 seconds)
2025-08-07 06:13:11,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:13:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 165.47672 ± 131.797
2025-08-07 06:13:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [217.47849, 270.61536, 18.015932, 202.24078, 7.934686, 372.48474, 264.68192, 14.617676, 9.935271, 276.76215]
2025-08-07 06:13:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [116.0, 146.0, 27.0, 106.0, 18.0, 196.0, 129.0, 38.0, 21.0, 140.0]
2025-08-07 06:13:12,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 53 seconds)
2025-08-07 06:14:52,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:14:53,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 186.49838 ± 173.573
2025-08-07 06:14:53,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [7.3084044, 236.87592, 302.22305, 6.7300262, 6.3929243, 237.32875, 207.58078, 563.97314, 289.7885, 6.78221]
2025-08-07 06:14:53,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [18.0, 139.0, 182.0, 31.0, 19.0, 149.0, 118.0, 296.0, 160.0, 21.0]
2025-08-07 06:14:53,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 6 seconds)
2025-08-07 06:16:36,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:16:38,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 172.94016 ± 138.403
2025-08-07 06:16:38,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.344454, 291.3597, 336.50473, 267.0185, 245.83083, 13.668433, 199.05101, 5.2442703, 347.37558, 12.003897]
2025-08-07 06:16:38,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 203.0, 180.0, 147.0, 124.0, 29.0, 117.0, 28.0, 176.0, 26.0]
2025-08-07 06:16:38,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 31 seconds)
2025-08-07 06:18:16,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:18:17,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 130.54265 ± 154.738
2025-08-07 06:18:17,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [215.95293, 10.766117, 9.888739, 328.69647, 410.8731, 8.45045, 9.2428255, 5.3421316, 13.326077, 292.88776]
2025-08-07 06:18:17,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 23.0, 27.0, 183.0, 166.0, 26.0, 19.0, 16.0, 27.0, 168.0]
2025-08-07 06:18:17,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 44 seconds)
2025-08-07 06:19:58,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:20:00,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 145.56694 ± 191.233
2025-08-07 06:20:00,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [7.651365, 308.45016, 17.038866, 16.680315, 15.850661, 617.5371, 214.61716, 9.546193, 231.96382, 16.333702]
2025-08-07 06:20:00,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 157.0, 27.0, 33.0, 28.0, 479.0, 123.0, 33.0, 135.0, 26.0]
2025-08-07 06:20:00,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 4 seconds)
2025-08-07 06:21:40,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:21:42,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 238.04948 ± 196.919
2025-08-07 06:21:42,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [12.166751, 531.5437, 326.01944, 262.8323, 313.68417, 461.81366, 9.980979, 16.32619, 429.02213, 17.105463]
2025-08-07 06:21:42,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [23.0, 245.0, 177.0, 137.0, 158.0, 222.0, 23.0, 32.0, 202.0, 31.0]
2025-08-07 06:21:42,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 23 seconds)
2025-08-07 06:23:23,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:23:25,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 305.01883 ± 198.650
2025-08-07 06:23:25,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [7.134108, 428.81076, 366.64627, 349.81332, 410.9471, 13.40541, 476.15823, 496.36026, 8.959364, 491.95337]
2025-08-07 06:23:25,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 198.0, 170.0, 168.0, 164.0, 30.0, 221.0, 235.0, 21.0, 201.0]
2025-08-07 06:23:25,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 45 seconds)
2025-08-07 06:25:06,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:25:09,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 296.09125 ± 202.774
2025-08-07 06:25:09,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [9.063905, 498.93317, 10.537131, 339.51474, 14.510795, 309.29456, 480.04364, 511.8378, 280.98663, 506.18997]
2025-08-07 06:25:09,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 221.0, 22.0, 179.0, 29.0, 157.0, 228.0, 279.0, 145.0, 253.0]
2025-08-07 06:25:09,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 1 second)
2025-08-07 06:26:48,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:26:50,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 255.86604 ± 212.770
2025-08-07 06:26:50,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [10.328647, 424.1489, 480.23557, 8.443566, 10.572987, 460.12524, 504.3117, 7.9017434, 232.69702, 419.8951]
2025-08-07 06:26:50,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [21.0, 199.0, 216.0, 18.0, 30.0, 195.0, 241.0, 19.0, 129.0, 203.0]
2025-08-07 06:26:50,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 22 seconds)
2025-08-07 06:28:33,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:28:35,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 172.23735 ± 186.856
2025-08-07 06:28:35,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [18.4094, 134.08527, 10.609089, 550.43353, 363.6156, 340.0121, 7.1993175, 276.1467, 7.1624837, 14.700085]
2025-08-07 06:28:35,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [32.0, 97.0, 21.0, 384.0, 168.0, 164.0, 21.0, 132.0, 17.0, 33.0]
2025-08-07 06:28:35,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 44 seconds)
2025-08-07 06:30:16,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:30:19,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 318.56241 ± 166.708
2025-08-07 06:30:19,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [16.27821, 327.24716, 18.21764, 310.63272, 362.82178, 301.46265, 535.89935, 474.3271, 386.80536, 451.9319]
2025-08-07 06:30:19,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 161.0, 28.0, 196.0, 163.0, 154.0, 260.0, 226.0, 217.0, 247.0]
2025-08-07 06:30:19,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 3 seconds)
2025-08-07 06:31:57,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:31:58,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 199.20058 ± 205.014
2025-08-07 06:31:58,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [11.862175, 419.75034, 322.06937, 16.913042, 10.890265, 23.463537, 17.466282, 299.4477, 251.2876, 618.85535]
2025-08-07 06:31:58,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [25.0, 186.0, 148.0, 33.0, 25.0, 32.0, 29.0, 144.0, 141.0, 321.0]
2025-08-07 06:31:59,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 16 seconds)
2025-08-07 06:33:38,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:33:40,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 262.02835 ± 222.348
2025-08-07 06:33:40,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [13.345154, 521.23114, 8.168433, 396.84665, 224.0017, 18.220297, 569.816, 505.84204, 346.73666, 16.075588]
2025-08-07 06:33:40,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [37.0, 215.0, 22.0, 215.0, 122.0, 29.0, 227.0, 240.0, 194.0, 27.0]
2025-08-07 06:33:40,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 31 seconds)
2025-08-07 06:35:21,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:35:22,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 110.26711 ± 158.152
2025-08-07 06:35:22,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [336.8753, 12.906979, 13.928618, 13.195146, 447.1301, 9.0709095, 239.25392, 19.06544, 3.277585, 7.9670277]
2025-08-07 06:35:22,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [168.0, 28.0, 26.0, 32.0, 218.0, 26.0, 123.0, 31.0, 30.0, 20.0]
2025-08-07 06:35:22,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 49 seconds)
2025-08-07 06:37:03,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:37:04,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 195.92679 ± 253.626
2025-08-07 06:37:04,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [661.2471, 9.45139, 12.715756, 18.978182, 642.45074, 247.43709, 10.119295, 3.656579, 339.104, 14.107649]
2025-08-07 06:37:04,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [281.0, 20.0, 36.0, 28.0, 378.0, 139.0, 28.0, 24.0, 182.0, 26.0]
2025-08-07 06:37:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 5 seconds)
2025-08-07 06:38:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:38:47,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 242.47261 ± 215.423
2025-08-07 06:38:47,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [534.09503, 12.704653, 123.18246, 468.7427, 5.2201543, 421.15643, 10.268568, 10.538585, 453.0401, 385.77728]
2025-08-07 06:38:47,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 21.0, 248.0, 225.0, 19.0, 185.0, 21.0, 22.0, 206.0, 217.0]
2025-08-07 06:38:47,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 23 seconds)
2025-08-07 06:40:29,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:40:31,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 256.63004 ± 208.990
2025-08-07 06:40:31,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [15.637793, 10.965869, 496.81445, 350.4606, 421.45654, 9.785465, 7.4337516, 297.69675, 446.943, 509.10608]
2025-08-07 06:40:31,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [30.0, 27.0, 241.0, 183.0, 191.0, 24.0, 28.0, 163.0, 203.0, 239.0]
2025-08-07 06:40:31,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2025-08-07 06:42:09,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:42:11,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1221 [DEBUG]: Total Reward: 215.45593 ± 197.732
2025-08-07 06:42:11,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1222 [DEBUG]: All rewards: [636.36096, 11.8367195, 13.384531, 410.53494, 305.92435, 296.86087, 19.897884, 14.983098, 214.33057, 230.44522]
2025-08-07 06:42:11,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1223 [DEBUG]: All trajectory lengths: [277.0, 26.0, 23.0, 186.0, 144.0, 156.0, 32.0, 24.0, 128.0, 119.0]
2025-08-07 06:42:11,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-walker2d):1251 [DEBUG]: Training session finished
