2025-08-07 03:44:55,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc0-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:44:55,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc0-walker2d/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:44:55,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14718a3efc10>}
2025-08-07 03:44:55,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1111 [DEBUG]: using device: cuda
2025-08-07 03:44:55,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1133 [INFO]: Creating new trainer
2025-08-07 03:44:55,188 baseline-bpql-noiseperc0-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 03:44:55,188 baseline-bpql-noiseperc0-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 03:44:56,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1194 [DEBUG]: Starting training session...
2025-08-07 03:44:56,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 1/100
2025-08-07 03:46:31,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:32,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 46.98864 ± 17.618
2025-08-07 03:46:32,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [33.45883, 43.89282, 47.432693, 95.1874, 48.0663, 47.1435, 36.103413, 53.579628, 34.208797, 30.812983]
2025-08-07 03:46:32,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [121.0, 96.0, 120.0, 169.0, 94.0, 120.0, 93.0, 94.0, 94.0, 94.0]
2025-08-07 03:46:32,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (46.99) for latency ExtremeSparseL4U32
2025-08-07 03:46:32,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 39 minutes, 29 seconds)
2025-08-07 03:48:13,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:15,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 11.42465 ± 13.126
2025-08-07 03:48:15,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [5.9298186, 7.873048, 10.044012, 39.27, -6.435177, 14.871997, 19.043331, 2.0016067, 25.389454, -3.7416115]
2025-08-07 03:48:15,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 149.0, 113.0, 56.0, 121.0, 103.0, 111.0, 112.0, 105.0, 113.0]
2025-08-07 03:48:15,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 42 minutes, 51 seconds)
2025-08-07 03:49:57,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:00,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 46.93734 ± 33.522
2025-08-07 03:50:00,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [51.456455, 55.73253, 79.833786, 49.994324, 41.11158, 22.457476, 10.861924, -16.0731, 108.59891, 65.3995]
2025-08-07 03:50:00,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [87.0, 104.0, 219.0, 157.0, 121.0, 139.0, 181.0, 128.0, 330.0, 118.0]
2025-08-07 03:50:00,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 43 minutes, 44 seconds)
2025-08-07 03:51:42,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:44,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 182.02936 ± 36.473
2025-08-07 03:51:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [236.33879, 135.06955, 141.05272, 179.7915, 227.12297, 171.49117, 157.00598, 216.3748, 142.7985, 213.24759]
2025-08-07 03:51:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [131.0, 102.0, 107.0, 119.0, 153.0, 122.0, 116.0, 135.0, 126.0, 134.0]
2025-08-07 03:51:44,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (182.03) for latency ExtremeSparseL4U32
2025-08-07 03:51:44,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 43 minutes, 13 seconds)
2025-08-07 03:53:25,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:27,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 140.99739 ± 72.398
2025-08-07 03:53:27,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [223.10239, 93.30413, 219.52835, 54.840565, 147.49252, 165.98349, 270.00482, 101.59732, 64.50603, 69.61432]
2025-08-07 03:53:27,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [150.0, 111.0, 171.0, 79.0, 154.0, 168.0, 201.0, 165.0, 82.0, 115.0]
2025-08-07 03:53:27,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 41 minutes, 59 seconds)
2025-08-07 03:55:10,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:55:12,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 196.84802 ± 46.785
2025-08-07 03:55:12,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [209.70038, 143.23094, 172.63919, 248.90833, 243.61008, 180.41022, 107.443504, 216.49542, 182.1189, 263.92325]
2025-08-07 03:55:12,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [147.0, 124.0, 121.0, 162.0, 153.0, 111.0, 111.0, 138.0, 129.0, 178.0]
2025-08-07 03:55:12,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (196.85) for latency ExtremeSparseL4U32
2025-08-07 03:55:12,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 42 minutes, 48 seconds)
2025-08-07 03:56:53,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:55,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 168.46683 ± 39.095
2025-08-07 03:56:55,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [219.08595, 193.53494, 245.67555, 165.48573, 133.95172, 179.1842, 114.53757, 142.04051, 136.54077, 154.63133]
2025-08-07 03:56:55,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [134.0, 115.0, 145.0, 108.0, 125.0, 109.0, 113.0, 99.0, 115.0, 105.0]
2025-08-07 03:56:55,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 41 minutes, 7 seconds)
2025-08-07 03:58:37,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:39,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 131.97507 ± 10.034
2025-08-07 03:58:39,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [109.52813, 132.00005, 149.85521, 127.62344, 136.36998, 131.46318, 125.73901, 142.0949, 133.71545, 131.36125]
2025-08-07 03:58:39,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [107.0, 129.0, 110.0, 128.0, 137.0, 99.0, 132.0, 130.0, 132.0, 131.0]
2025-08-07 03:58:39,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 39 minutes, 15 seconds)
2025-08-07 04:00:21,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:23,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 138.67976 ± 31.442
2025-08-07 04:00:23,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [150.94513, 144.18922, 115.77765, 85.31757, 184.82948, 140.41145, 173.95645, 137.79874, 163.3377, 90.234375]
2025-08-07 04:00:23,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [143.0, 127.0, 127.0, 92.0, 160.0, 129.0, 143.0, 127.0, 142.0, 99.0]
2025-08-07 04:00:23,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 37 minutes, 23 seconds)
2025-08-07 04:02:03,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:04,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 103.13937 ± 12.980
2025-08-07 04:02:04,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [97.58433, 117.663994, 116.80861, 87.63358, 89.13123, 121.40334, 101.221535, 95.58325, 116.72872, 87.63511]
2025-08-07 04:02:04,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [123.0, 110.0, 123.0, 126.0, 121.0, 123.0, 124.0, 122.0, 122.0, 104.0]
2025-08-07 04:02:04,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 35 minutes, 9 seconds)
2025-08-07 04:03:47,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:48,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 55.12899 ± 5.748
2025-08-07 04:03:48,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [47.70939, 52.128716, 59.950344, 50.30669, 57.195923, 63.047062, 58.555103, 48.53808, 63.683495, 50.1751]
2025-08-07 04:03:48,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [61.0, 63.0, 69.0, 66.0, 66.0, 67.0, 66.0, 63.0, 69.0, 64.0]
2025-08-07 04:03:48,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 32 minutes, 59 seconds)
2025-08-07 04:05:29,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:05:31,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 230.12497 ± 35.053
2025-08-07 04:05:31,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [249.2018, 228.89116, 277.6565, 218.92763, 201.4575, 157.93823, 266.75946, 247.59377, 196.16791, 256.65585]
2025-08-07 04:05:31,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [179.0, 144.0, 166.0, 134.0, 121.0, 98.0, 152.0, 158.0, 120.0, 144.0]
2025-08-07 04:05:31,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (230.12) for latency ExtremeSparseL4U32
2025-08-07 04:05:31,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 31 minutes, 23 seconds)
2025-08-07 04:07:12,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:07:14,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 187.10001 ± 40.897
2025-08-07 04:07:14,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [143.6758, 212.1745, 280.39658, 179.99756, 167.9478, 177.02075, 136.33553, 168.79384, 173.93771, 230.72003]
2025-08-07 04:07:14,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [100.0, 113.0, 155.0, 112.0, 106.0, 105.0, 92.0, 105.0, 107.0, 132.0]
2025-08-07 04:07:14,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 29 minutes, 25 seconds)
2025-08-07 04:08:56,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:58,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 144.94133 ± 67.658
2025-08-07 04:08:58,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [87.69352, 101.09275, 85.141785, 80.07722, 175.36804, 292.00266, 103.69055, 193.11887, 217.39743, 113.83057]
2025-08-07 04:08:58,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [92.0, 90.0, 79.0, 81.0, 111.0, 212.0, 88.0, 118.0, 120.0, 127.0]
2025-08-07 04:08:58,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 27 minutes, 46 seconds)
2025-08-07 04:10:39,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:41,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 172.59274 ± 96.319
2025-08-07 04:10:41,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [88.371056, 94.17069, 92.395515, 315.84723, 87.48445, 129.23033, 303.35214, 83.23821, 244.1199, 287.71802]
2025-08-07 04:10:41,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [85.0, 85.0, 86.0, 159.0, 82.0, 97.0, 164.0, 79.0, 135.0, 157.0]
2025-08-07 04:10:41,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 26 minutes, 13 seconds)
2025-08-07 04:12:22,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:12:24,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 153.57867 ± 91.106
2025-08-07 04:12:24,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [203.21071, 101.75818, 95.16197, 146.14838, 381.85144, 131.9834, 89.315186, 82.56242, 72.42247, 231.3726]
2025-08-07 04:12:24,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 111.0, 129.0, 148.0, 219.0, 119.0, 121.0, 95.0, 111.0, 157.0]
2025-08-07 04:12:24,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 24 minutes, 30 seconds)
2025-08-07 04:14:05,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:14:07,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 209.79292 ± 68.413
2025-08-07 04:14:07,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [271.2981, 143.74467, 178.22383, 263.58612, 140.30423, 349.3952, 108.77384, 224.66446, 212.34755, 205.5911]
2025-08-07 04:14:07,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [152.0, 133.0, 125.0, 145.0, 106.0, 194.0, 109.0, 156.0, 134.0, 132.0]
2025-08-07 04:14:07,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 22 minutes, 40 seconds)
2025-08-07 04:15:47,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:15:48,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 150.90485 ± 72.663
2025-08-07 04:15:48,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [177.31906, 224.67838, 286.9505, 42.25168, 128.02075, 41.833786, 106.18455, 155.34926, 151.91695, 194.54356]
2025-08-07 04:15:48,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [98.0, 133.0, 155.0, 46.0, 95.0, 46.0, 92.0, 99.0, 113.0, 115.0]
2025-08-07 04:15:48,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 20 minutes, 34 seconds)
2025-08-07 04:17:29,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:17:32,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 277.70770 ± 75.681
2025-08-07 04:17:32,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [403.05685, 184.38405, 268.02005, 307.1072, 220.20738, 250.34895, 151.48235, 338.8772, 367.71204, 285.8814]
2025-08-07 04:17:32,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [189.0, 140.0, 136.0, 155.0, 209.0, 127.0, 114.0, 166.0, 172.0, 149.0]
2025-08-07 04:17:32,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (277.71) for latency ExtremeSparseL4U32
2025-08-07 04:17:32,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 18 minutes, 43 seconds)
2025-08-07 04:19:11,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:19:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 280.81741 ± 54.348
2025-08-07 04:19:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [266.89948, 255.47421, 161.51463, 294.0988, 298.64563, 391.35608, 263.35284, 300.6042, 309.92502, 266.3032]
2025-08-07 04:19:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 128.0, 127.0, 139.0, 144.0, 168.0, 140.0, 140.0, 152.0, 139.0]
2025-08-07 04:19:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (280.82) for latency ExtremeSparseL4U32
2025-08-07 04:19:13,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 16 minutes, 39 seconds)
2025-08-07 04:20:53,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:20:57,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 297.60611 ± 122.709
2025-08-07 04:20:57,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [169.58525, 586.44836, 244.18211, 248.56053, 333.02924, 238.88043, 321.52716, 128.84383, 299.06177, 405.94235]
2025-08-07 04:20:57,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 372.0, 169.0, 148.0, 179.0, 154.0, 188.0, 426.0, 156.0, 227.0]
2025-08-07 04:20:57,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (297.61) for latency ExtremeSparseL4U32
2025-08-07 04:20:57,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 15 minutes, 2 seconds)
2025-08-07 04:22:38,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:22:41,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 358.67514 ± 88.485
2025-08-07 04:22:41,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [336.39264, 367.05066, 444.7284, 437.2894, 445.61993, 234.26472, 381.20425, 224.1497, 464.0945, 251.95732]
2025-08-07 04:22:41,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 191.0, 203.0, 214.0, 214.0, 146.0, 173.0, 156.0, 252.0, 152.0]
2025-08-07 04:22:41,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (358.68) for latency ExtremeSparseL4U32
2025-08-07 04:22:41,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 13 minutes, 37 seconds)
2025-08-07 04:24:19,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:24:22,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 225.70181 ± 68.240
2025-08-07 04:24:22,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [386.34256, 163.92552, 252.47914, 148.32298, 201.96071, 266.10596, 233.53612, 208.49648, 142.49727, 253.35124]
2025-08-07 04:24:22,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 137.0, 160.0, 120.0, 147.0, 157.0, 145.0, 140.0, 122.0, 157.0]
2025-08-07 04:24:22,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 11 minutes, 44 seconds)
2025-08-07 04:26:02,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:26:03,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 87.28738 ± 43.187
2025-08-07 04:26:03,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [48.943035, 54.281155, 163.70453, 134.72784, 49.18824, 125.301544, 129.18208, 47.71817, 71.84915, 47.978073]
2025-08-07 04:26:03,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [53.0, 56.0, 88.0, 79.0, 54.0, 79.0, 79.0, 53.0, 66.0, 53.0]
2025-08-07 04:26:03,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 9 minutes, 32 seconds)
2025-08-07 04:27:43,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:27:46,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 376.58966 ± 146.531
2025-08-07 04:27:46,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [565.73364, 257.24255, 205.28445, 396.15155, 313.31097, 377.0927, 349.38223, 304.65933, 280.2143, 716.82465]
2025-08-07 04:27:46,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [236.0, 157.0, 145.0, 254.0, 171.0, 165.0, 191.0, 230.0, 179.0, 294.0]
2025-08-07 04:27:46,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (376.59) for latency ExtremeSparseL4U32
2025-08-07 04:27:46,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 8 minutes, 14 seconds)
2025-08-07 04:29:27,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:29:29,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 148.77866 ± 25.190
2025-08-07 04:29:29,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [110.4019, 125.645325, 167.12234, 170.35275, 153.06068, 135.36206, 182.2667, 151.8054, 178.97415, 112.79539]
2025-08-07 04:29:29,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 118.0, 144.0, 139.0, 136.0, 121.0, 138.0, 123.0, 138.0, 114.0]
2025-08-07 04:29:29,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 6 minutes, 23 seconds)
2025-08-07 04:31:08,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:31:11,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 269.79962 ± 33.672
2025-08-07 04:31:11,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [220.2333, 281.10617, 341.80548, 279.07047, 292.1553, 233.45346, 263.2287, 232.05788, 283.66217, 271.22305]
2025-08-07 04:31:11,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [126.0, 147.0, 243.0, 143.0, 146.0, 128.0, 134.0, 127.0, 145.0, 141.0]
2025-08-07 04:31:11,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 4 minutes, 5 seconds)
2025-08-07 04:32:51,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:32:53,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 113.39236 ± 16.930
2025-08-07 04:32:53,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [121.31294, 126.44799, 97.72773, 134.13377, 95.62926, 146.08475, 104.79405, 94.59916, 101.41934, 111.77457]
2025-08-07 04:32:53,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [117.0, 129.0, 113.0, 136.0, 116.0, 131.0, 141.0, 99.0, 99.0, 119.0]
2025-08-07 04:32:53,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 2 minutes, 43 seconds)
2025-08-07 04:34:33,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:34:35,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 293.77539 ± 46.926
2025-08-07 04:34:35,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [334.5146, 272.84665, 361.6412, 328.57785, 201.11827, 246.58897, 281.4037, 332.83746, 317.1459, 261.07904]
2025-08-07 04:34:35,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [166.0, 161.0, 174.0, 149.0, 109.0, 137.0, 135.0, 164.0, 156.0, 141.0]
2025-08-07 04:34:35,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 6 seconds)
2025-08-07 04:36:18,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:36:22,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 303.63940 ± 151.457
2025-08-07 04:36:22,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [71.697495, 256.40384, 56.2738, 355.11328, 474.9602, 464.5854, 422.2683, 367.50394, 416.70367, 150.88417]
2025-08-07 04:36:22,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [136.0, 162.0, 116.0, 277.0, 333.0, 286.0, 238.0, 200.0, 251.0, 158.0]
2025-08-07 04:36:22,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 seconds)
2025-08-07 04:37:59,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:38:02,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 307.35104 ± 115.515
2025-08-07 04:38:02,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [283.96683, 90.463264, 435.06335, 356.60986, 445.7905, 127.38229, 414.42624, 325.61682, 339.22992, 254.96132]
2025-08-07 04:38:02,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 125.0, 239.0, 184.0, 228.0, 166.0, 229.0, 191.0, 181.0, 157.0]
2025-08-07 04:38:02,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 3 seconds)
2025-08-07 04:39:41,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:39:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 401.43988 ± 44.252
2025-08-07 04:39:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [416.87943, 354.95853, 364.14288, 369.73697, 491.03888, 420.27945, 372.70337, 352.97958, 415.05786, 456.62155]
2025-08-07 04:39:45,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [214.0, 191.0, 195.0, 187.0, 288.0, 207.0, 187.0, 183.0, 207.0, 239.0]
2025-08-07 04:39:45,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (401.44) for latency ExtremeSparseL4U32
2025-08-07 04:39:45,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 29 seconds)
2025-08-07 04:41:25,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:41:29,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 398.19000 ± 174.746
2025-08-07 04:41:29,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [353.62567, 489.7256, 207.8926, 358.59198, 853.63477, 284.0811, 349.5614, 275.86655, 306.07846, 502.84232]
2025-08-07 04:41:29,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [204.0, 292.0, 194.0, 208.0, 474.0, 297.0, 207.0, 158.0, 181.0, 293.0]
2025-08-07 04:41:29,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 55 minutes, 10 seconds)
2025-08-07 04:43:10,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:43:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 250.44156 ± 40.151
2025-08-07 04:43:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [280.33865, 270.13293, 267.85022, 194.86743, 327.41656, 254.42807, 236.84235, 177.84598, 251.36594, 243.32741]
2025-08-07 04:43:12,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 156.0, 155.0, 121.0, 189.0, 147.0, 145.0, 114.0, 150.0, 144.0]
2025-08-07 04:43:12,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 53 minutes, 43 seconds)
2025-08-07 04:44:49,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:44:51,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 148.13220 ± 173.286
2025-08-07 04:44:51,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [24.145832, 93.67945, 525.5352, 66.906494, 28.603212, 33.095776, 205.50597, 25.484423, 51.465576, 426.9002]
2025-08-07 04:44:51,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [115.0, 136.0, 342.0, 105.0, 116.0, 109.0, 173.0, 113.0, 100.0, 272.0]
2025-08-07 04:44:51,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes, 26 seconds)
2025-08-07 04:46:32,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:46:35,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 177.91348 ± 162.354
2025-08-07 04:46:35,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [81.1681, 312.90668, 54.873493, 244.2035, 79.615616, 84.59756, 61.074467, 600.1778, 158.56404, 101.953445]
2025-08-07 04:46:35,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [135.0, 191.0, 125.0, 149.0, 130.0, 120.0, 122.0, 318.0, 103.0, 173.0]
2025-08-07 04:46:35,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 49 minutes, 20 seconds)
2025-08-07 04:48:13,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:48:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 120.46494 ± 123.039
2025-08-07 04:48:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [103.25254, 74.69015, 74.45695, 77.11379, 487.8303, 84.57195, 59.680016, 83.1054, 93.811226, 66.13705]
2025-08-07 04:48:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 105.0, 130.0, 108.0, 357.0, 158.0, 113.0, 110.0, 145.0, 120.0]
2025-08-07 04:48:16,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 47 minutes, 21 seconds)
2025-08-07 04:49:54,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:49:58,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 339.89432 ± 102.806
2025-08-07 04:49:58,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [431.61096, 327.78833, 305.28336, 165.188, 449.99667, 476.7915, 215.36919, 435.0549, 240.60951, 351.25046]
2025-08-07 04:49:58,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [212.0, 234.0, 268.0, 174.0, 272.0, 267.0, 217.0, 288.0, 208.0, 198.0]
2025-08-07 04:49:58,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 45 minutes, 15 seconds)
2025-08-07 04:51:38,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:51:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 101.05338 ± 12.375
2025-08-07 04:51:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [104.08338, 109.46155, 77.796394, 102.3032, 84.83365, 111.92573, 98.99347, 122.54779, 104.54929, 94.03926]
2025-08-07 04:51:40,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [114.0, 113.0, 108.0, 112.0, 114.0, 116.0, 113.0, 118.0, 113.0, 97.0]
2025-08-07 04:51:40,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 15 seconds)
2025-08-07 04:53:19,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:53:22,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 329.78369 ± 70.870
2025-08-07 04:53:22,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [346.0934, 271.5248, 280.18484, 259.22922, 345.6434, 292.19705, 514.56775, 374.8119, 317.58978, 295.9946]
2025-08-07 04:53:22,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [184.0, 149.0, 157.0, 143.0, 192.0, 163.0, 240.0, 206.0, 171.0, 162.0]
2025-08-07 04:53:22,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 42 minutes, 9 seconds)
2025-08-07 04:55:03,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:55:07,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 368.33575 ± 210.171
2025-08-07 04:55:07,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [230.79202, 397.927, 81.379135, 631.2846, 598.8819, 204.1299, 216.09724, 514.86115, 671.52185, 136.4827]
2025-08-07 04:55:07,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [220.0, 217.0, 130.0, 324.0, 329.0, 167.0, 202.0, 280.0, 337.0, 124.0]
2025-08-07 04:55:07,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 40 minutes, 38 seconds)
2025-08-07 04:56:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:56:51,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 321.73749 ± 172.881
2025-08-07 04:56:51,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [143.03998, 506.6434, 318.5082, 400.79755, 147.23935, 578.89355, 110.992516, 493.32315, 103.760185, 414.17694]
2025-08-07 04:56:51,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [227.0, 254.0, 209.0, 230.0, 129.0, 319.0, 130.0, 284.0, 177.0, 218.0]
2025-08-07 04:56:51,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 39 minutes, 35 seconds)
2025-08-07 04:58:29,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:58:32,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 211.71806 ± 145.725
2025-08-07 04:58:32,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [568.91907, 130.59543, 134.03365, 267.93954, 384.64087, 148.13258, 114.03763, 125.28335, 91.31507, 152.2832]
2025-08-07 04:58:32,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [301.0, 141.0, 140.0, 212.0, 211.0, 147.0, 128.0, 132.0, 137.0, 149.0]
2025-08-07 04:58:32,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 37 minutes, 35 seconds)
2025-08-07 05:00:11,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:00:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 406.22336 ± 139.324
2025-08-07 05:00:15,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [573.7808, 390.06152, 464.37137, 425.02612, 553.79926, 155.76514, 195.70934, 306.73743, 442.39923, 554.58356]
2025-08-07 05:00:15,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [306.0, 204.0, 232.0, 228.0, 240.0, 141.0, 143.0, 222.0, 251.0, 262.0]
2025-08-07 05:00:15,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (406.22) for latency ExtremeSparseL4U32
2025-08-07 05:00:15,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 36 minutes, 8 seconds)
2025-08-07 05:01:55,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:01:58,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 320.75949 ± 83.831
2025-08-07 05:01:58,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [257.47092, 445.46414, 231.73653, 358.81723, 364.7611, 380.08484, 373.56122, 191.89674, 390.74173, 213.06047]
2025-08-07 05:01:58,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [139.0, 213.0, 133.0, 170.0, 168.0, 218.0, 177.0, 142.0, 209.0, 118.0]
2025-08-07 05:01:58,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 34 minutes, 30 seconds)
2025-08-07 05:03:36,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:03:40,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 461.54083 ± 148.166
2025-08-07 05:03:40,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [483.564, 386.99414, 354.7713, 557.69116, 606.4757, 344.3781, 162.6795, 668.65204, 430.7116, 619.4906]
2025-08-07 05:03:40,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [198.0, 175.0, 165.0, 329.0, 251.0, 158.0, 119.0, 288.0, 169.0, 277.0]
2025-08-07 05:03:40,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (461.54) for latency ExtremeSparseL4U32
2025-08-07 05:03:40,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 32 minutes, 22 seconds)
2025-08-07 05:05:19,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 434.45938 ± 126.152
2025-08-07 05:05:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [402.7837, 394.9567, 703.33453, 384.44162, 642.2268, 369.2622, 289.6071, 357.81946, 450.81116, 349.3507]
2025-08-07 05:05:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [167.0, 180.0, 277.0, 162.0, 266.0, 169.0, 209.0, 153.0, 196.0, 159.0]
2025-08-07 05:05:22,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 16 seconds)
2025-08-07 05:07:03,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:07:07,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 373.42981 ± 119.721
2025-08-07 05:07:07,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [250.32062, 373.32806, 278.1641, 398.41858, 458.32724, 232.64517, 506.8667, 243.80011, 380.7017, 611.7258]
2025-08-07 05:07:07,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 171.0, 169.0, 213.0, 217.0, 155.0, 273.0, 156.0, 170.0, 319.0]
2025-08-07 05:07:07,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 29 minutes, 14 seconds)
2025-08-07 05:08:44,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:08:47,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 481.87030 ± 207.254
2025-08-07 05:08:47,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [255.39832, 715.08923, 741.0391, 181.73949, 454.55188, 692.7956, 196.58427, 392.04553, 537.95917, 651.5007]
2025-08-07 05:08:47,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [198.0, 277.0, 318.0, 138.0, 182.0, 281.0, 159.0, 177.0, 213.0, 265.0]
2025-08-07 05:08:47,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (481.87) for latency ExtremeSparseL4U32
2025-08-07 05:08:47,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 27 minutes, 11 seconds)
2025-08-07 05:10:29,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:10:32,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 325.05106 ± 44.108
2025-08-07 05:10:32,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [359.5735, 334.42978, 208.79565, 356.10233, 342.20667, 325.39957, 373.63046, 308.33395, 340.11768, 301.92117]
2025-08-07 05:10:32,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [177.0, 162.0, 222.0, 185.0, 165.0, 156.0, 181.0, 151.0, 189.0, 152.0]
2025-08-07 05:10:32,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 25 minutes, 42 seconds)
2025-08-07 05:12:10,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:12:11,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 110.12505 ± 3.479
2025-08-07 05:12:11,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [108.093895, 107.53954, 108.68944, 106.74131, 107.95412, 107.285835, 116.2363, 113.132614, 109.3734, 116.20399]
2025-08-07 05:12:11,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [73.0, 73.0, 73.0, 73.0, 74.0, 73.0, 77.0, 75.0, 73.0, 76.0]
2025-08-07 05:12:11,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 28 seconds)
2025-08-07 05:13:50,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:13:52,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 323.94940 ± 98.975
2025-08-07 05:13:52,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [369.7093, 152.91158, 152.52814, 373.31213, 354.67697, 297.9955, 406.4258, 485.5241, 311.7381, 334.6723]
2025-08-07 05:13:52,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [154.0, 86.0, 85.0, 170.0, 151.0, 127.0, 169.0, 229.0, 129.0, 138.0]
2025-08-07 05:13:52,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 21 minutes, 38 seconds)
2025-08-07 05:15:31,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:15:33,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 294.56036 ± 10.846
2025-08-07 05:15:33,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [307.41263, 295.76553, 304.04623, 306.46567, 277.84515, 304.39682, 288.89554, 276.7925, 287.46875, 296.51477]
2025-08-07 05:15:33,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [140.0, 136.0, 140.0, 141.0, 134.0, 139.0, 135.0, 133.0, 136.0, 139.0]
2025-08-07 05:15:33,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 19 minutes, 20 seconds)
2025-08-07 05:17:13,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:17:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 507.57977 ± 153.814
2025-08-07 05:17:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [436.77603, 328.3833, 330.1534, 617.5789, 454.88287, 394.6529, 805.58136, 663.539, 406.3615, 637.8889]
2025-08-07 05:17:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 157.0, 155.0, 237.0, 188.0, 166.0, 307.0, 249.0, 170.0, 258.0]
2025-08-07 05:17:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (507.58) for latency ExtremeSparseL4U32
2025-08-07 05:17:16,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 17 minutes, 58 seconds)
2025-08-07 05:18:56,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:18:59,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 485.72705 ± 54.540
2025-08-07 05:18:59,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [533.3397, 531.7346, 461.9991, 425.71692, 465.1115, 528.34326, 523.8519, 360.67276, 526.4849, 500.0157]
2025-08-07 05:18:59,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [196.0, 197.0, 172.0, 156.0, 168.0, 196.0, 192.0, 140.0, 195.0, 185.0]
2025-08-07 05:18:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes)
2025-08-07 05:20:39,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:20:41,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 459.44794 ± 145.361
2025-08-07 05:20:41,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [853.1413, 338.59808, 382.80313, 375.5423, 391.798, 420.80078, 584.99536, 391.6535, 414.8525, 440.29453]
2025-08-07 05:20:41,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [310.0, 151.0, 161.0, 165.0, 170.0, 177.0, 219.0, 170.0, 170.0, 167.0]
2025-08-07 05:20:42,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 54 seconds)
2025-08-07 05:22:20,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:22:23,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 369.84836 ± 101.338
2025-08-07 05:22:23,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [356.79324, 130.0743, 402.53653, 430.73566, 258.60187, 410.56412, 525.28467, 384.02426, 394.8536, 405.01532]
2025-08-07 05:22:23,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [149.0, 84.0, 155.0, 166.0, 136.0, 161.0, 215.0, 153.0, 159.0, 159.0]
2025-08-07 05:22:23,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 12 seconds)
2025-08-07 05:24:02,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:24:06,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 694.70911 ± 105.025
2025-08-07 05:24:06,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [737.6201, 822.1949, 808.26855, 624.6654, 675.3591, 692.4123, 608.82025, 710.5648, 805.24524, 461.9406]
2025-08-07 05:24:06,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [295.0, 309.0, 334.0, 258.0, 268.0, 282.0, 259.0, 329.0, 314.0, 208.0]
2025-08-07 05:24:06,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (694.71) for latency ExtremeSparseL4U32
2025-08-07 05:24:06,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 51 seconds)
2025-08-07 05:25:46,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:25:48,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 135.40634 ± 5.434
2025-08-07 05:25:48,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [128.98523, 129.17868, 144.23907, 143.34047, 131.43747, 137.52385, 132.85019, 133.40889, 132.02708, 141.07239]
2025-08-07 05:25:48,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [86.0, 88.0, 93.0, 91.0, 88.0, 90.0, 88.0, 88.0, 88.0, 91.0]
2025-08-07 05:25:48,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 56 seconds)
2025-08-07 05:27:27,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:27:31,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 429.30063 ± 239.119
2025-08-07 05:27:31,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [314.3466, 217.83148, 234.86392, 246.95107, 668.46643, 373.02225, 567.94385, 994.017, 220.4295, 455.13388]
2025-08-07 05:27:31,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [264.0, 180.0, 187.0, 226.0, 264.0, 184.0, 242.0, 484.0, 182.0, 240.0]
2025-08-07 05:27:31,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 18 seconds)
2025-08-07 05:29:11,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:29:13,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 392.18231 ± 15.029
2025-08-07 05:29:13,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [410.18643, 421.68103, 380.49484, 404.87042, 376.71106, 374.16455, 378.63907, 394.7137, 388.62183, 391.7403]
2025-08-07 05:29:13,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 188.0, 181.0, 175.0, 170.0, 170.0, 182.0, 173.0, 182.0, 179.0]
2025-08-07 05:29:13,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 6 minutes, 32 seconds)
2025-08-07 05:30:53,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 455.34000 ± 173.524
2025-08-07 05:30:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [403.83786, 494.17667, 340.53632, 375.0316, 411.95435, 844.8436, 304.5181, 370.52225, 714.99164, 292.9875]
2025-08-07 05:30:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [174.0, 219.0, 163.0, 167.0, 177.0, 322.0, 152.0, 176.0, 266.0, 182.0]
2025-08-07 05:30:56,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 57 seconds)
2025-08-07 05:32:37,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:32:39,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 393.96957 ± 15.043
2025-08-07 05:32:39,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [392.2206, 366.4305, 397.1559, 412.146, 409.7101, 391.8615, 415.2456, 380.36264, 377.75882, 396.80423]
2025-08-07 05:32:39,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 148.0, 159.0, 164.0, 163.0, 157.0, 165.0, 150.0, 151.0, 162.0]
2025-08-07 05:32:39,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 13 seconds)
2025-08-07 05:34:18,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:20,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 355.98492 ± 26.223
2025-08-07 05:34:20,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [397.77243, 352.65945, 367.51596, 401.60098, 341.0436, 371.55252, 332.12122, 329.06378, 340.76184, 325.75766]
2025-08-07 05:34:20,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [169.0, 152.0, 162.0, 164.0, 145.0, 169.0, 147.0, 147.0, 153.0, 154.0]
2025-08-07 05:34:20,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 28 seconds)
2025-08-07 05:36:01,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:36:04,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 414.81406 ± 56.145
2025-08-07 05:36:04,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [391.82492, 429.99658, 302.39044, 471.9886, 346.484, 381.89777, 460.9903, 415.6183, 480.1098, 466.84006]
2025-08-07 05:36:04,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [159.0, 170.0, 133.0, 183.0, 146.0, 156.0, 179.0, 164.0, 195.0, 182.0]
2025-08-07 05:36:04,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 48 seconds)
2025-08-07 05:37:44,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:37:47,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 458.62427 ± 152.695
2025-08-07 05:37:47,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [390.24808, 396.19675, 395.87018, 529.5932, 349.20572, 434.72836, 889.82635, 356.3841, 466.87296, 377.31696]
2025-08-07 05:37:47,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [156.0, 159.0, 171.0, 206.0, 147.0, 177.0, 336.0, 146.0, 195.0, 161.0]
2025-08-07 05:37:47,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 11 seconds)
2025-08-07 05:39:25,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:39:28,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 482.11761 ± 298.830
2025-08-07 05:39:28,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1104.9667, 410.47632, 421.72742, 292.8276, 430.09344, 1012.3486, 420.6947, 264.58948, 238.98833, 224.46391]
2025-08-07 05:39:28,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [473.0, 156.0, 163.0, 193.0, 161.0, 404.0, 159.0, 246.0, 161.0, 159.0]
2025-08-07 05:39:28,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 23 seconds)
2025-08-07 05:41:08,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:41:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 421.74567 ± 147.420
2025-08-07 05:41:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [385.87976, 382.47144, 450.5892, 136.98491, 388.00177, 422.16187, 399.72098, 448.14517, 782.65674, 420.8448]
2025-08-07 05:41:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [146.0, 143.0, 169.0, 80.0, 145.0, 159.0, 152.0, 170.0, 263.0, 159.0]
2025-08-07 05:41:11,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 35 seconds)
2025-08-07 05:42:50,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:42:55,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 874.88904 ± 296.191
2025-08-07 05:42:55,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1087.8623, 398.11206, 436.86578, 1004.4079, 1370.3512, 700.2331, 1142.9902, 749.6049, 1044.36, 814.10345]
2025-08-07 05:42:55,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [362.0, 147.0, 161.0, 365.0, 477.0, 251.0, 374.0, 259.0, 372.0, 280.0]
2025-08-07 05:42:55,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (874.89) for latency ExtremeSparseL4U32
2025-08-07 05:42:55,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 11 seconds)
2025-08-07 05:44:36,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:44:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 480.31732 ± 213.980
2025-08-07 05:44:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [421.87183, 401.6206, 328.59265, 601.0245, 426.9252, 212.28207, 406.81543, 336.4579, 1006.9903, 660.5927]
2025-08-07 05:44:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 152.0, 159.0, 208.0, 161.0, 99.0, 152.0, 161.0, 317.0, 219.0]
2025-08-07 05:44:39,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 31 seconds)
2025-08-07 05:46:17,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:46:20,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 551.83411 ± 111.298
2025-08-07 05:46:20,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [403.2315, 557.005, 587.4555, 571.05145, 425.49817, 670.87384, 644.8112, 614.86584, 688.4621, 355.08646]
2025-08-07 05:46:20,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [164.0, 216.0, 210.0, 197.0, 160.0, 222.0, 229.0, 222.0, 245.0, 138.0]
2025-08-07 05:46:20,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 36 seconds)
2025-08-07 05:48:02,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:48:06,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 803.50531 ± 265.781
2025-08-07 05:48:06,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [536.18604, 917.9009, 608.926, 1025.859, 407.83044, 1075.7393, 401.61813, 998.8019, 1048.8575, 1013.33417]
2025-08-07 05:48:06,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [183.0, 331.0, 203.0, 377.0, 151.0, 379.0, 148.0, 352.0, 381.0, 332.0]
2025-08-07 05:48:06,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 19 seconds)
2025-08-07 05:49:45,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:49:48,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 551.12964 ± 274.296
2025-08-07 05:49:48,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [653.8917, 693.2879, 389.98288, 372.3191, 146.83926, 751.508, 235.7792, 1106.942, 422.2794, 738.46716]
2025-08-07 05:49:48,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [248.0, 250.0, 145.0, 140.0, 83.0, 259.0, 105.0, 363.0, 157.0, 242.0]
2025-08-07 05:49:48,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 31 seconds)
2025-08-07 05:51:27,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:51:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 519.42432 ± 177.133
2025-08-07 05:51:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [484.76645, 399.18472, 529.3181, 493.2828, 380.68896, 362.83847, 983.31195, 673.4171, 403.80627, 483.62817]
2025-08-07 05:51:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [180.0, 162.0, 187.0, 183.0, 158.0, 156.0, 315.0, 250.0, 161.0, 182.0]
2025-08-07 05:51:30,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 39 seconds)
2025-08-07 05:53:09,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:53:12,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 467.00620 ± 105.692
2025-08-07 05:53:12,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [419.5586, 418.4714, 391.5815, 654.68506, 683.49243, 397.18353, 378.77463, 472.8551, 385.59662, 467.86258]
2025-08-07 05:53:12,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [161.0, 161.0, 152.0, 233.0, 247.0, 152.0, 148.0, 172.0, 151.0, 174.0]
2025-08-07 05:53:12,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 45 seconds)
2025-08-07 05:54:52,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:54:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1021.24512 ± 272.175
2025-08-07 05:54:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1004.72565, 889.39246, 1559.6188, 846.569, 553.1862, 834.2327, 863.1131, 1252.7191, 1243.2377, 1165.6576]
2025-08-07 05:54:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [370.0, 290.0, 521.0, 279.0, 189.0, 268.0, 286.0, 458.0, 393.0, 383.0]
2025-08-07 05:54:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1021.25) for latency ExtremeSparseL4U32
2025-08-07 05:54:58,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 23 seconds)
2025-08-07 05:56:38,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:56:42,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 811.76654 ± 269.747
2025-08-07 05:56:42,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [948.20166, 651.7256, 1181.3075, 580.75604, 915.53656, 814.3641, 858.2929, 651.65027, 284.04666, 1231.7842]
2025-08-07 05:56:42,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [311.0, 243.0, 377.0, 198.0, 314.0, 263.0, 286.0, 217.0, 176.0, 431.0]
2025-08-07 05:56:42,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 35 seconds)
2025-08-07 05:58:24,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:58:28,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 852.00800 ± 312.158
2025-08-07 05:58:28,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1140.1813, 938.50226, 395.0234, 1358.9391, 481.74612, 478.1334, 662.0195, 1066.5553, 1087.7588, 911.2212]
2025-08-07 05:58:28,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [372.0, 314.0, 152.0, 427.0, 168.0, 166.0, 211.0, 358.0, 338.0, 311.0]
2025-08-07 05:58:28,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 8 seconds)
2025-08-07 06:00:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:00:13,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 932.27704 ± 437.521
2025-08-07 06:00:13,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1006.54095, 1731.4087, 641.0635, 397.87888, 1001.85065, 992.42676, 646.64044, 562.1465, 1699.7667, 643.04816]
2025-08-07 06:00:13,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 569.0, 329.0, 325.0, 323.0, 320.0, 324.0, 188.0, 604.0, 239.0]
2025-08-07 06:00:13,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 38 seconds)
2025-08-07 06:01:52,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:01:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 793.76178 ± 156.570
2025-08-07 06:01:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [618.3263, 721.9662, 691.2889, 642.4401, 695.1342, 759.92084, 922.87213, 1167.2949, 849.6637, 868.71027]
2025-08-07 06:01:56,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [202.0, 225.0, 219.0, 208.0, 218.0, 240.0, 286.0, 351.0, 266.0, 286.0]
2025-08-07 06:01:56,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 55 seconds)
2025-08-07 06:03:37,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:03:41,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 852.71991 ± 298.047
2025-08-07 06:03:41,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [407.4526, 1321.1217, 680.69604, 1279.8309, 991.56366, 1014.263, 421.1371, 760.3496, 733.2648, 917.5197]
2025-08-07 06:03:41,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [155.0, 412.0, 215.0, 412.0, 326.0, 306.0, 157.0, 245.0, 254.0, 301.0]
2025-08-07 06:03:41,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 82/100 (estimated time remaining: 33 minutes, 9 seconds)
2025-08-07 06:05:22,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:05:27,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1091.60657 ± 466.803
2025-08-07 06:05:27,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1507.6736, 590.7011, 870.9962, 388.9377, 1790.0543, 1057.9897, 1164.4596, 942.2083, 750.20166, 1852.8425]
2025-08-07 06:05:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [495.0, 195.0, 263.0, 147.0, 542.0, 349.0, 363.0, 296.0, 234.0, 572.0]
2025-08-07 06:05:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1091.61) for latency ExtremeSparseL4U32
2025-08-07 06:05:27,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 29 seconds)
2025-08-07 06:07:07,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:07:11,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 755.01312 ± 255.552
2025-08-07 06:07:11,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [514.6757, 897.5863, 673.1149, 1012.0222, 1105.9962, 463.5929, 623.4461, 1169.9213, 635.2776, 454.49847]
2025-08-07 06:07:11,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [172.0, 289.0, 258.0, 296.0, 357.0, 162.0, 198.0, 363.0, 202.0, 344.0]
2025-08-07 06:07:11,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 39 seconds)
2025-08-07 06:08:50,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:08:54,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 967.01886 ± 157.928
2025-08-07 06:08:54,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1099.5128, 971.8154, 835.8895, 1187.8564, 1001.01294, 758.79926, 1166.7228, 728.79, 1070.8511, 848.9387]
2025-08-07 06:08:54,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [328.0, 291.0, 257.0, 359.0, 305.0, 239.0, 355.0, 233.0, 342.0, 265.0]
2025-08-07 06:08:54,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 46 seconds)
2025-08-07 06:10:36,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:10:42,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1029.56567 ± 591.094
2025-08-07 06:10:42,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1075.6656, 1071.7778, 275.97766, 720.91815, 1922.2505, 2275.2234, 467.31647, 672.0757, 860.0667, 954.38513]
2025-08-07 06:10:42,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [433.0, 333.0, 182.0, 230.0, 737.0, 776.0, 166.0, 215.0, 271.0, 290.0]
2025-08-07 06:10:42,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 18 seconds)
2025-08-07 06:12:20,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:12:25,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 954.35645 ± 266.210
2025-08-07 06:12:25,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [601.8818, 772.4582, 1321.7549, 971.7095, 1229.7982, 750.79425, 746.3467, 1236.4343, 1252.7764, 659.6105]
2025-08-07 06:12:25,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [203.0, 244.0, 417.0, 313.0, 408.0, 264.0, 246.0, 395.0, 413.0, 217.0]
2025-08-07 06:12:25,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 25 seconds)
2025-08-07 06:14:06,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:14:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1248.87219 ± 447.763
2025-08-07 06:14:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1010.00073, 1621.6364, 1567.3992, 1393.046, 1665.8015, 701.40265, 1146.3508, 1414.3036, 264.9018, 1703.8787]
2025-08-07 06:14:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [304.0, 601.0, 484.0, 500.0, 519.0, 225.0, 392.0, 431.0, 170.0, 530.0]
2025-08-07 06:14:13,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1248.87) for latency ExtremeSparseL4U32
2025-08-07 06:14:13,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 46 seconds)
2025-08-07 06:15:53,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:15:56,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 638.13135 ± 144.927
2025-08-07 06:15:56,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [653.9609, 635.0848, 478.6863, 434.04126, 674.6483, 658.08734, 468.71173, 670.4559, 949.1605, 758.47614]
2025-08-07 06:15:56,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [213.0, 211.0, 175.0, 159.0, 221.0, 216.0, 166.0, 220.0, 291.0, 240.0]
2025-08-07 06:15:56,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 59 seconds)
2025-08-07 06:17:36,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:17:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 772.27899 ± 181.137
2025-08-07 06:17:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [669.6838, 1019.84436, 1039.2808, 893.6224, 628.3484, 604.5902, 798.0624, 692.43005, 912.4642, 464.46356]
2025-08-07 06:17:39,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [217.0, 306.0, 316.0, 271.0, 207.0, 199.0, 258.0, 219.0, 282.0, 164.0]
2025-08-07 06:17:39,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 15 seconds)
2025-08-07 06:19:20,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:19:24,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 984.50439 ± 228.009
2025-08-07 06:19:24,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1050.347, 717.21643, 805.7306, 1379.7664, 1325.9762, 1021.39087, 1056.8693, 746.3527, 712.4402, 1028.954]
2025-08-07 06:19:24,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [323.0, 244.0, 260.0, 411.0, 395.0, 311.0, 320.0, 252.0, 243.0, 322.0]
2025-08-07 06:19:24,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 24 seconds)
2025-08-07 06:21:06,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:21:10,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 850.41943 ± 218.998
2025-08-07 06:21:10,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [396.59973, 720.2462, 961.6132, 1083.7935, 1013.1718, 652.61554, 668.54865, 1080.5516, 884.04083, 1043.0128]
2025-08-07 06:21:10,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [151.0, 233.0, 301.0, 324.0, 304.0, 212.0, 216.0, 327.0, 269.0, 312.0]
2025-08-07 06:21:10,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 46 seconds)
2025-08-07 06:22:50,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:22:56,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 931.70129 ± 327.478
2025-08-07 06:22:56,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [354.84357, 1222.1932, 1219.7739, 1062.9525, 1123.3861, 930.9918, 297.34656, 1032.9646, 831.6155, 1240.945]
2025-08-07 06:22:56,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [262.0, 367.0, 415.0, 426.0, 403.0, 365.0, 222.0, 302.0, 290.0, 418.0]
2025-08-07 06:22:56,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 56 seconds)
2025-08-07 06:24:35,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:24:40,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1118.98254 ± 165.105
2025-08-07 06:24:40,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1295.0743, 1387.7045, 957.5835, 1126.0659, 1272.4229, 1029.1569, 962.8794, 1208.802, 1112.5573, 837.5785]
2025-08-07 06:24:40,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [377.0, 402.0, 321.0, 336.0, 391.0, 321.0, 319.0, 354.0, 331.0, 279.0]
2025-08-07 06:24:40,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 14 seconds)
2025-08-07 06:26:19,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:26:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1025.38599 ± 306.080
2025-08-07 06:26:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [861.873, 846.39374, 765.23895, 963.24854, 1649.8297, 987.5338, 610.0636, 861.8472, 1347.8171, 1360.0137]
2025-08-07 06:26:24,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [286.0, 286.0, 240.0, 294.0, 500.0, 292.0, 199.0, 287.0, 405.0, 398.0]
2025-08-07 06:26:24,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 29 seconds)
2025-08-07 06:28:04,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:28:11,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1293.03906 ± 211.348
2025-08-07 06:28:11,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1359.6843, 824.1047, 1200.9993, 1383.9529, 1467.7029, 1075.6843, 1290.4038, 1419.5764, 1281.0422, 1627.2404]
2025-08-07 06:28:11,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [469.0, 280.0, 404.0, 430.0, 496.0, 374.0, 447.0, 490.0, 392.0, 543.0]
2025-08-07 06:28:11,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1293.04) for latency ExtremeSparseL4U32
2025-08-07 06:28:11,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 46 seconds)
2025-08-07 06:29:53,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:29:58,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 969.10242 ± 276.559
2025-08-07 06:29:58,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [1126.0471, 620.70685, 1047.556, 708.6971, 688.7328, 1449.6644, 1004.46234, 1306.6769, 1093.0377, 645.4431]
2025-08-07 06:29:58,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [334.0, 202.0, 306.0, 228.0, 219.0, 460.0, 306.0, 415.0, 318.0, 208.0]
2025-08-07 06:29:58,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 1 second)
2025-08-07 06:31:37,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:31:41,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 761.94733 ± 273.469
2025-08-07 06:31:41,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [986.66394, 684.085, 335.14438, 441.8197, 1147.6519, 1184.6958, 681.4538, 570.5481, 663.2504, 924.1601]
2025-08-07 06:31:41,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [300.0, 223.0, 136.0, 164.0, 352.0, 362.0, 220.0, 196.0, 217.0, 278.0]
2025-08-07 06:31:41,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 15 seconds)
2025-08-07 06:33:20,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:33:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 1352.43945 ± 328.512
2025-08-07 06:33:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [2044.5575, 1008.9715, 1356.1797, 724.17267, 1577.0892, 1357.1102, 1489.3242, 1288.0568, 1256.5057, 1422.4282]
2025-08-07 06:33:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [643.0, 301.0, 421.0, 234.0, 460.0, 395.0, 440.0, 416.0, 368.0, 422.0]
2025-08-07 06:33:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1226 [INFO]: New best (1352.44) for latency ExtremeSparseL4U32
2025-08-07 06:33:26,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 30 seconds)
2025-08-07 06:35:07,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:35:12,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 937.89130 ± 190.359
2025-08-07 06:35:12,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [933.50507, 951.22626, 1126.875, 732.11383, 1058.898, 993.0419, 692.1135, 957.5194, 1286.1547, 647.4655]
2025-08-07 06:35:12,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [283.0, 294.0, 338.0, 236.0, 324.0, 304.0, 226.0, 298.0, 377.0, 214.0]
2025-08-07 06:35:12,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 45 seconds)
2025-08-07 06:36:52,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:36:55,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1221 [DEBUG]: Total Reward: 651.89520 ± 75.337
2025-08-07 06:36:55,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1222 [DEBUG]: All rewards: [794.4065, 686.1319, 621.474, 481.18924, 688.23865, 672.5415, 621.3466, 648.84827, 687.88116, 616.89404]
2025-08-07 06:36:55,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1223 [DEBUG]: All trajectory lengths: [249.0, 222.0, 204.0, 175.0, 224.0, 219.0, 205.0, 211.0, 223.0, 204.0]
2025-08-07 06:36:55,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc0-walker2d):1251 [DEBUG]: Training session finished
