2025-08-07 00:48:08,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:08,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:08,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15298904dfd0>}
2025-08-07 00:48:08,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:08,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1133 [INFO]: Creating new trainer
2025-08-07 00:48:08,673 baseline-bpql-noiseperc20-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 00:48:08,673 baseline-bpql-noiseperc20-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:09,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:09,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:33,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 16.30273 ± 5.940
2025-08-07 00:49:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.241601, 15.093721, 15.538927, 8.968871, 15.279269, 18.839052, 12.603433, 12.699643, 25.798992, 27.963762]
2025-08-07 00:49:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 17.0, 20.0, 22.0, 19.0, 18.0, 21.0, 16.0, 26.0, 27.0]
2025-08-07 00:49:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (16.30) for latency ExtremeSparseL4U32
2025-08-07 00:49:34,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 19 minutes, 28 seconds)
2025-08-07 00:51:05,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:05,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 27.98020 ± 44.593
2025-08-07 00:51:05,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.876747, 14.836093, 12.792364, 11.064389, 9.915503, 161.56256, 13.772222, 10.706493, 16.015606, 11.260018]
2025-08-07 00:51:05,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 21.0, 16.0, 14.0, 15.0, 126.0, 19.0, 15.0, 20.0, 17.0]
2025-08-07 00:51:05,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (27.98) for latency ExtremeSparseL4U32
2025-08-07 00:51:05,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 23 minutes, 52 seconds)
2025-08-07 00:52:38,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:52:38,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 32.00027 ± 44.957
2025-08-07 00:52:38,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [9.085729, 14.279845, 13.9326105, 32.71216, 26.545334, 15.751882, 9.807302, 22.326511, 10.507815, 165.05354]
2025-08-07 00:52:38,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 16.0, 26.0, 31.0, 42.0, 18.0, 18.0, 26.0, 18.0, 120.0]
2025-08-07 00:52:38,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (32.00) for latency ExtremeSparseL4U32
2025-08-07 00:52:38,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 24 minutes, 49 seconds)
2025-08-07 00:54:10,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:54:10,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 26.82720 ± 21.105
2025-08-07 00:54:10,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [21.156225, 15.090658, 20.757322, 11.208256, 15.986068, 14.332463, 81.68011, 50.9624, 19.367413, 17.73108]
2025-08-07 00:54:10,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 16.0, 20.0, 14.0, 26.0, 15.0, 67.0, 55.0, 20.0, 22.0]
2025-08-07 00:54:10,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 24 minutes, 27 seconds)
2025-08-07 00:55:42,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 46.65555 ± 58.180
2025-08-07 00:55:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [27.827036, 118.50533, 11.092488, 195.65201, 21.527613, 9.522775, 12.109154, 21.773464, 19.643877, 28.901764]
2025-08-07 00:55:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 94.0, 25.0, 108.0, 31.0, 14.0, 15.0, 22.0, 21.0, 50.0]
2025-08-07 00:55:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (46.66) for latency ExtremeSparseL4U32
2025-08-07 00:55:43,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 23 minutes, 38 seconds)
2025-08-07 00:57:15,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:57:16,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 49.14219 ± 59.892
2025-08-07 00:57:16,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.964598, 28.53477, 18.709242, 27.048382, 61.91132, 13.648519, 96.116714, 9.993881, 210.85573, 13.638771]
2025-08-07 00:57:16,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 27.0, 22.0, 27.0, 38.0, 15.0, 74.0, 13.0, 114.0, 32.0]
2025-08-07 00:57:16,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (49.14) for latency ExtremeSparseL4U32
2025-08-07 00:57:16,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 24 minutes, 42 seconds)
2025-08-07 00:58:48,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:48,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 28.24390 ± 33.007
2025-08-07 00:58:48,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [12.346611, 14.484027, 14.296701, 15.488615, 21.470297, 125.92724, 21.554817, 30.548624, 12.010926, 14.311126]
2025-08-07 00:58:48,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 15.0, 17.0, 16.0, 27.0, 94.0, 27.0, 33.0, 16.0, 18.0]
2025-08-07 00:58:48,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 23 minutes, 30 seconds)
2025-08-07 01:00:20,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:21,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 25.75605 ± 21.706
2025-08-07 01:00:21,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [24.200626, 12.041754, 65.614174, 71.37023, 13.357754, 14.435607, 13.431246, 12.124115, 18.762201, 12.22283]
2025-08-07 01:00:21,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 15.0, 80.0, 47.0, 20.0, 21.0, 29.0, 20.0, 21.0, 24.0]
2025-08-07 01:00:21,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 21 minutes, 52 seconds)
2025-08-07 01:01:51,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:52,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 38.46214 ± 59.965
2025-08-07 01:01:52,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [217.66765, 29.791954, 18.202946, 15.55335, 11.830772, 21.123655, 21.235504, 10.026883, 21.37829, 17.810371]
2025-08-07 01:01:52,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 30.0, 26.0, 27.0, 19.0, 21.0, 26.0, 14.0, 28.0, 32.0]
2025-08-07 01:01:52,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 19 minutes, 55 seconds)
2025-08-07 01:03:23,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:23,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 39.49826 ± 66.619
2025-08-07 01:03:23,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [20.141077, 15.882066, 10.136294, 13.367924, 26.021011, 11.534189, 238.68161, 27.63863, 16.390766, 15.188951]
2025-08-07 01:03:23,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 24.0, 14.0, 17.0, 29.0, 16.0, 140.0, 28.0, 19.0, 26.0]
2025-08-07 01:03:23,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 18 minutes, 9 seconds)
2025-08-07 01:04:54,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:55,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 27.35844 ± 19.384
2025-08-07 01:04:55,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [24.486187, 18.09511, 17.643755, 51.543644, 76.668915, 17.645876, 19.018276, 13.866828, 18.828657, 15.787145]
2025-08-07 01:04:55,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 18.0, 23.0, 55.0, 51.0, 25.0, 24.0, 18.0, 21.0, 24.0]
2025-08-07 01:04:55,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 16 minutes, 9 seconds)
2025-08-07 01:06:26,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:26,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 54.04531 ± 48.375
2025-08-07 01:06:26,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [110.320694, 17.100792, 19.623411, 21.492662, 123.024666, 13.683332, 127.33782, 10.756499, 86.84316, 10.270023]
2025-08-07 01:06:26,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 19.0, 21.0, 31.0, 74.0, 18.0, 83.0, 23.0, 64.0, 12.0]
2025-08-07 01:06:26,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (54.05) for latency ExtremeSparseL4U32
2025-08-07 01:06:26,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 14 minutes, 17 seconds)
2025-08-07 01:07:57,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:07:57,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 24.51691 ± 20.968
2025-08-07 01:07:57,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.809444, 14.651698, 10.974564, 9.449096, 16.57407, 34.497658, 77.103294, 7.3857527, 45.493484, 18.230076]
2025-08-07 01:07:57,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 30.0, 24.0, 13.0, 20.0, 44.0, 56.0, 12.0, 50.0, 22.0]
2025-08-07 01:07:57,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 12 minutes, 26 seconds)
2025-08-07 01:09:27,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:09:28,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 17.12322 ± 7.463
2025-08-07 01:09:28,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [32.815296, 9.067022, 17.884926, 15.278751, 26.246368, 16.476093, 19.278854, 9.799351, 17.25957, 7.1260204]
2025-08-07 01:09:28,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 12.0, 30.0, 25.0, 30.0, 18.0, 21.0, 12.0, 18.0, 15.0]
2025-08-07 01:09:28,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 10 minutes, 40 seconds)
2025-08-07 01:10:59,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:59,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 49.09058 ± 49.604
2025-08-07 01:10:59,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [21.253878, 26.505241, 25.751808, 13.266699, 19.935957, 10.050379, 9.54682, 91.43164, 151.61864, 121.54466]
2025-08-07 01:10:59,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 22.0, 40.0, 21.0, 18.0, 16.0, 71.0, 84.0, 76.0]
2025-08-07 01:10:59,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 9 minutes, 13 seconds)
2025-08-07 01:12:30,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:12:30,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 33.42776 ± 31.390
2025-08-07 01:12:30,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.058994, 12.079024, 21.343664, 91.86995, 25.38273, 15.483687, 23.638384, 7.3810654, 23.503872, 98.53617]
2025-08-07 01:12:30,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 17.0, 25.0, 67.0, 25.0, 19.0, 22.0, 11.0, 28.0, 64.0]
2025-08-07 01:12:30,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 7 minutes, 36 seconds)
2025-08-07 01:14:01,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:01,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 45.91988 ± 45.181
2025-08-07 01:14:01,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.250736, 9.543573, 20.817919, 15.018608, 99.73305, 70.61624, 10.778676, 9.627631, 58.923336, 147.88899]
2025-08-07 01:14:01,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 14.0, 25.0, 17.0, 76.0, 68.0, 17.0, 13.0, 47.0, 93.0]
2025-08-07 01:14:01,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 5 minutes, 55 seconds)
2025-08-07 01:15:32,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:32,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 25.20689 ± 28.122
2025-08-07 01:15:32,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.377702, 13.2253, 9.8550205, 25.042492, 108.13281, 15.852393, 16.330536, 23.466553, 9.00596, 19.780123]
2025-08-07 01:15:32,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 18.0, 14.0, 26.0, 72.0, 20.0, 18.0, 23.0, 13.0, 28.0]
2025-08-07 01:15:32,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 4 minutes, 21 seconds)
2025-08-07 01:17:03,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:04,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 36.82172 ± 28.616
2025-08-07 01:17:04,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [27.781029, 65.2942, 18.109102, 12.014192, 100.59696, 66.37494, 30.330515, 20.760279, 13.5069, 13.449129]
2025-08-07 01:17:04,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [39.0, 39.0, 21.0, 14.0, 71.0, 71.0, 33.0, 26.0, 24.0, 21.0]
2025-08-07 01:17:04,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 3 minutes, 12 seconds)
2025-08-07 01:18:34,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:35,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 29.70248 ± 24.111
2025-08-07 01:18:35,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.93069, 10.151981, 22.69907, 27.363241, 21.17176, 11.671167, 70.50688, 12.22417, 18.755188, 82.55068]
2025-08-07 01:18:35,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 13.0, 30.0, 32.0, 25.0, 16.0, 53.0, 26.0, 20.0, 69.0]
2025-08-07 01:18:35,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 1 minute, 26 seconds)
2025-08-07 01:20:06,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:20:07,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 42.81851 ± 37.739
2025-08-07 01:20:07,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [100.22146, 14.14733, 10.53721, 107.53307, 19.22953, 65.1435, 11.698934, 14.048628, 74.78047, 10.844963]
2025-08-07 01:20:07,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 25.0, 17.0, 62.0, 24.0, 55.0, 14.0, 19.0, 87.0, 13.0]
2025-08-07 01:20:07,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 9 seconds)
2025-08-07 01:21:37,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:38,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 37.96649 ± 41.846
2025-08-07 01:21:38,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.081968, 142.12044, 84.14994, 7.1009026, 10.2912245, 13.6291, 23.835833, 55.704884, 15.079047, 10.671562]
2025-08-07 01:21:38,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 105.0, 116.0, 11.0, 15.0, 23.0, 27.0, 59.0, 16.0, 15.0]
2025-08-07 01:21:38,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 1 hour, 58 minutes, 43 seconds)
2025-08-07 01:23:08,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:08,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 40.57610 ± 55.804
2025-08-07 01:23:08,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.343359, 196.78026, 12.780608, 18.449514, 13.300623, 82.003914, 10.562503, 19.47918, 24.319399, 13.741631]
2025-08-07 01:23:08,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 89.0, 18.0, 22.0, 16.0, 62.0, 13.0, 22.0, 30.0, 19.0]
2025-08-07 01:23:08,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 1 hour, 57 minutes, 4 seconds)
2025-08-07 01:24:38,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:24:39,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 25.03415 ± 29.379
2025-08-07 01:24:39,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.9608, 12.923913, 12.707839, 111.75369, 26.087563, 11.0408325, 12.833508, 24.137877, 12.351138, 9.544352]
2025-08-07 01:24:39,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 16.0, 17.0, 77.0, 27.0, 13.0, 18.0, 26.0, 15.0, 28.0]
2025-08-07 01:24:39,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 1 hour, 55 minutes, 14 seconds)
2025-08-07 01:26:07,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:08,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 29.81942 ± 39.481
2025-08-07 01:26:08,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.5405, 19.201103, 26.777119, 14.860516, 19.373985, 10.048753, 147.32152, 10.679529, 20.318956, 11.0721655]
2025-08-07 01:26:08,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 28.0, 24.0, 24.0, 25.0, 17.0, 88.0, 20.0, 23.0, 15.0]
2025-08-07 01:26:08,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 53 minutes, 9 seconds)
2025-08-07 01:27:35,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:36,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 17.15496 ± 8.213
2025-08-07 01:27:36,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.217854, 15.7525835, 13.098153, 11.741948, 7.6800094, 10.918526, 15.375126, 26.954012, 35.039852, 23.771599]
2025-08-07 01:27:36,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 19.0, 15.0, 21.0, 14.0, 17.0, 18.0, 26.0, 30.0, 24.0]
2025-08-07 01:27:36,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 50 minutes, 44 seconds)
2025-08-07 01:29:04,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:04,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 37.40260 ± 60.489
2025-08-07 01:29:04,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [9.515736, 10.501082, 16.722258, 9.251735, 22.004698, 22.411919, 19.119392, 29.674313, 16.892805, 217.93207]
2025-08-07 01:29:04,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 16.0, 20.0, 13.0, 22.0, 32.0, 19.0, 30.0, 20.0, 191.0]
2025-08-07 01:29:04,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 48 minutes, 39 seconds)
2025-08-07 01:30:33,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:33,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 35.72095 ± 32.057
2025-08-07 01:30:33,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [99.35, 17.688856, 25.800476, 13.163842, 20.098688, 21.844944, 25.583689, 99.07306, 22.98045, 11.625485]
2025-08-07 01:30:33,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 21.0, 23.0, 16.0, 27.0, 31.0, 22.0, 73.0, 23.0, 28.0]
2025-08-07 01:30:33,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 46 minutes, 45 seconds)
2025-08-07 01:32:01,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:01,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 19.49763 ± 6.416
2025-08-07 01:32:01,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.186653, 22.459122, 16.472275, 13.753447, 19.694735, 37.310677, 18.547945, 16.647604, 19.383583, 15.520306]
2025-08-07 01:32:01,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 23.0, 32.0, 15.0, 21.0, 31.0, 23.0, 29.0, 24.0, 21.0]
2025-08-07 01:32:01,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 44 minutes, 43 seconds)
2025-08-07 01:33:30,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:30,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 25.55930 ± 28.721
2025-08-07 01:33:30,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.204256, 30.186632, 13.140448, 110.09613, 13.759991, 10.207069, 15.262439, 9.421284, 19.769312, 17.545448]
2025-08-07 01:33:30,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 32.0, 22.0, 80.0, 17.0, 15.0, 20.0, 12.0, 28.0, 20.0]
2025-08-07 01:33:30,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 43 minutes, 17 seconds)
2025-08-07 01:34:59,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:59,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 46.12557 ± 54.493
2025-08-07 01:34:59,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.591605, 148.31737, 9.917477, 31.789112, 10.955488, 160.0268, 24.259596, 14.213189, 18.894007, 27.291115]
2025-08-07 01:34:59,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 103.0, 12.0, 31.0, 14.0, 137.0, 22.0, 24.0, 25.0, 25.0]
2025-08-07 01:34:59,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 42 minutes, 1 second)
2025-08-07 01:36:28,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:28,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 34.33418 ± 36.451
2025-08-07 01:36:28,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.907505, 15.341596, 12.946634, 11.726709, 15.845714, 91.57103, 21.443365, 119.46035, 28.505718, 14.593147]
2025-08-07 01:36:28,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 17.0, 17.0, 16.0, 17.0, 81.0, 21.0, 73.0, 42.0, 24.0]
2025-08-07 01:36:28,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 40 minutes, 35 seconds)
2025-08-07 01:37:57,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:37:58,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 49.48129 ± 57.468
2025-08-07 01:37:58,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [132.73164, 12.589616, 8.777832, 19.453636, 8.916412, 174.76018, 88.50249, 13.880341, 15.1314535, 20.069315]
2025-08-07 01:37:58,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 16.0, 25.0, 27.0, 12.0, 150.0, 63.0, 18.0, 19.0, 19.0]
2025-08-07 01:37:58,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 39 minutes, 15 seconds)
2025-08-07 01:39:27,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:39:27,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 34.78585 ± 36.754
2025-08-07 01:39:27,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [130.3241, 72.9463, 31.026083, 13.877218, 10.505331, 12.550298, 17.342335, 11.97053, 11.07058, 36.245716]
2025-08-07 01:39:27,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 71.0, 31.0, 16.0, 13.0, 20.0, 23.0, 18.0, 15.0, 32.0]
2025-08-07 01:39:27,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 38 minutes, 9 seconds)
2025-08-07 01:40:56,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:56,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 41.25613 ± 50.669
2025-08-07 01:40:56,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.095309, 22.978546, 103.15319, 18.135424, 14.816395, 17.376991, 17.662596, 14.0166855, 19.201838, 172.12427]
2025-08-07 01:40:56,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 23.0, 92.0, 30.0, 16.0, 25.0, 19.0, 18.0, 26.0, 149.0]
2025-08-07 01:40:56,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 36 minutes, 36 seconds)
2025-08-07 01:42:25,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:25,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 22.78892 ± 21.103
2025-08-07 01:42:25,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.5864525, 32.55893, 13.745279, 11.460961, 9.3099985, 16.139963, 82.69814, 16.84386, 23.83002, 9.715543]
2025-08-07 01:42:25,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 29.0, 25.0, 18.0, 16.0, 30.0, 89.0, 27.0, 23.0, 15.0]
2025-08-07 01:42:25,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 35 minutes, 8 seconds)
2025-08-07 01:43:54,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:54,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 28.95259 ± 26.298
2025-08-07 01:43:54,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [33.33947, 10.086538, 14.523174, 49.276276, 14.43395, 13.536999, 14.942183, 99.855675, 13.500087, 26.031506]
2025-08-07 01:43:54,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 13.0, 20.0, 83.0, 15.0, 20.0, 27.0, 72.0, 20.0, 32.0]
2025-08-07 01:43:54,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 33 minutes, 40 seconds)
2025-08-07 01:45:24,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:25,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 38.98190 ± 37.079
2025-08-07 01:45:25,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.758927, 106.53206, 13.164908, 10.070488, 23.898811, 22.791151, 15.553968, 13.765296, 104.9361, 68.34732]
2025-08-07 01:45:25,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 138.0, 15.0, 18.0, 26.0, 22.0, 28.0, 16.0, 133.0, 99.0]
2025-08-07 01:45:25,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 32 minutes, 30 seconds)
2025-08-07 01:46:56,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:56,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 51.48793 ± 60.084
2025-08-07 01:46:56,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [222.69614, 21.853626, 23.867666, 34.14439, 15.445469, 79.92728, 26.750872, 10.828971, 29.793821, 49.57111]
2025-08-07 01:46:56,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [214.0, 21.0, 27.0, 31.0, 21.0, 65.0, 27.0, 18.0, 32.0, 83.0]
2025-08-07 01:46:56,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 31 minutes, 18 seconds)
2025-08-07 01:48:24,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:48:25,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 39.22178 ± 64.023
2025-08-07 01:48:25,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [21.090881, 24.556675, 7.9788775, 13.942631, 230.59103, 14.404334, 14.405621, 28.094572, 18.845488, 18.307686]
2025-08-07 01:48:25,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 23.0, 16.0, 31.0, 211.0, 21.0, 33.0, 24.0, 27.0, 21.0]
2025-08-07 01:48:25,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 29 minutes, 43 seconds)
2025-08-07 01:49:53,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:53,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 27.77137 ± 17.266
2025-08-07 01:49:53,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.083261, 21.938595, 56.664436, 20.73599, 22.60694, 24.632841, 12.152825, 14.179678, 23.022034, 65.69706]
2025-08-07 01:49:53,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 30.0, 74.0, 21.0, 23.0, 30.0, 16.0, 21.0, 23.0, 58.0]
2025-08-07 01:49:53,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 28 minutes, 6 seconds)
2025-08-07 01:51:23,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:23,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 15.79771 ± 7.440
2025-08-07 01:51:23,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [12.220253, 10.621322, 19.662077, 9.4861765, 10.404535, 17.098463, 15.720125, 35.995365, 11.296022, 15.472788]
2025-08-07 01:51:23,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 13.0, 23.0, 13.0, 14.0, 22.0, 20.0, 29.0, 22.0, 20.0]
2025-08-07 01:51:23,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 26 minutes, 42 seconds)
2025-08-07 01:52:51,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:51,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 41.10286 ± 49.271
2025-08-07 01:52:51,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [22.359264, 171.11119, 27.407919, 9.60744, 17.270184, 18.934834, 20.767982, 95.09094, 16.761833, 11.717042]
2025-08-07 01:52:51,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 93.0, 26.0, 13.0, 19.0, 26.0, 25.0, 90.0, 26.0, 29.0]
2025-08-07 01:52:51,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 24 minutes, 45 seconds)
2025-08-07 01:54:20,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:20,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 16.78805 ± 7.888
2025-08-07 01:54:20,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [23.412592, 11.291105, 26.927519, 18.507689, 32.878727, 13.650341, 10.322333, 8.733258, 12.014975, 10.141988]
2025-08-07 01:54:20,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 16.0, 25.0, 30.0, 51.0, 23.0, 13.0, 12.0, 16.0, 19.0]
2025-08-07 01:54:20,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 22 minutes, 45 seconds)
2025-08-07 01:55:48,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 46.15646 ± 51.551
2025-08-07 01:55:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [77.81534, 16.78125, 139.51085, 8.175656, 10.498195, 12.153352, 16.87387, 15.223924, 144.32294, 20.20922]
2025-08-07 01:55:49,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [66.0, 24.0, 75.0, 12.0, 20.0, 14.0, 21.0, 29.0, 94.0, 21.0]
2025-08-07 01:55:49,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 21 minutes, 26 seconds)
2025-08-07 01:57:18,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:18,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 33.56125 ± 38.757
2025-08-07 01:57:18,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [27.900988, 12.115169, 134.39732, 12.517158, 16.25084, 10.707843, 15.411948, 14.06062, 13.944109, 78.30647]
2025-08-07 01:57:18,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 14.0, 116.0, 24.0, 31.0, 16.0, 20.0, 15.0, 26.0, 53.0]
2025-08-07 01:57:18,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 20 minutes, 4 seconds)
2025-08-07 01:58:47,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:47,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 52.25584 ± 51.664
2025-08-07 01:58:47,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.005642, 13.18389, 152.32854, 30.552515, 33.250736, 21.357641, 14.087659, 143.36029, 13.569462, 82.86196]
2025-08-07 01:58:47,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 21.0, 106.0, 33.0, 34.0, 30.0, 19.0, 83.0, 17.0, 75.0]
2025-08-07 01:58:47,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 18 minutes, 29 seconds)
2025-08-07 02:00:16,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:00:16,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 63.69313 ± 56.774
2025-08-07 02:00:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [43.82078, 106.50768, 26.025578, 16.862247, 136.65579, 15.6229925, 11.425808, 85.06335, 15.963556, 178.98354]
2025-08-07 02:00:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 67.0, 28.0, 22.0, 75.0, 27.0, 16.0, 59.0, 25.0, 183.0]
2025-08-07 02:00:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (63.69) for latency ExtremeSparseL4U32
2025-08-07 02:00:16,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 17 minutes, 10 seconds)
2025-08-07 02:01:46,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:46,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 34.32081 ± 35.637
2025-08-07 02:01:46,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.340138, 9.7081, 86.151886, 10.257695, 119.58079, 19.822355, 9.700181, 29.66509, 23.304544, 21.677301]
2025-08-07 02:01:46,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 20.0, 69.0, 12.0, 63.0, 26.0, 13.0, 31.0, 23.0, 21.0]
2025-08-07 02:01:46,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 15 minutes, 50 seconds)
2025-08-07 02:03:14,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:14,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 18.28140 ± 8.762
2025-08-07 02:03:14,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.058952, 16.122576, 12.801575, 24.862326, 35.34552, 31.84975, 9.219679, 10.276278, 10.861008, 15.416326]
2025-08-07 02:03:14,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 22.0, 18.0, 22.0, 28.0, 32.0, 12.0, 13.0, 15.0, 17.0]
2025-08-07 02:03:14,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 14 minutes, 15 seconds)
2025-08-07 02:04:43,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:43,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 32.97437 ± 50.356
2025-08-07 02:04:43,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.789901, 17.284462, 10.010605, 182.9539, 31.19547, 11.1086855, 13.904377, 16.748129, 21.952349, 13.79579]
2025-08-07 02:04:43,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 27.0, 16.0, 134.0, 33.0, 16.0, 17.0, 17.0, 20.0, 20.0]
2025-08-07 02:04:43,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 12 minutes, 43 seconds)
2025-08-07 02:06:12,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 70.14106 ± 69.028
2025-08-07 02:06:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.222218, 20.342415, 109.24969, 15.889172, 15.822111, 97.354546, 185.89491, 196.79962, 20.263565, 23.57234]
2025-08-07 02:06:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 27.0, 83.0, 31.0, 20.0, 88.0, 109.0, 102.0, 27.0, 26.0]
2025-08-07 02:06:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (70.14) for latency ExtremeSparseL4U32
2025-08-07 02:06:13,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 11 minutes, 20 seconds)
2025-08-07 02:07:43,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:43,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 40.08208 ± 43.917
2025-08-07 02:07:43,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.155898, 23.435465, 23.969618, 20.180067, 16.732803, 22.53669, 17.309607, 133.80357, 120.77659, 10.920511]
2025-08-07 02:07:43,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 26.0, 30.0, 20.0, 27.0, 18.0, 99.0, 80.0, 26.0]
2025-08-07 02:07:43,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 9 minutes, 59 seconds)
2025-08-07 02:09:11,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:11,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 40.36098 ± 50.055
2025-08-07 02:09:11,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.841024, 14.426643, 155.06897, 11.48152, 13.696676, 122.94567, 11.57222, 13.29472, 28.355167, 14.927246]
2025-08-07 02:09:11,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 21.0, 97.0, 17.0, 25.0, 81.0, 19.0, 29.0, 30.0, 23.0]
2025-08-07 02:09:11,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 8 minutes, 18 seconds)
2025-08-07 02:10:40,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 50.80805 ± 54.756
2025-08-07 02:10:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [121.150475, 12.2653055, 168.6704, 10.517247, 19.166395, 16.854708, 18.346582, 102.87797, 26.383442, 11.847992]
2025-08-07 02:10:40,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 20.0, 95.0, 15.0, 21.0, 17.0, 20.0, 80.0, 27.0, 24.0]
2025-08-07 02:10:40,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 6 minutes, 52 seconds)
2025-08-07 02:12:09,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:12:09,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 48.25759 ± 53.424
2025-08-07 02:12:09,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [98.610725, 73.33064, 185.09097, 19.231188, 13.270573, 27.67992, 16.98551, 23.41175, 10.442176, 14.522456]
2025-08-07 02:12:09,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 55.0, 103.0, 20.0, 22.0, 28.0, 25.0, 28.0, 20.0, 16.0]
2025-08-07 02:12:09,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 5 minutes, 25 seconds)
2025-08-07 02:13:38,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:38,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 32.96902 ± 46.435
2025-08-07 02:13:38,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.9941, 15.218551, 16.810356, 24.573807, 18.40924, 17.756058, 15.654709, 172.00526, 12.826125, 18.442045]
2025-08-07 02:13:38,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 19.0, 25.0, 25.0, 25.0, 27.0, 19.0, 119.0, 20.0, 22.0]
2025-08-07 02:13:39,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 3 minutes, 50 seconds)
2025-08-07 02:15:07,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:07,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 15.13944 ± 6.966
2025-08-07 02:15:07,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [7.7814107, 8.404934, 7.8265867, 19.56378, 23.866934, 10.026499, 29.47195, 16.527739, 14.357994, 13.566568]
2025-08-07 02:15:07,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 13.0, 10.0, 32.0, 30.0, 13.0, 31.0, 18.0, 29.0, 15.0]
2025-08-07 02:15:07,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 2 minutes, 11 seconds)
2025-08-07 02:16:36,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:37,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 76.53688 ± 66.869
2025-08-07 02:16:37,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [21.259344, 27.981905, 86.15665, 24.511778, 76.29651, 12.94044, 161.09842, 20.41592, 220.56079, 114.147095]
2025-08-07 02:16:37,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 64.0, 26.0, 58.0, 16.0, 93.0, 25.0, 139.0, 81.0]
2025-08-07 02:16:37,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (76.54) for latency ExtremeSparseL4U32
2025-08-07 02:16:37,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 52 seconds)
2025-08-07 02:18:06,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:06,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 31.02971 ± 29.811
2025-08-07 02:18:06,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [31.933865, 47.447357, 25.684576, 11.090615, 10.86577, 114.61998, 19.316265, 17.651377, 12.770387, 18.916866]
2025-08-07 02:18:06,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 53.0, 26.0, 20.0, 12.0, 70.0, 26.0, 19.0, 14.0, 28.0]
2025-08-07 02:18:06,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 59 minutes, 25 seconds)
2025-08-07 02:19:34,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:35,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 16.83711 ± 8.293
2025-08-07 02:19:35,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [12.5146675, 15.835905, 14.501961, 19.85271, 14.402089, 40.231308, 16.139156, 15.01146, 10.751312, 9.1305065]
2025-08-07 02:19:35,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 18.0, 28.0, 19.0, 16.0, 53.0, 31.0, 23.0, 19.0, 12.0]
2025-08-07 02:19:35,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 57 minutes, 52 seconds)
2025-08-07 02:21:04,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:04,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 28.77904 ± 19.474
2025-08-07 02:21:04,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [20.092918, 24.36008, 67.30182, 11.924501, 25.557089, 17.293459, 14.370417, 66.46446, 22.717806, 17.7079]
2025-08-07 02:21:04,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 58.0, 14.0, 30.0, 22.0, 17.0, 43.0, 28.0, 18.0]
2025-08-07 02:21:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 56 minutes, 27 seconds)
2025-08-07 02:22:33,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:22:33,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 50.87528 ± 72.703
2025-08-07 02:22:33,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.332207, 11.83525, 256.44788, 12.052266, 91.46225, 12.965984, 18.469936, 21.24089, 13.192811, 53.753326]
2025-08-07 02:22:33,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 13.0, 160.0, 23.0, 78.0, 18.0, 23.0, 22.0, 25.0, 55.0]
2025-08-07 02:22:33,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 55 minutes)
2025-08-07 02:24:02,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:03,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 61.73928 ± 82.100
2025-08-07 02:24:03,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [80.645836, 15.590289, 36.12454, 18.856802, 16.423412, 10.614207, 286.50937, 118.43193, 13.030211, 21.166203]
2025-08-07 02:24:03,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 17.0, 32.0, 26.0, 18.0, 21.0, 136.0, 85.0, 16.0, 29.0]
2025-08-07 02:24:03,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 53 minutes, 32 seconds)
2025-08-07 02:25:33,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:33,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 30.15720 ± 40.859
2025-08-07 02:25:33,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.095684, 12.37522, 14.247732, 31.742592, 14.344402, 14.795273, 14.651261, 8.311376, 151.16231, 25.846117]
2025-08-07 02:25:33,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 15.0, 15.0, 29.0, 31.0, 19.0, 27.0, 11.0, 192.0, 27.0]
2025-08-07 02:25:33,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 52 minutes, 10 seconds)
2025-08-07 02:27:01,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:02,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 41.13886 ± 33.902
2025-08-07 02:27:02,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [87.54727, 11.8623495, 74.69772, 20.554762, 22.487276, 18.441164, 17.773504, 38.160378, 108.96368, 10.900499]
2025-08-07 02:27:02,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 22.0, 48.0, 21.0, 23.0, 21.0, 24.0, 31.0, 78.0, 16.0]
2025-08-07 02:27:02,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 50 minutes, 40 seconds)
2025-08-07 02:28:31,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:28:31,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 31.06170 ± 42.317
2025-08-07 02:28:31,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.703445, 11.614182, 156.04343, 32.515205, 21.075527, 26.62298, 10.716762, 7.745742, 10.613413, 14.96628]
2025-08-07 02:28:31,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 27.0, 79.0, 32.0, 32.0, 24.0, 18.0, 12.0, 18.0, 16.0]
2025-08-07 02:28:31,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 49 minutes, 11 seconds)
2025-08-07 02:30:00,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:00,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 62.43782 ± 78.950
2025-08-07 02:30:00,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [201.02438, 14.716102, 10.944256, 84.21227, 27.036385, 10.418554, 21.35198, 227.62022, 12.125462, 14.928578]
2025-08-07 02:30:00,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 18.0, 18.0, 49.0, 31.0, 13.0, 28.0, 110.0, 15.0, 16.0]
2025-08-07 02:30:00,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 47 minutes, 39 seconds)
2025-08-07 02:31:29,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:30,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 63.70636 ± 70.960
2025-08-07 02:31:30,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.500294, 19.700018, 13.992783, 15.1900425, 185.60573, 186.3573, 20.39969, 27.941042, 10.466576, 138.91014]
2025-08-07 02:31:30,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 19.0, 20.0, 108.0, 111.0, 27.0, 26.0, 20.0, 108.0]
2025-08-07 02:31:30,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 46 minutes, 8 seconds)
2025-08-07 02:32:59,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:59,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 31.38923 ± 54.603
2025-08-07 02:32:59,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [8.49107, 195.01535, 12.342255, 15.553565, 14.386407, 8.425202, 14.56066, 15.994152, 15.088459, 14.03509]
2025-08-07 02:32:59,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 138.0, 21.0, 21.0, 23.0, 22.0, 21.0, 28.0, 18.0, 18.0]
2025-08-07 02:32:59,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 44 minutes, 34 seconds)
2025-08-07 02:34:28,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:28,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 45.62088 ± 56.284
2025-08-07 02:34:28,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.413228, 167.78165, 17.049057, 19.530287, 27.921478, 20.283949, 13.871146, 15.106174, 13.008534, 147.24332]
2025-08-07 02:34:28,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 97.0, 28.0, 53.0, 31.0, 32.0, 19.0, 17.0, 15.0, 85.0]
2025-08-07 02:34:28,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 43 minutes, 11 seconds)
2025-08-07 02:35:57,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:57,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 53.99517 ± 61.803
2025-08-07 02:35:57,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [195.3269, 122.00586, 20.640598, 11.696531, 13.572177, 14.358515, 18.36509, 10.235851, 22.301422, 111.44868]
2025-08-07 02:35:57,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 91.0, 24.0, 16.0, 18.0, 17.0, 28.0, 17.0, 25.0, 108.0]
2025-08-07 02:35:57,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 41 minutes, 37 seconds)
2025-08-07 02:37:26,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 36.70448 ± 43.064
2025-08-07 02:37:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [21.244951, 12.999664, 21.02295, 20.295868, 21.950096, 105.87233, 8.675071, 7.7057133, 136.3789, 10.899188]
2025-08-07 02:37:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 16.0, 21.0, 33.0, 29.0, 81.0, 12.0, 10.0, 89.0, 17.0]
2025-08-07 02:37:27,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 40 minutes, 10 seconds)
2025-08-07 02:38:55,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:56,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 27.19566 ± 32.674
2025-08-07 02:38:56,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [123.78329, 18.911652, 23.286074, 22.920252, 11.505375, 8.917425, 18.43046, 8.373944, 12.396326, 23.431768]
2025-08-07 02:38:56,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 24.0, 31.0, 22.0, 13.0, 16.0, 23.0, 26.0, 17.0, 24.0]
2025-08-07 02:38:56,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 38 minutes, 40 seconds)
2025-08-07 02:40:24,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:25,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 30.95372 ± 40.839
2025-08-07 02:40:25,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [151.8659, 9.134577, 22.49512, 13.756235, 20.376232, 24.22183, 18.968237, 8.73328, 29.480873, 10.504958]
2025-08-07 02:40:25,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 12.0, 30.0, 17.0, 22.0, 34.0, 21.0, 13.0, 27.0, 13.0]
2025-08-07 02:40:25,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 37 minutes, 9 seconds)
2025-08-07 02:41:54,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 36.75779 ± 46.052
2025-08-07 02:41:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [22.055962, 10.601499, 112.251434, 11.8626995, 9.335205, 8.828566, 19.64827, 142.51003, 10.045508, 20.438774]
2025-08-07 02:41:54,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 14.0, 78.0, 15.0, 16.0, 14.0, 33.0, 105.0, 15.0, 28.0]
2025-08-07 02:41:54,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 35 minutes, 39 seconds)
2025-08-07 02:43:24,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:43:25,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 27.85028 ± 27.091
2025-08-07 02:43:25,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.493138, 107.937935, 12.18156, 15.534003, 17.577513, 26.756987, 19.635223, 26.415184, 17.799967, 21.171228]
2025-08-07 02:43:25,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 92.0, 18.0, 26.0, 18.0, 24.0, 24.0, 27.0, 30.0, 26.0]
2025-08-07 02:43:25,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 34 minutes, 16 seconds)
2025-08-07 02:44:53,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:53,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 22.72657 ± 29.354
2025-08-07 02:44:53,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.31006, 7.7183533, 10.554711, 11.070929, 12.560387, 9.736615, 110.2033, 18.840387, 16.58746, 12.683546]
2025-08-07 02:44:53,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 11.0, 16.0, 13.0, 15.0, 12.0, 81.0, 22.0, 23.0, 16.0]
2025-08-07 02:44:53,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 32 minutes, 43 seconds)
2025-08-07 02:46:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 51.38370 ± 65.439
2025-08-07 02:46:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [9.789156, 170.84462, 185.92226, 10.878955, 12.184345, 14.767983, 13.465768, 64.89974, 12.948678, 18.135443]
2025-08-07 02:46:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 86.0, 103.0, 27.0, 15.0, 16.0, 17.0, 42.0, 16.0, 33.0]
2025-08-07 02:46:23,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 31 minutes, 17 seconds)
2025-08-07 02:47:51,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:52,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 73.34450 ± 89.384
2025-08-07 02:47:52,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [156.15166, 197.22449, 19.708265, 12.505648, 260.8986, 16.579798, 14.229382, 15.186476, 31.6164, 9.344157]
2025-08-07 02:47:52,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 106.0, 25.0, 16.0, 153.0, 22.0, 20.0, 32.0, 32.0, 13.0]
2025-08-07 02:47:52,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 29 minutes, 47 seconds)
2025-08-07 02:49:21,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:21,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 40.08585 ± 52.368
2025-08-07 02:49:21,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [94.02737, 22.679268, 12.824185, 17.119213, 11.813162, 10.724499, 27.47216, 179.87779, 15.225133, 9.095693]
2025-08-07 02:49:21,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 31.0, 18.0, 19.0, 23.0, 20.0, 24.0, 124.0, 16.0, 16.0]
2025-08-07 02:49:21,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 28 minutes, 17 seconds)
2025-08-07 02:50:50,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:51,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 43.28124 ± 44.513
2025-08-07 02:50:51,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.291729, 15.7271805, 141.3204, 17.663593, 20.9789, 9.327336, 106.501175, 12.834801, 73.18244, 18.98486]
2025-08-07 02:50:51,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 82.0, 25.0, 30.0, 12.0, 82.0, 28.0, 55.0, 19.0]
2025-08-07 02:50:51,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 26 minutes, 46 seconds)
2025-08-07 02:52:20,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:20,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 53.66095 ± 78.101
2025-08-07 02:52:20,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [22.560015, 13.29821, 21.731995, 77.97969, 7.6872716, 278.18335, 21.915554, 11.621379, 64.92152, 16.710505]
2025-08-07 02:52:20,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 20.0, 28.0, 53.0, 14.0, 146.0, 22.0, 25.0, 58.0, 19.0]
2025-08-07 02:52:20,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 25 minutes, 20 seconds)
2025-08-07 02:53:49,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:50,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 33.26636 ± 42.269
2025-08-07 02:53:50,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.727398, 13.039025, 18.824406, 9.326369, 10.939578, 87.773315, 15.889136, 9.296464, 15.1643915, 140.68347]
2025-08-07 02:53:50,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 23.0, 12.0, 19.0, 69.0, 21.0, 16.0, 27.0, 112.0]
2025-08-07 02:53:50,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 23 minutes, 49 seconds)
2025-08-07 02:55:19,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:19,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 46.57055 ± 67.888
2025-08-07 02:55:19,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.858831, 18.176792, 17.942469, 20.523163, 11.247523, 241.37564, 9.72322, 81.303825, 14.849733, 32.704357]
2025-08-07 02:55:19,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 19.0, 26.0, 20.0, 14.0, 179.0, 18.0, 63.0, 23.0, 33.0]
2025-08-07 02:55:19,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 22 minutes, 23 seconds)
2025-08-07 02:56:48,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:56:49,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 65.55225 ± 94.738
2025-08-07 02:56:49,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [126.75214, 84.08857, 14.0250435, 327.68185, 28.581396, 14.218085, 20.047096, 11.14245, 9.551565, 19.434292]
2025-08-07 02:56:49,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 65.0, 21.0, 172.0, 28.0, 16.0, 25.0, 18.0, 19.0, 19.0]
2025-08-07 02:56:49,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 20 minutes, 53 seconds)
2025-08-07 02:58:18,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:58:18,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 22.56662 ± 27.077
2025-08-07 02:58:18,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.061435, 15.321139, 11.83849, 102.8985, 18.522318, 10.298494, 14.154011, 6.551787, 10.600324, 21.41972]
2025-08-07 02:58:18,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 16.0, 22.0, 70.0, 30.0, 14.0, 19.0, 14.0, 21.0, 30.0]
2025-08-07 02:58:18,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 19 minutes, 23 seconds)
2025-08-07 02:59:48,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:48,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 24.46862 ± 21.765
2025-08-07 02:59:48,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [25.37308, 88.52829, 19.647055, 13.683885, 22.930147, 12.005562, 18.525507, 12.577733, 17.314064, 14.100882]
2025-08-07 02:59:48,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 62.0, 29.0, 14.0, 32.0, 14.0, 21.0, 14.0, 22.0, 21.0]
2025-08-07 02:59:48,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 17 minutes, 54 seconds)
2025-08-07 03:01:16,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:17,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 32.72340 ± 36.365
2025-08-07 03:01:17,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [22.991993, 128.88492, 16.706598, 12.563247, 72.28609, 10.564389, 16.7002, 18.969305, 13.908561, 13.658705]
2025-08-07 03:01:17,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 91.0, 27.0, 22.0, 59.0, 19.0, 18.0, 26.0, 22.0, 16.0]
2025-08-07 03:01:17,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 16 minutes, 24 seconds)
2025-08-07 03:02:46,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:02:46,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 48.35697 ± 49.531
2025-08-07 03:02:46,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.451207, 10.24198, 134.5606, 144.18663, 19.698353, 82.3407, 15.810737, 22.526009, 18.947151, 19.806377]
2025-08-07 03:02:46,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 14.0, 93.0, 109.0, 20.0, 57.0, 19.0, 30.0, 24.0, 21.0]
2025-08-07 03:02:46,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 14 minutes, 53 seconds)
2025-08-07 03:04:15,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:04:16,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 26.46448 ± 24.277
2025-08-07 03:04:16,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.676595, 9.830236, 24.03705, 19.420187, 24.422682, 93.23724, 9.870603, 42.994225, 19.641039, 10.514937]
2025-08-07 03:04:16,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 15.0, 28.0, 21.0, 23.0, 60.0, 13.0, 51.0, 24.0, 12.0]
2025-08-07 03:04:16,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 13 minutes, 24 seconds)
2025-08-07 03:05:45,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:45,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 47.46732 ± 65.628
2025-08-07 03:05:45,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [31.204466, 95.995155, 21.520956, 12.187607, 230.35567, 17.451576, 13.485073, 32.32734, 11.335617, 8.809785]
2025-08-07 03:05:45,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 86.0, 26.0, 17.0, 134.0, 19.0, 18.0, 26.0, 29.0, 13.0]
2025-08-07 03:05:45,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 11 minutes, 54 seconds)
2025-08-07 03:07:15,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:16,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 62.85690 ± 76.026
2025-08-07 03:07:16,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [22.6591, 15.68402, 213.83722, 14.370961, 17.24619, 13.313695, 187.68253, 11.777751, 10.917297, 121.08019]
2025-08-07 03:07:16,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 17.0, 141.0, 19.0, 17.0, 22.0, 122.0, 18.0, 25.0, 79.0]
2025-08-07 03:07:16,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 10 minutes, 26 seconds)
2025-08-07 03:08:44,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:44,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 40.60579 ± 51.391
2025-08-07 03:08:44,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [25.178043, 15.827825, 8.759452, 154.71416, 14.350039, 12.281224, 13.429845, 15.623872, 15.60003, 130.29338]
2025-08-07 03:08:44,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 23.0, 107.0, 30.0, 16.0, 25.0, 25.0, 20.0, 84.0]
2025-08-07 03:08:44,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 8 minutes, 56 seconds)
2025-08-07 03:10:13,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:14,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 28.92209 ± 28.999
2025-08-07 03:10:14,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [21.521936, 33.343613, 10.901009, 10.73551, 19.250755, 22.862875, 19.10682, 113.84685, 14.530154, 23.121416]
2025-08-07 03:10:14,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 26.0, 15.0, 15.0, 22.0, 32.0, 22.0, 92.0, 18.0, 22.0]
2025-08-07 03:10:14,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 27 seconds)
2025-08-07 03:11:43,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:43,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 43.22656 ± 42.511
2025-08-07 03:11:43,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [57.53448, 16.981554, 13.167533, 17.141909, 107.94333, 138.89992, 13.324176, 28.588417, 18.115576, 20.568716]
2025-08-07 03:11:43,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 19.0, 19.0, 27.0, 74.0, 109.0, 23.0, 30.0, 21.0, 27.0]
2025-08-07 03:11:44,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 5 minutes, 58 seconds)
2025-08-07 03:13:12,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:13,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 30.43792 ± 30.859
2025-08-07 03:13:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.535313, 9.321037, 9.955559, 19.730053, 19.497799, 14.6048355, 119.062836, 24.523394, 42.369797, 25.7786]
2025-08-07 03:13:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 11.0, 12.0, 22.0, 27.0, 17.0, 87.0, 25.0, 34.0, 32.0]
2025-08-07 03:13:13,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 28 seconds)
2025-08-07 03:14:42,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:14:42,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 50.69016 ± 71.839
2025-08-07 03:14:42,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [12.47557, 13.7779665, 13.346489, 21.945992, 19.206038, 16.894207, 223.33255, 10.274996, 16.286402, 159.36136]
2025-08-07 03:14:42,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 15.0, 27.0, 22.0, 22.0, 182.0, 13.0, 26.0, 105.0]
2025-08-07 03:14:42,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 2 minutes, 58 seconds)
2025-08-07 03:16:11,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:16:12,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 29.60286 ± 35.913
2025-08-07 03:16:12,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.77171, 20.563278, 19.04494, 11.408007, 22.855503, 136.65671, 20.82221, 16.964296, 10.748718, 22.193274]
2025-08-07 03:16:12,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 26.0, 24.0, 15.0, 23.0, 99.0, 23.0, 22.0, 15.0, 21.0]
2025-08-07 03:16:12,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 29 seconds)
2025-08-07 03:17:41,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:41,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 36.50133 ± 42.599
2025-08-07 03:17:41,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.3414135, 18.687334, 14.159242, 8.845431, 124.03953, 18.42537, 15.065703, 11.327094, 118.736374, 21.385763]
2025-08-07 03:17:41,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 21.0, 17.0, 11.0, 76.0, 20.0, 18.0, 19.0, 93.0, 29.0]
2025-08-07 03:17:41,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1251 [DEBUG]: Training session finished
