2025-08-07 06:46:11,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc25-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:46:11,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc25-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:46:11,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x148ce11e7750>}
2025-08-07 06:46:11,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 06:46:11,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1133 [INFO]: Creating new trainer
2025-08-07 06:46:11,820 baseline-bpql-noiseperc25-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=83, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 06:46:11,820 baseline-bpql-noiseperc25-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:46:13,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 06:46:13,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 06:47:43,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:43,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.19659 ± 25.626
2025-08-07 06:47:43,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.427247, 11.871544, 10.674203, 25.160547, 12.509277, 14.10252, 61.929165, 84.18621, 22.776468, 63.32869]
2025-08-07 06:47:43,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 14.0, 21.0, 29.0, 24.0, 18.0, 56.0, 50.0, 22.0, 44.0]
2025-08-07 06:47:43,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (32.20) for latency ExtremeClogL1U23
2025-08-07 06:47:43,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 29 minutes, 1 second)
2025-08-07 06:49:19,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:20,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 47.70568 ± 50.177
2025-08-07 06:49:20,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [19.18391, 54.824677, 96.47522, 176.52881, 14.129321, 9.639438, 24.466139, 14.669641, 14.443402, 52.696243]
2025-08-07 06:49:20,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 43.0, 67.0, 123.0, 17.0, 12.0, 22.0, 17.0, 21.0, 38.0]
2025-08-07 06:49:20,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (47.71) for latency ExtremeClogL1U23
2025-08-07 06:49:20,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 32 minutes, 52 seconds)
2025-08-07 06:50:56,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:57,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 22.74495 ± 16.433
2025-08-07 06:50:57,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.400616, 17.47939, 11.065078, 17.903044, 15.539392, 70.7529, 12.22149, 20.589048, 17.045015, 24.4535]
2025-08-07 06:50:57,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 18.0, 18.0, 43.0, 16.0, 47.0, 15.0, 26.0, 21.0, 28.0]
2025-08-07 06:50:57,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 33 minutes, 1 second)
2025-08-07 06:52:33,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:52:33,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 62.73210 ± 58.375
2025-08-07 06:52:33,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.331473, 193.21819, 89.42768, 13.285946, 9.211563, 13.892624, 81.34793, 79.26621, 12.543614, 119.795784]
2025-08-07 06:52:33,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 123.0, 78.0, 21.0, 13.0, 17.0, 74.0, 68.0, 18.0, 118.0]
2025-08-07 06:52:33,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (62.73) for latency ExtremeClogL1U23
2025-08-07 06:52:33,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 32 minutes, 16 seconds)
2025-08-07 06:54:10,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:10,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 48.22335 ± 66.169
2025-08-07 06:54:10,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [48.008827, 16.653345, 235.71033, 12.896515, 36.819237, 9.495584, 13.186705, 81.93853, 16.735868, 10.788617]
2025-08-07 06:54:10,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 18.0, 112.0, 24.0, 55.0, 11.0, 22.0, 52.0, 21.0, 14.0]
2025-08-07 06:54:10,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 31 minutes, 15 seconds)
2025-08-07 06:55:47,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:47,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 49.86285 ± 52.588
2025-08-07 06:55:47,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [116.04127, 91.94649, 13.496484, 12.020807, 13.313871, 8.893284, 159.79433, 9.065592, 8.245549, 65.810776]
2025-08-07 06:55:47,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 65.0, 23.0, 14.0, 15.0, 17.0, 111.0, 11.0, 12.0, 69.0]
2025-08-07 06:55:47,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 31 minutes, 46 seconds)
2025-08-07 06:57:24,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:25,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 64.80437 ± 56.764
2025-08-07 06:57:25,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [128.12392, 18.56468, 108.368454, 180.26398, 12.293169, 68.614716, 10.960266, 87.66017, 18.030071, 15.164227]
2025-08-07 06:57:25,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 17.0, 56.0, 107.0, 20.0, 50.0, 15.0, 55.0, 23.0, 23.0]
2025-08-07 06:57:25,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (64.80) for latency ExtremeClogL1U23
2025-08-07 06:57:25,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 30 minutes, 22 seconds)
2025-08-07 06:59:02,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:02,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 46.55009 ± 50.002
2025-08-07 06:59:02,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.468804, 8.87155, 15.395126, 11.286415, 74.5351, 7.090875, 92.35628, 17.054085, 168.81396, 55.62867]
2025-08-07 06:59:02,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 12.0, 16.0, 15.0, 76.0, 11.0, 63.0, 18.0, 91.0, 46.0]
2025-08-07 06:59:02,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 28 minutes, 54 seconds)
2025-08-07 07:00:39,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:39,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 20.12107 ± 19.375
2025-08-07 07:00:39,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.441101, 65.44815, 8.043763, 12.079281, 50.927376, 10.379197, 10.804149, 10.679865, 10.293659, 14.114147]
2025-08-07 07:00:39,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 49.0, 14.0, 14.0, 41.0, 21.0, 17.0, 15.0, 12.0, 15.0]
2025-08-07 07:00:39,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 27 minutes, 24 seconds)
2025-08-07 07:02:16,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:17,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 31.01883 ± 53.438
2025-08-07 07:02:17,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.045728, 191.11958, 10.109978, 16.204128, 15.442823, 14.004276, 9.437626, 18.353218, 12.567731, 9.903233]
2025-08-07 07:02:17,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 113.0, 25.0, 17.0, 24.0, 17.0, 16.0, 20.0, 19.0, 17.0]
2025-08-07 07:02:17,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 25 minutes, 54 seconds)
2025-08-07 07:03:53,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:54,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 42.13970 ± 36.523
2025-08-07 07:03:54,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [70.33137, 102.76596, 16.522345, 9.132181, 79.10494, 90.99096, 14.594419, 13.828256, 11.342673, 12.783912]
2025-08-07 07:03:54,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 69.0, 20.0, 19.0, 79.0, 68.0, 20.0, 22.0, 16.0, 18.0]
2025-08-07 07:03:54,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 24 minutes, 15 seconds)
2025-08-07 07:05:31,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:31,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 38.83070 ± 50.555
2025-08-07 07:05:31,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.812436, 152.53854, 13.780892, 9.402433, 17.891632, 16.648045, 16.80161, 12.86314, 125.528564, 15.039739]
2025-08-07 07:05:31,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 101.0, 19.0, 13.0, 20.0, 18.0, 23.0, 15.0, 83.0, 19.0]
2025-08-07 07:05:31,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 22 minutes, 40 seconds)
2025-08-07 07:07:09,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:09,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 53.39971 ± 79.994
2025-08-07 07:07:09,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.990837, 24.374737, 12.29487, 16.507952, 156.71098, 9.327411, 16.930542, 256.814, 18.035234, 14.010516]
2025-08-07 07:07:09,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 21.0, 14.0, 18.0, 107.0, 13.0, 22.0, 140.0, 21.0, 18.0]
2025-08-07 07:07:09,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 21 minutes, 17 seconds)
2025-08-07 07:08:46,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:47,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 39.25852 ± 29.719
2025-08-07 07:08:47,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [23.45018, 30.567211, 15.467793, 15.378731, 7.760176, 41.717987, 14.2016735, 64.41942, 89.45455, 90.167496]
2025-08-07 07:08:47,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 26.0, 17.0, 17.0, 12.0, 69.0, 24.0, 68.0, 87.0, 66.0]
2025-08-07 07:08:47,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 19 minutes, 43 seconds)
2025-08-07 07:10:24,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:25,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 48.04210 ± 43.023
2025-08-07 07:10:25,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [108.611374, 20.75533, 110.420166, 14.252308, 61.398907, 12.443303, 12.255348, 13.081131, 111.71926, 15.483889]
2025-08-07 07:10:25,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 23.0, 69.0, 17.0, 79.0, 19.0, 20.0, 16.0, 69.0, 18.0]
2025-08-07 07:10:25,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 18 minutes, 18 seconds)
2025-08-07 07:12:03,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:03,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 29.64290 ± 32.197
2025-08-07 07:12:03,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.822481, 7.369714, 14.595595, 112.281044, 13.983944, 7.6259327, 38.6755, 11.872403, 13.659492, 63.542885]
2025-08-07 07:12:03,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 11.0, 23.0, 82.0, 26.0, 12.0, 44.0, 14.0, 15.0, 69.0]
2025-08-07 07:12:03,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 17 minutes, 4 seconds)
2025-08-07 07:13:41,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:41,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 27.98225 ± 42.806
2025-08-07 07:13:41,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.283457, 15.459246, 12.557579, 155.52722, 13.895379, 7.9499297, 24.111784, 20.421854, 8.439227, 13.176829]
2025-08-07 07:13:41,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 17.0, 14.0, 155.0, 15.0, 15.0, 21.0, 21.0, 11.0, 22.0]
2025-08-07 07:13:41,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 15 minutes, 33 seconds)
2025-08-07 07:15:20,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:21,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 50.81427 ± 33.207
2025-08-07 07:15:21,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.596829, 61.290817, 6.999913, 101.759346, 75.791145, 86.641365, 18.633253, 16.321888, 43.144352, 81.96383]
2025-08-07 07:15:21,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 49.0, 10.0, 85.0, 76.0, 115.0, 19.0, 18.0, 59.0, 105.0]
2025-08-07 07:15:21,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 14 minutes, 16 seconds)
2025-08-07 07:17:03,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:03,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.14297 ± 35.246
2025-08-07 07:17:03,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [60.567463, 109.25809, 15.209045, 7.4789033, 8.920951, 11.984999, 79.83218, 9.732462, 8.79956, 9.646054]
2025-08-07 07:17:03,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 90.0, 17.0, 10.0, 13.0, 14.0, 64.0, 18.0, 13.0, 16.0]
2025-08-07 07:17:03,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 14 minutes, 4 seconds)
2025-08-07 07:18:39,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 88.88944 ± 76.469
2025-08-07 07:18:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [16.476852, 22.359531, 13.961661, 90.20016, 177.08528, 162.17775, 174.84747, 198.76369, 23.642101, 9.379975]
2025-08-07 07:18:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 40.0, 14.0, 78.0, 203.0, 125.0, 150.0, 207.0, 26.0, 13.0]
2025-08-07 07:18:40,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (88.89) for latency ExtremeClogL1U23
2025-08-07 07:18:40,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 12 minutes, 10 seconds)
2025-08-07 07:20:20,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:20,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 44.52446 ± 53.472
2025-08-07 07:20:20,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [24.970417, 12.603173, 76.84042, 75.53552, 187.03369, 20.12184, 10.114007, 18.65569, 10.024662, 9.345165]
2025-08-07 07:20:20,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 14.0, 52.0, 67.0, 135.0, 39.0, 15.0, 18.0, 14.0, 15.0]
2025-08-07 07:20:20,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 10 minutes, 58 seconds)
2025-08-07 07:21:58,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:58,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 25.42353 ± 36.155
2025-08-07 07:21:58,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.47715, 16.942455, 10.386068, 16.842337, 21.499949, 10.996623, 13.604116, 133.28645, 10.068791, 8.13133]
2025-08-07 07:21:58,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 22.0, 24.0, 20.0, 22.0, 17.0, 88.0, 14.0, 10.0]
2025-08-07 07:21:58,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 9 minutes, 8 seconds)
2025-08-07 07:23:35,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 62.67837 ± 48.697
2025-08-07 07:23:36,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [137.05258, 90.75317, 6.446408, 10.987055, 85.6423, 99.351006, 9.358465, 52.4122, 9.531711, 125.248825]
2025-08-07 07:23:36,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 73.0, 9.0, 18.0, 58.0, 73.0, 11.0, 97.0, 16.0, 92.0]
2025-08-07 07:23:36,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 7 minutes, 12 seconds)
2025-08-07 07:25:14,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:14,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 13.71509 ± 5.786
2025-08-07 07:25:14,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [23.73546, 8.238592, 16.1792, 14.053735, 7.6439795, 17.24654, 8.3999605, 7.789661, 22.623465, 11.240351]
2025-08-07 07:25:14,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 11.0, 20.0, 15.0, 10.0, 30.0, 12.0, 13.0, 21.0, 14.0]
2025-08-07 07:25:14,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 4 minutes, 26 seconds)
2025-08-07 07:26:52,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 41.56278 ± 39.202
2025-08-07 07:26:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.3307867, 123.94067, 12.258772, 12.258099, 13.10738, 76.74723, 47.066895, 90.03141, 15.121243, 17.765345]
2025-08-07 07:26:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 91.0, 18.0, 15.0, 15.0, 67.0, 35.0, 70.0, 17.0, 18.0]
2025-08-07 07:26:52,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 3 minutes)
2025-08-07 07:28:30,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:30,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 44.17171 ± 53.355
2025-08-07 07:28:30,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [124.681755, 101.404366, 7.697404, 16.53572, 6.0237694, 9.04975, 10.325642, 8.952338, 10.7702055, 146.27615]
2025-08-07 07:28:30,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 76.0, 14.0, 18.0, 9.0, 13.0, 13.0, 17.0, 19.0, 96.0]
2025-08-07 07:28:31,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 53 seconds)
2025-08-07 07:30:08,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:09,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 34.62140 ± 34.860
2025-08-07 07:30:09,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [65.62996, 7.5565395, 8.283145, 12.919504, 18.931976, 76.58148, 112.52867, 13.786326, 12.48771, 17.508732]
2025-08-07 07:30:09,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 10.0, 10.0, 18.0, 19.0, 76.0, 86.0, 15.0, 18.0, 18.0]
2025-08-07 07:30:09,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 59 minutes, 26 seconds)
2025-08-07 07:31:47,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:47,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 54.38894 ± 93.232
2025-08-07 07:31:47,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.872816, 16.68089, 18.687181, 12.105749, 119.54117, 13.524159, 8.091487, 317.33923, 11.6778145, 11.368981]
2025-08-07 07:31:47,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 23.0, 20.0, 98.0, 17.0, 11.0, 213.0, 19.0, 24.0]
2025-08-07 07:31:47,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 57 minutes, 51 seconds)
2025-08-07 07:33:26,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:26,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 48.03114 ± 47.608
2025-08-07 07:33:26,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.280649, 11.969359, 10.3349495, 150.96739, 13.577168, 81.358894, 92.38685, 82.97595, 10.524235, 15.935967]
2025-08-07 07:33:26,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 19.0, 13.0, 73.0, 19.0, 60.0, 77.0, 62.0, 12.0, 18.0]
2025-08-07 07:33:26,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 56 minutes, 23 seconds)
2025-08-07 07:35:04,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:05,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 57.90825 ± 80.858
2025-08-07 07:35:05,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.007194, 11.723335, 60.02982, 285.3071, 9.384702, 74.945274, 19.427223, 10.794844, 85.588196, 8.874759]
2025-08-07 07:35:05,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 24.0, 37.0, 204.0, 11.0, 71.0, 23.0, 19.0, 53.0, 11.0]
2025-08-07 07:35:05,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 54 minutes, 51 seconds)
2025-08-07 07:36:43,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 56.36927 ± 54.690
2025-08-07 07:36:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [141.50722, 16.510437, 10.357743, 11.09297, 94.26355, 11.742111, 149.48447, 95.805695, 10.635081, 22.293453]
2025-08-07 07:36:43,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 16.0, 12.0, 21.0, 66.0, 16.0, 92.0, 75.0, 14.0, 20.0]
2025-08-07 07:36:43,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 20 seconds)
2025-08-07 07:38:22,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:22,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 34.80236 ± 45.130
2025-08-07 07:38:22,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.153202, 8.974178, 158.1035, 11.736544, 16.130928, 15.895914, 74.9036, 26.387012, 12.155036, 14.583634]
2025-08-07 07:38:22,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 16.0, 101.0, 22.0, 20.0, 18.0, 52.0, 25.0, 14.0, 16.0]
2025-08-07 07:38:22,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 51 minutes, 47 seconds)
2025-08-07 07:40:00,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:01,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 68.74442 ± 78.351
2025-08-07 07:40:01,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [119.48983, 19.331911, 13.681105, 75.680954, 9.531429, 218.69365, 7.6042633, 12.484601, 199.54903, 11.397448]
2025-08-07 07:40:01,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 22.0, 19.0, 44.0, 14.0, 105.0, 23.0, 21.0, 92.0, 20.0]
2025-08-07 07:40:01,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 50 minutes, 11 seconds)
2025-08-07 07:41:40,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:40,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 38.66616 ± 53.134
2025-08-07 07:41:40,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [27.133314, 12.787928, 67.5773, 190.12993, 19.231792, 12.800767, 12.339034, 9.705036, 26.871141, 8.08537]
2025-08-07 07:41:40,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 16.0, 47.0, 95.0, 21.0, 15.0, 18.0, 22.0, 23.0, 11.0]
2025-08-07 07:41:40,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 36 seconds)
2025-08-07 07:43:17,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:18,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 73.65771 ± 56.142
2025-08-07 07:43:18,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.446774, 7.000085, 133.85384, 78.0408, 126.533676, 124.0228, 64.758125, 15.061088, 10.182482, 159.67741]
2025-08-07 07:43:18,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 9.0, 98.0, 52.0, 113.0, 91.0, 38.0, 19.0, 19.0, 143.0]
2025-08-07 07:43:18,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 49 seconds)
2025-08-07 07:44:56,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:56,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 62.39288 ± 52.982
2025-08-07 07:44:56,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [122.440956, 9.276467, 20.282614, 9.493051, 21.631119, 15.927084, 89.66236, 88.40161, 170.23926, 76.574295]
2025-08-07 07:44:56,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 16.0, 21.0, 23.0, 23.0, 23.0, 50.0, 72.0, 110.0, 46.0]
2025-08-07 07:44:56,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 45 minutes, 10 seconds)
2025-08-07 07:46:34,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:34,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 38.10644 ± 32.001
2025-08-07 07:46:34,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.243891, 66.745895, 13.112044, 10.903886, 23.062643, 8.3717785, 88.239525, 56.793907, 89.82467, 16.766157]
2025-08-07 07:46:34,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 50.0, 17.0, 14.0, 22.0, 14.0, 95.0, 44.0, 59.0, 21.0]
2025-08-07 07:46:34,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 43 minutes, 23 seconds)
2025-08-07 07:48:11,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:11,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 40.65392 ± 53.313
2025-08-07 07:48:11,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [189.60802, 16.656979, 15.393293, 16.766401, 80.97154, 18.453512, 10.576667, 12.798949, 24.190851, 21.12296]
2025-08-07 07:48:11,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 21.0, 16.0, 17.0, 63.0, 22.0, 22.0, 15.0, 22.0, 24.0]
2025-08-07 07:48:11,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 26 seconds)
2025-08-07 07:49:48,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:48,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 29.75886 ± 31.862
2025-08-07 07:49:48,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [65.98408, 9.835103, 112.71031, 13.988109, 22.720984, 10.772397, 18.55968, 13.979081, 8.860156, 20.178642]
2025-08-07 07:49:48,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 16.0, 118.0, 16.0, 24.0, 15.0, 21.0, 16.0, 16.0, 23.0]
2025-08-07 07:49:48,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 39 minutes, 17 seconds)
2025-08-07 07:51:26,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:26,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 59.36154 ± 72.241
2025-08-07 07:51:26,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [111.562706, 173.31815, 13.840068, 10.326031, 16.80222, 208.51059, 15.879326, 15.683733, 14.482075, 13.210523]
2025-08-07 07:51:26,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 159.0, 18.0, 16.0, 23.0, 103.0, 21.0, 21.0, 19.0, 18.0]
2025-08-07 07:51:26,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 40 seconds)
2025-08-07 07:53:02,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:02,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 59.09135 ± 86.505
2025-08-07 07:53:02,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [84.207985, 10.19418, 14.037777, 12.86733, 298.87637, 108.40467, 23.416756, 19.252867, 9.414985, 10.240534]
2025-08-07 07:53:02,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 18.0, 25.0, 15.0, 157.0, 68.0, 24.0, 20.0, 14.0, 12.0]
2025-08-07 07:53:02,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 35 minutes, 32 seconds)
2025-08-07 07:54:38,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:38,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 40.83178 ± 52.605
2025-08-07 07:54:38,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.591393, 17.658535, 93.116615, 12.72093, 10.340322, 25.026665, 8.943463, 28.662975, 9.453772, 181.80315]
2025-08-07 07:54:38,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 17.0, 78.0, 17.0, 16.0, 22.0, 11.0, 27.0, 12.0, 84.0]
2025-08-07 07:54:38,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 33 minutes, 34 seconds)
2025-08-07 07:56:15,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:15,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 47.82061 ± 74.298
2025-08-07 07:56:15,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [120.75324, 15.874735, 7.2861648, 21.075817, 22.493309, 10.067857, 11.907596, 248.52058, 13.559645, 6.66721]
2025-08-07 07:56:15,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 18.0, 12.0, 23.0, 21.0, 17.0, 17.0, 127.0, 15.0, 11.0]
2025-08-07 07:56:15,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 31 minutes, 51 seconds)
2025-08-07 07:57:51,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:51,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 38.94230 ± 38.012
2025-08-07 07:57:51,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [22.588793, 17.152716, 90.047585, 18.51185, 10.346152, 73.45523, 13.364246, 6.940895, 17.310175, 119.705376]
2025-08-07 07:57:51,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 62.0, 21.0, 14.0, 55.0, 19.0, 10.0, 18.0, 80.0]
2025-08-07 07:57:51,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 8 seconds)
2025-08-07 07:59:27,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:28,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 70.14736 ± 73.927
2025-08-07 07:59:28,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [28.26114, 19.890692, 10.99093, 8.899639, 14.2490015, 167.69072, 7.726809, 205.54825, 160.6908, 77.52565]
2025-08-07 07:59:28,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 15.0, 14.0, 19.0, 141.0, 16.0, 129.0, 84.0, 45.0]
2025-08-07 07:59:28,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 28 minutes, 17 seconds)
2025-08-07 08:01:03,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:04,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 48.12416 ± 51.987
2025-08-07 08:01:04,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.531914, 136.48291, 141.89272, 18.048006, 98.74388, 18.478498, 16.153027, 9.4726715, 19.073927, 15.364067]
2025-08-07 08:01:04,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 103.0, 106.0, 18.0, 76.0, 22.0, 19.0, 11.0, 23.0, 18.0]
2025-08-07 08:01:04,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 26 minutes, 42 seconds)
2025-08-07 08:02:40,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:40,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 28.45267 ± 31.928
2025-08-07 08:02:40,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.033839, 10.101163, 110.857574, 13.470585, 14.781068, 67.354164, 16.527523, 9.871364, 16.50336, 11.026064]
2025-08-07 08:02:40,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 13.0, 69.0, 23.0, 16.0, 52.0, 18.0, 14.0, 22.0, 21.0]
2025-08-07 08:02:40,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 25 minutes, 7 seconds)
2025-08-07 08:04:16,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:17,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 46.20547 ± 52.856
2025-08-07 08:04:17,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [76.39056, 8.831029, 11.054417, 100.86924, 20.502243, 176.80777, 14.271938, 9.23686, 30.254358, 13.836286]
2025-08-07 08:04:17,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 13.0, 16.0, 75.0, 21.0, 137.0, 19.0, 15.0, 25.0, 16.0]
2025-08-07 08:04:17,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 23 minutes, 32 seconds)
2025-08-07 08:05:51,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:05:52,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 28.19714 ± 30.788
2025-08-07 08:05:52,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [105.729, 10.959646, 18.74492, 11.369091, 11.30999, 68.950325, 16.320997, 14.602487, 11.238523, 12.746463]
2025-08-07 08:05:52,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 14.0, 20.0, 15.0, 15.0, 43.0, 24.0, 22.0, 14.0, 26.0]
2025-08-07 08:05:52,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 21 minutes, 41 seconds)
2025-08-07 08:07:27,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:27,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 42.45970 ± 77.039
2025-08-07 08:07:27,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.69716, 17.065212, 14.479464, 273.17877, 22.189491, 8.8521, 17.76725, 9.806784, 19.99521, 23.56558]
2025-08-07 08:07:27,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 17.0, 22.0, 128.0, 21.0, 15.0, 21.0, 12.0, 22.0, 20.0]
2025-08-07 08:07:27,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 19 minutes, 57 seconds)
2025-08-07 08:09:03,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:03,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 28.67944 ± 23.475
2025-08-07 08:09:03,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [29.177475, 17.934317, 21.481674, 22.916166, 77.894424, 15.03689, 11.928574, 70.2645, 10.39561, 9.764737]
2025-08-07 08:09:03,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 23.0, 21.0, 54.0, 20.0, 16.0, 47.0, 15.0, 23.0]
2025-08-07 08:09:03,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 18 minutes, 17 seconds)
2025-08-07 08:10:39,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:39,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 60.00059 ± 78.653
2025-08-07 08:10:39,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.785542, 7.477692, 91.09213, 213.22838, 11.989948, 14.370131, 13.554251, 207.20993, 15.302283, 10.995524]
2025-08-07 08:10:39,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 9.0, 72.0, 97.0, 14.0, 19.0, 23.0, 150.0, 18.0, 13.0]
2025-08-07 08:10:39,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 16 minutes, 39 seconds)
2025-08-07 08:12:14,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:15,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 74.68742 ± 67.947
2025-08-07 08:12:15,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [204.54643, 11.099035, 20.122704, 100.12539, 7.2037125, 107.30793, 17.576622, 163.84438, 103.45296, 11.595014]
2025-08-07 08:12:15,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 13.0, 21.0, 75.0, 14.0, 60.0, 17.0, 76.0, 81.0, 18.0]
2025-08-07 08:12:15,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 14 minutes, 53 seconds)
2025-08-07 08:13:51,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:51,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 97.71394 ± 66.672
2025-08-07 08:13:51,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [112.47462, 188.83824, 12.120332, 119.775154, 205.9305, 10.2403965, 15.2369585, 139.0724, 86.71132, 86.7395]
2025-08-07 08:13:51,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 122.0, 23.0, 86.0, 138.0, 18.0, 17.0, 109.0, 64.0, 65.0]
2025-08-07 08:13:51,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (97.71) for latency ExtremeClogL1U23
2025-08-07 08:13:51,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 13 minutes, 34 seconds)
2025-08-07 08:15:26,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:27,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 111.18617 ± 92.109
2025-08-07 08:15:27,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [133.72572, 157.26067, 19.51745, 14.68861, 71.80612, 231.71709, 11.777664, 267.6564, 18.558113, 185.15387]
2025-08-07 08:15:27,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 130.0, 20.0, 18.0, 46.0, 238.0, 14.0, 237.0, 19.0, 96.0]
2025-08-07 08:15:27,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (111.19) for latency ExtremeClogL1U23
2025-08-07 08:15:27,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 11 minutes, 59 seconds)
2025-08-07 08:17:03,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:04,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 31.80756 ± 37.873
2025-08-07 08:17:04,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [111.2838, 8.875265, 20.276236, 9.208302, 10.742946, 103.05212, 13.965815, 9.933141, 16.006166, 14.731815]
2025-08-07 08:17:04,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 19.0, 25.0, 12.0, 16.0, 70.0, 22.0, 12.0, 18.0, 16.0]
2025-08-07 08:17:04,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 10 minutes, 26 seconds)
2025-08-07 08:18:39,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:40,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 84.79857 ± 63.016
2025-08-07 08:18:40,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [129.02916, 8.551479, 97.05762, 187.5246, 17.56062, 124.425644, 116.359825, 11.148585, 143.88248, 12.445596]
2025-08-07 08:18:40,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 11.0, 62.0, 86.0, 24.0, 102.0, 78.0, 21.0, 128.0, 17.0]
2025-08-07 08:18:40,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 8 minutes, 49 seconds)
2025-08-07 08:20:14,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:15,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 64.91775 ± 79.736
2025-08-07 08:20:15,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [53.65282, 12.212238, 243.90128, 14.153252, 9.343825, 16.874699, 77.1679, 189.83266, 24.03251, 8.006308]
2025-08-07 08:20:15,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 18.0, 129.0, 20.0, 15.0, 23.0, 52.0, 96.0, 25.0, 11.0]
2025-08-07 08:20:15,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 7 minutes, 10 seconds)
2025-08-07 08:21:50,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:51,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.83872 ± 33.960
2025-08-07 08:21:51,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.332441, 20.68004, 10.549794, 10.326723, 10.185445, 9.905034, 108.20308, 11.32346, 73.423836, 64.457375]
2025-08-07 08:21:51,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 20.0, 14.0, 14.0, 16.0, 15.0, 66.0, 23.0, 58.0, 45.0]
2025-08-07 08:21:51,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 5 minutes, 30 seconds)
2025-08-07 08:23:26,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:26,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 35.53217 ± 41.154
2025-08-07 08:23:26,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.762567, 10.628694, 16.951895, 17.4182, 145.71056, 20.700628, 77.28246, 9.587835, 16.561466, 24.71745]
2025-08-07 08:23:26,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 15.0, 22.0, 19.0, 111.0, 52.0, 50.0, 12.0, 20.0, 23.0]
2025-08-07 08:23:26,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 50 seconds)
2025-08-07 08:25:02,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:03,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 60.57374 ± 71.376
2025-08-07 08:25:03,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [75.4558, 242.1552, 18.522915, 11.037299, 107.95044, 7.961401, 15.127542, 102.88468, 10.703438, 13.938678]
2025-08-07 08:25:03,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 190.0, 22.0, 14.0, 57.0, 15.0, 17.0, 58.0, 13.0, 15.0]
2025-08-07 08:25:03,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 2 minutes, 17 seconds)
2025-08-07 08:26:38,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 23.83994 ± 39.064
2025-08-07 08:26:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.322881, 8.805371, 8.525589, 14.444354, 8.814468, 12.27482, 140.78624, 14.472408, 12.473672, 6.4795403]
2025-08-07 08:26:38,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 14.0, 11.0, 16.0, 11.0, 18.0, 93.0, 16.0, 16.0, 12.0]
2025-08-07 08:26:38,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 36 seconds)
2025-08-07 08:28:14,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:28:14,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 60.02924 ± 62.711
2025-08-07 08:28:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [118.05293, 9.117067, 8.360647, 12.199767, 105.78112, 127.90362, 17.209103, 7.82878, 11.976862, 181.86247]
2025-08-07 08:28:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 18.0, 10.0, 16.0, 71.0, 96.0, 29.0, 11.0, 25.0, 99.0]
2025-08-07 08:28:14,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 59 minutes, 9 seconds)
2025-08-07 08:29:50,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:50,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 42.27840 ± 56.449
2025-08-07 08:29:50,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [201.27414, 41.403137, 19.175875, 12.560413, 17.591125, 15.336053, 13.655394, 76.9592, 8.741212, 16.087482]
2025-08-07 08:29:50,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 67.0, 21.0, 14.0, 21.0, 16.0, 14.0, 44.0, 13.0, 18.0]
2025-08-07 08:29:50,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 31 seconds)
2025-08-07 08:31:26,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:26,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 99.07698 ± 104.380
2025-08-07 08:31:26,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [47.99626, 253.1413, 102.472824, 85.13005, 8.6583185, 7.5561495, 16.998379, 130.01993, 14.0517025, 324.7449]
2025-08-07 08:31:26,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [43.0, 135.0, 92.0, 56.0, 19.0, 12.0, 22.0, 71.0, 18.0, 295.0]
2025-08-07 08:31:27,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 1 second)
2025-08-07 08:33:02,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:02,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 51.54484 ± 79.056
2025-08-07 08:33:02,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.325171, 14.131726, 16.618513, 7.261682, 12.24104, 217.8215, 200.95267, 11.633185, 11.494401, 14.968523]
2025-08-07 08:33:02,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [10.0, 21.0, 23.0, 9.0, 15.0, 123.0, 106.0, 15.0, 15.0, 20.0]
2025-08-07 08:33:02,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 54 minutes, 22 seconds)
2025-08-07 08:34:38,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:39,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.53125 ± 35.514
2025-08-07 08:34:39,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.140287, 18.71608, 14.124756, 92.675224, 9.848508, 23.879381, 112.549065, 14.460389, 14.785885, 12.132925]
2025-08-07 08:34:39,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 18.0, 79.0, 13.0, 24.0, 96.0, 17.0, 17.0, 20.0]
2025-08-07 08:34:39,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 52 minutes, 52 seconds)
2025-08-07 08:36:14,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:15,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 36.37180 ± 45.577
2025-08-07 08:36:15,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [86.64816, 6.5774827, 11.740231, 59.78527, 10.326587, 149.08823, 9.96081, 12.220546, 7.1672406, 10.203442]
2025-08-07 08:36:15,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [51.0, 8.0, 18.0, 39.0, 11.0, 115.0, 12.0, 14.0, 15.0, 14.0]
2025-08-07 08:36:15,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 51 minutes, 13 seconds)
2025-08-07 08:37:51,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 95.61975 ± 92.447
2025-08-07 08:37:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.167822, 19.056969, 24.369318, 8.687275, 233.15157, 96.518684, 13.129259, 168.1132, 111.28446, 266.71896]
2025-08-07 08:37:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 24.0, 19.0, 116.0, 53.0, 15.0, 108.0, 87.0, 189.0]
2025-08-07 08:37:52,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 49 minutes, 44 seconds)
2025-08-07 08:39:27,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:27,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 28.09606 ± 26.231
2025-08-07 08:39:27,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [21.636822, 13.996931, 12.9392, 19.554913, 86.65909, 14.849774, 10.902393, 72.503876, 8.269882, 19.647728]
2025-08-07 08:39:27,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 15.0, 16.0, 23.0, 95.0, 18.0, 24.0, 90.0, 17.0, 20.0]
2025-08-07 08:39:27,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 4 seconds)
2025-08-07 08:41:03,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:04,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 81.39025 ± 86.282
2025-08-07 08:41:04,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.217485, 12.771116, 87.023315, 69.494064, 10.165329, 252.40666, 142.10919, 209.27461, 10.531085, 11.909694]
2025-08-07 08:41:04,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 22.0, 49.0, 51.0, 12.0, 124.0, 83.0, 161.0, 12.0, 17.0]
2025-08-07 08:41:04,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 46 minutes, 33 seconds)
2025-08-07 08:42:40,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:42:40,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 37.98301 ± 50.970
2025-08-07 08:42:40,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.981759, 7.988536, 12.975844, 12.694423, 9.753081, 129.42256, 149.28078, 9.258216, 15.307859, 19.166985]
2025-08-07 08:42:40,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 10.0, 21.0, 19.0, 17.0, 91.0, 129.0, 13.0, 20.0, 25.0]
2025-08-07 08:42:40,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 56 seconds)
2025-08-07 08:44:16,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:16,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 68.12780 ± 59.472
2025-08-07 08:44:16,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [140.04248, 135.33351, 93.65767, 11.4054785, 8.209222, 15.123946, 7.898436, 140.5857, 7.466892, 121.554665]
2025-08-07 08:44:16,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 104.0, 74.0, 13.0, 10.0, 19.0, 10.0, 86.0, 16.0, 71.0]
2025-08-07 08:44:16,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 43 minutes, 20 seconds)
2025-08-07 08:45:53,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:53,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 52.25145 ± 59.097
2025-08-07 08:45:53,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [171.15164, 17.309227, 9.351275, 16.544464, 14.0265875, 14.236474, 88.89487, 22.963697, 152.30392, 15.732298]
2025-08-07 08:45:53,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 23.0, 15.0, 22.0, 16.0, 18.0, 51.0, 21.0, 87.0, 17.0]
2025-08-07 08:45:53,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 44 seconds)
2025-08-07 08:47:28,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:29,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 46.13295 ± 40.966
2025-08-07 08:47:29,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.0536427, 19.298948, 112.68118, 19.367455, 62.386005, 6.8416233, 120.030945, 19.920921, 20.111017, 73.63777]
2025-08-07 08:47:29,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 24.0, 87.0, 20.0, 62.0, 9.0, 91.0, 19.0, 19.0, 57.0]
2025-08-07 08:47:29,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 7 seconds)
2025-08-07 08:49:05,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:05,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 88.13633 ± 87.199
2025-08-07 08:49:05,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.039085, 260.0214, 116.60858, 80.84652, 15.1199665, 210.94011, 11.506978, 15.375304, 15.779807, 143.12556]
2025-08-07 08:49:05,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 135.0, 77.0, 56.0, 24.0, 145.0, 16.0, 20.0, 19.0, 98.0]
2025-08-07 08:49:05,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 30 seconds)
2025-08-07 08:50:41,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:42,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 70.34115 ± 78.759
2025-08-07 08:50:42,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [23.845255, 12.478846, 11.715827, 258.02808, 143.37515, 27.152636, 9.925155, 14.386085, 132.59897, 69.90552]
2025-08-07 08:50:42,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 14.0, 191.0, 103.0, 24.0, 15.0, 16.0, 103.0, 52.0]
2025-08-07 08:50:42,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 54 seconds)
2025-08-07 08:52:17,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:52:17,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.66054 ± 28.914
2025-08-07 08:52:17,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.37074, 7.4741125, 21.130878, 21.073452, 18.26491, 15.922163, 8.369198, 60.081573, 95.84064, 67.077774]
2025-08-07 08:52:17,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 10.0, 26.0, 23.0, 25.0, 17.0, 13.0, 47.0, 62.0, 41.0]
2025-08-07 08:52:17,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 16 seconds)
2025-08-07 08:53:53,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:54,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 27.00778 ± 31.051
2025-08-07 08:53:54,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.215624, 10.76227, 14.816247, 7.5892982, 8.391433, 87.34351, 9.757447, 13.140932, 10.702292, 90.35873]
2025-08-07 08:53:54,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 17.0, 16.0, 12.0, 12.0, 100.0, 17.0, 17.0, 20.0, 71.0]
2025-08-07 08:53:54,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 38 seconds)
2025-08-07 08:55:29,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:30,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 13.56515 ± 4.632
2025-08-07 08:55:30,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.178134, 7.8606577, 21.413597, 16.673693, 9.464744, 11.422827, 12.998738, 12.0248, 21.09556, 14.518782]
2025-08-07 08:55:30,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 12.0, 19.0, 23.0, 17.0, 17.0, 25.0, 13.0, 26.0, 18.0]
2025-08-07 08:55:30,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 3 seconds)
2025-08-07 08:57:05,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:05,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 22.29399 ± 19.912
2025-08-07 08:57:05,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.743172, 12.744956, 13.043262, 12.956785, 21.69719, 19.634552, 80.41012, 7.6578317, 19.126911, 22.925083]
2025-08-07 08:57:05,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 22.0, 19.0, 19.0, 21.0, 21.0, 54.0, 12.0, 20.0, 25.0]
2025-08-07 08:57:05,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 22 seconds)
2025-08-07 08:58:41,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:42,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 46.98523 ± 54.993
2025-08-07 08:58:42,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.340213, 133.44777, 114.477936, 14.480487, 11.037555, 11.054513, 143.175, 12.005506, 9.116389, 11.716949]
2025-08-07 08:58:42,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 116.0, 84.0, 16.0, 14.0, 20.0, 77.0, 17.0, 11.0, 13.0]
2025-08-07 08:58:42,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 48 seconds)
2025-08-07 09:00:17,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:17,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 63.73798 ± 79.890
2025-08-07 09:00:17,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.8258505, 201.88036, 11.937034, 16.425238, 20.945236, 11.1942005, 119.729065, 15.016691, 10.49771, 218.92844]
2025-08-07 09:00:17,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 89.0, 19.0, 24.0, 24.0, 13.0, 83.0, 18.0, 15.0, 150.0]
2025-08-07 09:00:17,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 11 seconds)
2025-08-07 09:01:52,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:53,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 76.52582 ± 77.943
2025-08-07 09:01:53,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [18.370821, 176.82385, 24.270567, 165.10555, 208.07512, 126.55456, 10.455475, 10.95371, 9.498861, 15.149698]
2025-08-07 09:01:53,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 112.0, 24.0, 81.0, 112.0, 96.0, 13.0, 19.0, 13.0, 16.0]
2025-08-07 09:01:53,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 34 seconds)
2025-08-07 09:03:30,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:30,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 40.85022 ± 56.112
2025-08-07 09:03:30,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [161.83388, 16.166527, 9.475543, 143.12627, 9.729612, 22.348368, 12.719732, 11.846795, 7.15273, 14.102751]
2025-08-07 09:03:30,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [143.0, 24.0, 14.0, 70.0, 15.0, 24.0, 22.0, 15.0, 11.0, 16.0]
2025-08-07 09:03:30,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 1 second)
2025-08-07 09:05:05,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 88.41248 ± 64.815
2025-08-07 09:05:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [136.46883, 113.6871, 139.0599, 109.77727, 19.342743, 10.174799, 200.79074, 126.31612, 20.700974, 7.806269]
2025-08-07 09:05:06,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 71.0, 102.0, 73.0, 19.0, 14.0, 162.0, 105.0, 25.0, 15.0]
2025-08-07 09:05:06,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 27 seconds)
2025-08-07 09:06:42,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:42,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 64.07410 ± 62.699
2025-08-07 09:06:42,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [113.56394, 10.534377, 26.183392, 97.99468, 18.54148, 12.060341, 130.52846, 20.386263, 197.33966, 13.608413]
2025-08-07 09:06:42,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 22.0, 25.0, 64.0, 23.0, 16.0, 97.0, 21.0, 116.0, 17.0]
2025-08-07 09:06:42,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 49 seconds)
2025-08-07 09:08:18,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:19,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 45.86061 ± 64.508
2025-08-07 09:08:19,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.806077, 18.669044, 18.086449, 199.48776, 145.3113, 12.0887985, 14.629619, 8.708198, 14.610317, 18.208534]
2025-08-07 09:08:19,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 20.0, 99.0, 70.0, 17.0, 21.0, 17.0, 16.0, 20.0]
2025-08-07 09:08:19,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 16 seconds)
2025-08-07 09:09:54,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:09:55,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 70.63513 ± 118.761
2025-08-07 09:09:55,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [197.60617, 9.38729, 8.955599, 387.02826, 19.083738, 14.147943, 21.55255, 11.536861, 12.191246, 24.86172]
2025-08-07 09:09:55,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 22.0, 11.0, 201.0, 22.0, 16.0, 22.0, 19.0, 14.0, 24.0]
2025-08-07 09:09:55,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 39 seconds)
2025-08-07 09:11:29,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:30,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 75.68742 ± 89.804
2025-08-07 09:11:30,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [19.393951, 12.139835, 13.440096, 107.94738, 102.63247, 9.068805, 173.8763, 10.295866, 290.35995, 17.719534]
2025-08-07 09:11:30,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 19.0, 80.0, 66.0, 14.0, 119.0, 15.0, 164.0, 21.0]
2025-08-07 09:11:30,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes)
2025-08-07 09:13:06,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:06,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 44.10632 ± 45.173
2025-08-07 09:13:06,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [110.02959, 12.928178, 14.721421, 11.489965, 15.650917, 15.390229, 10.741424, 21.52052, 119.316925, 109.273964]
2025-08-07 09:13:06,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 17.0, 18.0, 15.0, 21.0, 18.0, 14.0, 25.0, 95.0, 83.0]
2025-08-07 09:13:06,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 24 seconds)
2025-08-07 09:14:42,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 84.62776 ± 93.213
2025-08-07 09:14:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.228566, 293.44418, 11.912119, 167.09987, 25.629253, 146.30212, 12.740262, 146.53696, 18.620565, 12.763703]
2025-08-07 09:14:43,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 118.0, 13.0, 110.0, 22.0, 111.0, 21.0, 80.0, 22.0, 22.0]
2025-08-07 09:14:43,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 48 seconds)
2025-08-07 09:16:18,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:19,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 102.37566 ± 102.172
2025-08-07 09:16:19,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [205.41727, 8.856314, 14.994085, 311.78308, 110.355995, 131.88089, 14.361818, 14.43725, 13.362363, 198.30754]
2025-08-07 09:16:19,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 16.0, 18.0, 153.0, 73.0, 85.0, 17.0, 15.0, 23.0, 115.0]
2025-08-07 09:16:19,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 12 seconds)
2025-08-07 09:17:54,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:55,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 46.23396 ± 61.769
2025-08-07 09:17:55,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.890823, 94.09173, 12.675138, 13.292337, 9.789785, 17.212753, 12.115199, 212.09814, 10.860153, 67.313576]
2025-08-07 09:17:55,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 73.0, 17.0, 18.0, 13.0, 21.0, 17.0, 126.0, 16.0, 46.0]
2025-08-07 09:17:55,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 36 seconds)
2025-08-07 09:19:31,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:32,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 42.44978 ± 64.426
2025-08-07 09:19:32,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.914896, 183.4924, 157.9314, 7.318073, 10.396696, 13.356956, 12.533798, 9.561703, 14.1281185, 6.863693]
2025-08-07 09:19:32,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 131.0, 141.0, 18.0, 17.0, 16.0, 17.0, 14.0, 19.0, 16.0]
2025-08-07 09:19:32,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 1 second)
2025-08-07 09:21:07,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:08,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 54.22691 ± 53.082
2025-08-07 09:21:08,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.6884165, 15.676533, 13.676771, 131.23517, 148.24463, 14.74085, 110.63741, 17.004978, 9.681639, 70.682686]
2025-08-07 09:21:08,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 20.0, 16.0, 81.0, 120.0, 16.0, 66.0, 21.0, 12.0, 131.0]
2025-08-07 09:21:08,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 25 seconds)
2025-08-07 09:22:44,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:22:44,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 26.55756 ± 36.777
2025-08-07 09:22:44,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [18.972317, 11.30904, 9.311303, 20.723404, 9.790452, 12.2214575, 19.742807, 136.24219, 15.340584, 11.921993]
2025-08-07 09:22:44,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 14.0, 13.0, 25.0, 22.0, 24.0, 20.0, 102.0, 18.0, 18.0]
2025-08-07 09:22:44,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 48 seconds)
2025-08-07 09:24:20,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:24:20,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 34.21573 ± 33.380
2025-08-07 09:24:20,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.542417, 94.64931, 18.57961, 7.6792974, 13.493185, 13.523399, 8.938557, 73.45364, 85.419975, 13.87793]
2025-08-07 09:24:20,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 73.0, 20.0, 12.0, 15.0, 16.0, 12.0, 47.0, 59.0, 18.0]
2025-08-07 09:24:20,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 12 seconds)
2025-08-07 09:25:55,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:56,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 48.94939 ± 59.396
2025-08-07 09:25:56,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.533997, 17.938286, 9.633187, 154.84962, 8.622031, 27.49001, 67.93086, 11.057144, 12.013903, 170.42488]
2025-08-07 09:25:56,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 19.0, 14.0, 119.0, 11.0, 24.0, 43.0, 16.0, 14.0, 81.0]
2025-08-07 09:25:56,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 36 seconds)
2025-08-07 09:27:32,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:32,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 25.04671 ± 36.187
2025-08-07 09:27:32,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.377892, 20.48545, 8.900713, 132.4601, 13.634886, 10.708472, 10.245526, 8.710977, 6.605555, 24.33754]
2025-08-07 09:27:32,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 21.0, 68.0, 16.0, 13.0, 24.0, 13.0, 12.0, 25.0]
2025-08-07 09:27:32,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1251 [DEBUG]: Training session finished
