2025-08-07 06:42:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:42:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:42:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14d91eab1550>}
2025-08-07 06:42:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 06:42:46,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1133 [INFO]: Creating new trainer
2025-08-07 06:42:46,671 baseline-bpql-noiseperc20-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=83, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 06:42:46,672 baseline-bpql-noiseperc20-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:42:47,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 06:42:47,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 06:44:16,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 33.61332 ± 20.732
2025-08-07 06:44:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.344272, 17.421461, 51.29804, 55.478516, 9.461879, 66.023285, 11.13132, 40.509354, 52.897552, 21.567545]
2025-08-07 06:44:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 18.0, 51.0, 37.0, 11.0, 50.0, 15.0, 50.0, 45.0, 25.0]
2025-08-07 06:44:17,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (33.61) for latency ExtremeClogL1U23
2025-08-07 06:44:17,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 27 minutes, 58 seconds)
2025-08-07 06:45:52,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:53,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 51.20019 ± 59.132
2025-08-07 06:45:53,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.772293, 91.851875, 40.80383, 37.03727, 16.15318, 208.56635, 73.74774, 8.425927, 15.220486, 9.422973]
2025-08-07 06:45:53,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 61.0, 38.0, 67.0, 17.0, 139.0, 59.0, 14.0, 18.0, 13.0]
2025-08-07 06:45:53,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (51.20) for latency ExtremeClogL1U23
2025-08-07 06:45:53,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 31 minutes, 51 seconds)
2025-08-07 06:47:29,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:30,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 38.74805 ± 37.968
2025-08-07 06:47:30,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.275368, 8.3567095, 12.123099, 24.439342, 57.81409, 129.02324, 85.81802, 19.291845, 21.335903, 13.00292]
2025-08-07 06:47:30,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 11.0, 22.0, 21.0, 56.0, 85.0, 58.0, 23.0, 21.0, 20.0]
2025-08-07 06:47:30,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 32 minutes, 25 seconds)
2025-08-07 06:49:07,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:07,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 46.34232 ± 41.312
2025-08-07 06:49:07,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [80.30497, 18.820747, 89.31178, 7.477446, 11.373171, 9.89529, 90.19486, 121.272194, 13.282073, 21.490698]
2025-08-07 06:49:07,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 22.0, 67.0, 9.0, 19.0, 18.0, 57.0, 73.0, 20.0, 22.0]
2025-08-07 06:49:07,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 32 minutes, 5 seconds)
2025-08-07 06:50:43,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:44,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 16.77026 ± 7.913
2025-08-07 06:50:44,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.452825, 11.176596, 38.280464, 17.374065, 16.510225, 17.376892, 17.507084, 8.309185, 9.445562, 17.26969]
2025-08-07 06:50:44,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 17.0, 37.0, 21.0, 19.0, 21.0, 19.0, 23.0, 20.0, 19.0]
2025-08-07 06:50:44,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 30 minutes, 53 seconds)
2025-08-07 06:52:21,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:52:21,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 45.43688 ± 52.609
2025-08-07 06:52:21,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [99.690674, 8.610196, 13.794978, 11.687008, 17.906857, 11.201337, 10.277066, 154.73007, 115.638596, 10.831982]
2025-08-07 06:52:21,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 15.0, 15.0, 17.0, 19.0, 23.0, 22.0, 120.0, 75.0, 15.0]
2025-08-07 06:52:22,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 31 minutes, 52 seconds)
2025-08-07 06:53:59,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:00,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 48.99570 ± 44.938
2025-08-07 06:54:00,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.3813, 14.54682, 13.886611, 104.88266, 97.874695, 122.21833, 87.253006, 12.419086, 10.302983, 8.191514]
2025-08-07 06:54:00,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 22.0, 62.0, 76.0, 97.0, 69.0, 15.0, 12.0, 16.0]
2025-08-07 06:54:00,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 30 minutes, 52 seconds)
2025-08-07 06:55:36,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:36,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 44.37201 ± 45.524
2025-08-07 06:55:36,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [30.630983, 11.056483, 14.202021, 121.950516, 12.724427, 19.525694, 12.195808, 79.47421, 11.00099, 130.95898]
2025-08-07 06:55:36,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 17.0, 19.0, 65.0, 16.0, 24.0, 17.0, 56.0, 18.0, 84.0]
2025-08-07 06:55:36,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 29 minutes, 7 seconds)
2025-08-07 06:57:13,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:14,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 51.87180 ± 43.475
2025-08-07 06:57:14,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [67.43795, 99.69302, 16.112131, 27.789656, 19.971443, 46.362522, 17.86639, 18.63029, 157.61852, 47.23608]
2025-08-07 06:57:14,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 64.0, 20.0, 41.0, 26.0, 66.0, 21.0, 20.0, 124.0, 44.0]
2025-08-07 06:57:14,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (51.87) for latency ExtremeClogL1U23
2025-08-07 06:57:14,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 27 minutes, 36 seconds)
2025-08-07 06:58:51,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:51,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 51.93211 ± 62.413
2025-08-07 06:58:51,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [24.06184, 226.93881, 45.419422, 48.29459, 11.313995, 12.587767, 13.470812, 87.08634, 16.756039, 33.39141]
2025-08-07 06:58:51,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 169.0, 41.0, 49.0, 17.0, 20.0, 18.0, 60.0, 21.0, 28.0]
2025-08-07 06:58:51,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (51.93) for latency ExtremeClogL1U23
2025-08-07 06:58:51,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 26 minutes, 21 seconds)
2025-08-07 07:00:28,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:29,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 20.84643 ± 21.769
2025-08-07 07:00:29,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.4399185, 11.676576, 85.76643, 15.394066, 14.8489485, 11.718488, 19.250193, 14.047465, 12.593425, 12.728775]
2025-08-07 07:00:29,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 58.0, 21.0, 18.0, 17.0, 25.0, 21.0, 16.0, 21.0]
2025-08-07 07:00:29,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 24 minutes, 33 seconds)
2025-08-07 07:02:06,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:07,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 93.51841 ± 80.961
2025-08-07 07:02:07,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [85.52776, 16.9557, 126.81295, 14.474834, 290.22635, 136.7524, 17.472641, 109.32079, 115.98803, 21.652563]
2025-08-07 07:02:07,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 22.0, 101.0, 16.0, 118.0, 80.0, 26.0, 94.0, 103.0, 21.0]
2025-08-07 07:02:07,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (93.52) for latency ExtremeClogL1U23
2025-08-07 07:02:07,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 22 minutes, 50 seconds)
2025-08-07 07:03:44,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:44,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 47.27689 ± 57.075
2025-08-07 07:03:44,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.909596, 138.98196, 18.69394, 16.09767, 13.916923, 12.795014, 13.095438, 175.3682, 58.039658, 9.8705225]
2025-08-07 07:03:44,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 84.0, 24.0, 18.0, 23.0, 16.0, 22.0, 122.0, 69.0, 15.0]
2025-08-07 07:03:44,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 21 minutes, 32 seconds)
2025-08-07 07:05:22,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:23,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 95.74164 ± 95.036
2025-08-07 07:05:23,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.344399, 182.093, 252.67404, 37.20126, 101.978714, 17.358772, 261.50623, 9.837495, 62.163403, 19.259123]
2025-08-07 07:05:23,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 129.0, 169.0, 53.0, 57.0, 24.0, 122.0, 13.0, 43.0, 22.0]
2025-08-07 07:05:23,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (95.74) for latency ExtremeClogL1U23
2025-08-07 07:05:23,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 20 minutes, 16 seconds)
2025-08-07 07:07:00,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:01,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 36.50170 ± 43.272
2025-08-07 07:07:01,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.806238, 12.988677, 19.337402, 135.36992, 108.575, 21.884712, 14.539344, 15.037365, 9.919844, 12.558497]
2025-08-07 07:07:01,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 15.0, 20.0, 82.0, 78.0, 24.0, 15.0, 24.0, 16.0, 23.0]
2025-08-07 07:07:01,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 18 minutes, 39 seconds)
2025-08-07 07:08:38,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:38,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 52.82984 ± 48.180
2025-08-07 07:08:38,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [54.11531, 101.91516, 14.149237, 15.192489, 29.115742, 8.718725, 18.553232, 107.03788, 155.10329, 24.397326]
2025-08-07 07:08:38,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 83.0, 18.0, 22.0, 29.0, 11.0, 24.0, 69.0, 100.0, 24.0]
2025-08-07 07:08:38,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 17 minutes, 3 seconds)
2025-08-07 07:10:16,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:17,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 30.45314 ± 27.849
2025-08-07 07:10:17,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [33.33451, 34.66962, 19.267801, 9.0964, 32.934444, 28.936867, 9.615976, 14.244705, 108.9133, 13.517795]
2025-08-07 07:10:17,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 36.0, 28.0, 17.0, 33.0, 51.0, 12.0, 22.0, 79.0, 16.0]
2025-08-07 07:10:17,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 15 minutes, 33 seconds)
2025-08-07 07:11:53,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:54,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 52.32295 ± 42.472
2025-08-07 07:11:54,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [24.74452, 12.346504, 15.625331, 45.672005, 11.917069, 25.019646, 44.430347, 127.447296, 108.98117, 107.045586]
2025-08-07 07:11:54,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 18.0, 18.0, 50.0, 15.0, 25.0, 80.0, 82.0, 71.0, 57.0]
2025-08-07 07:11:54,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 13 minutes, 43 seconds)
2025-08-07 07:13:31,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:32,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 44.75869 ± 50.443
2025-08-07 07:13:32,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [146.84027, 12.49462, 135.8274, 14.086592, 12.846027, 12.370464, 61.554226, 10.890017, 14.873481, 25.803768]
2025-08-07 07:13:32,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 19.0, 91.0, 16.0, 15.0, 15.0, 63.0, 15.0, 17.0, 25.0]
2025-08-07 07:13:32,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 11 minutes, 51 seconds)
2025-08-07 07:15:09,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 24.11075 ± 20.788
2025-08-07 07:15:09,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.610046, 81.50893, 22.647139, 14.893692, 12.942269, 9.955704, 12.954965, 14.406211, 40.1786, 16.009985]
2025-08-07 07:15:09,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 60.0, 23.0, 18.0, 19.0, 11.0, 15.0, 17.0, 88.0, 17.0]
2025-08-07 07:15:09,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 10 minutes, 12 seconds)
2025-08-07 07:16:47,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:47,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 25.10034 ± 19.763
2025-08-07 07:16:47,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [65.73499, 17.306313, 14.328014, 15.284956, 63.09825, 14.119748, 12.747691, 12.330864, 17.170832, 18.881678]
2025-08-07 07:16:47,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 21.0, 22.0, 20.0, 50.0, 21.0, 17.0, 18.0, 18.0, 18.0]
2025-08-07 07:16:47,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 8 minutes, 47 seconds)
2025-08-07 07:18:27,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:28,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 71.04311 ± 109.235
2025-08-07 07:18:28,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.423783, 215.79495, 39.02009, 22.577639, 346.1919, 21.2021, 13.606088, 12.174214, 17.910025, 10.530248]
2025-08-07 07:18:28,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 239.0, 69.0, 26.0, 381.0, 23.0, 18.0, 16.0, 29.0, 13.0]
2025-08-07 07:18:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 7 minutes, 42 seconds)
2025-08-07 07:20:03,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:03,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 46.95550 ± 42.270
2025-08-07 07:20:03,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.974836, 13.275607, 16.917938, 92.98148, 55.21039, 20.300411, 115.40023, 13.705169, 115.91135, 6.8776193]
2025-08-07 07:20:03,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 15.0, 18.0, 80.0, 55.0, 24.0, 61.0, 19.0, 80.0, 12.0]
2025-08-07 07:20:03,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 5 minutes, 38 seconds)
2025-08-07 07:21:41,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:42,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 78.25713 ± 70.775
2025-08-07 07:21:42,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.01697, 155.81218, 8.8524475, 207.64825, 136.87617, 105.01408, 11.952378, 115.69358, 10.601999, 14.103121]
2025-08-07 07:21:42,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 99.0, 12.0, 115.0, 86.0, 71.0, 20.0, 93.0, 12.0, 15.0]
2025-08-07 07:21:42,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 4 minutes, 7 seconds)
2025-08-07 07:23:19,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:19,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 60.56395 ± 57.180
2025-08-07 07:23:19,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [151.44545, 17.218895, 18.364859, 146.05708, 12.689795, 111.981, 20.393677, 12.530953, 9.564183, 105.39364]
2025-08-07 07:23:19,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [104.0, 20.0, 20.0, 107.0, 24.0, 90.0, 19.0, 25.0, 11.0, 73.0]
2025-08-07 07:23:19,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 2 minutes, 34 seconds)
2025-08-07 07:24:57,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:58,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 54.77670 ± 49.527
2025-08-07 07:24:58,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [151.76753, 18.435835, 22.86031, 15.266206, 91.16814, 18.325436, 79.5149, 12.740327, 15.019696, 122.668625]
2025-08-07 07:24:58,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [134.0, 18.0, 26.0, 21.0, 70.0, 19.0, 62.0, 23.0, 19.0, 88.0]
2025-08-07 07:24:58,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 1 minute)
2025-08-07 07:26:37,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:38,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 94.07816 ± 93.815
2025-08-07 07:26:38,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.587824, 153.04286, 14.253804, 97.96896, 225.9996, 112.83492, 12.916721, 280.4285, 20.096403, 9.652066]
2025-08-07 07:26:38,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 96.0, 17.0, 74.0, 192.0, 86.0, 43.0, 166.0, 19.0, 13.0]
2025-08-07 07:26:38,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 59 minutes, 19 seconds)
2025-08-07 07:28:15,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:16,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 80.20270 ± 79.134
2025-08-07 07:28:16,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [242.69972, 19.959017, 160.93982, 16.798899, 28.569172, 15.137505, 15.53748, 136.8235, 145.02518, 20.53673]
2025-08-07 07:28:16,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 22.0, 130.0, 19.0, 57.0, 17.0, 16.0, 88.0, 116.0, 30.0]
2025-08-07 07:28:16,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 58 minutes, 16 seconds)
2025-08-07 07:29:55,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:55,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 45.78127 ± 47.486
2025-08-07 07:29:55,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [26.92015, 127.23363, 14.101725, 9.971445, 114.31933, 9.085136, 16.14171, 111.73177, 14.182363, 14.125495]
2025-08-07 07:29:55,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 101.0, 18.0, 14.0, 104.0, 15.0, 22.0, 76.0, 24.0, 24.0]
2025-08-07 07:29:55,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 56 minutes, 49 seconds)
2025-08-07 07:31:34,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:34,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 59.86871 ± 58.552
2025-08-07 07:31:34,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [87.074165, 16.62824, 143.61128, 14.923561, 158.08018, 16.701511, 13.206685, 9.550803, 125.30547, 13.605241]
2025-08-07 07:31:34,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 19.0, 85.0, 19.0, 82.0, 20.0, 25.0, 14.0, 102.0, 18.0]
2025-08-07 07:31:34,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 55 minutes, 28 seconds)
2025-08-07 07:33:12,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:12,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 59.67152 ± 68.417
2025-08-07 07:33:12,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [143.03587, 19.108088, 12.431449, 14.251506, 10.923205, 131.24658, 206.6476, 20.294004, 22.710289, 16.066654]
2025-08-07 07:33:12,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 23.0, 19.0, 20.0, 15.0, 105.0, 109.0, 25.0, 22.0, 19.0]
2025-08-07 07:33:12,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 42 seconds)
2025-08-07 07:34:49,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 93.19783 ± 81.016
2025-08-07 07:34:50,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [162.63971, 25.845861, 132.25883, 16.275963, 8.530992, 163.21095, 197.45592, 204.16628, 11.321503, 10.272262]
2025-08-07 07:34:50,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 23.0, 102.0, 19.0, 11.0, 94.0, 110.0, 114.0, 19.0, 14.0]
2025-08-07 07:34:50,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 51 minutes, 25 seconds)
2025-08-07 07:36:26,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:27,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 65.73353 ± 47.150
2025-08-07 07:36:27,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [114.547676, 133.9427, 98.71243, 10.475175, 9.773837, 100.08947, 82.21079, 8.933928, 86.11061, 12.538653]
2025-08-07 07:36:27,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 77.0, 56.0, 21.0, 14.0, 80.0, 64.0, 11.0, 52.0, 14.0]
2025-08-07 07:36:27,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 43 seconds)
2025-08-07 07:38:04,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:05,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 93.70475 ± 65.492
2025-08-07 07:38:05,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [119.51304, 188.51183, 135.92586, 107.056305, 58.823185, 12.685466, 14.467211, 198.33391, 14.812364, 86.918335]
2025-08-07 07:38:05,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 122.0, 82.0, 169.0, 53.0, 23.0, 19.0, 91.0, 16.0, 61.0]
2025-08-07 07:38:05,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 47 minutes, 45 seconds)
2025-08-07 07:39:42,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:43,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 63.53444 ± 45.868
2025-08-07 07:39:43,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [12.29106, 134.2676, 123.79731, 66.03966, 80.214775, 14.1702385, 12.493529, 110.76551, 64.87546, 16.429285]
2025-08-07 07:39:43,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 78.0, 88.0, 49.0, 48.0, 18.0, 15.0, 82.0, 50.0, 26.0]
2025-08-07 07:39:43,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 45 minutes, 52 seconds)
2025-08-07 07:41:18,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:19,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 31.24563 ± 31.049
2025-08-07 07:41:19,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [106.87383, 24.015871, 17.92104, 75.37299, 11.563225, 8.3336315, 13.363493, 17.037437, 21.167824, 16.80698]
2025-08-07 07:41:19,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 23.0, 20.0, 55.0, 16.0, 16.0, 19.0, 21.0, 23.0, 20.0]
2025-08-07 07:41:19,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 45 seconds)
2025-08-07 07:42:56,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:56,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 38.58420 ± 45.724
2025-08-07 07:42:56,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.082539, 10.934023, 127.24135, 22.631903, 17.994652, 14.395969, 17.264828, 15.688011, 10.328786, 132.27997]
2025-08-07 07:42:56,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 17.0, 74.0, 24.0, 23.0, 20.0, 22.0, 17.0, 14.0, 103.0]
2025-08-07 07:42:56,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 6 seconds)
2025-08-07 07:44:32,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:32,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 78.66343 ± 62.584
2025-08-07 07:44:32,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [8.396717, 114.18166, 14.34678, 11.650177, 94.996475, 98.28753, 204.66562, 79.78447, 17.683226, 142.64163]
2025-08-07 07:44:32,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 67.0, 17.0, 13.0, 88.0, 75.0, 92.0, 55.0, 24.0, 83.0]
2025-08-07 07:44:33,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 18 seconds)
2025-08-07 07:46:08,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:08,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 31.62396 ± 37.522
2025-08-07 07:46:08,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [98.97758, 11.891639, 19.057787, 11.328056, 8.968386, 113.35656, 17.145845, 10.152412, 13.62736, 11.73394]
2025-08-07 07:46:08,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 17.0, 24.0, 21.0, 15.0, 95.0, 18.0, 12.0, 17.0, 14.0]
2025-08-07 07:46:08,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 38 minutes, 15 seconds)
2025-08-07 07:47:44,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:45,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 93.59934 ± 75.994
2025-08-07 07:47:45,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [106.19211, 15.932074, 226.18948, 9.220508, 11.3807, 159.8564, 95.415474, 19.458858, 97.792, 194.55577]
2025-08-07 07:47:45,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 16.0, 129.0, 14.0, 15.0, 98.0, 61.0, 18.0, 53.0, 131.0]
2025-08-07 07:47:45,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 36 minutes, 26 seconds)
2025-08-07 07:49:22,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:22,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 29.61957 ± 34.729
2025-08-07 07:49:22,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [70.964134, 119.8061, 12.435226, 16.25121, 13.525964, 9.257565, 12.349229, 18.125614, 12.2114725, 11.269225]
2025-08-07 07:49:22,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 71.0, 23.0, 17.0, 16.0, 37.0, 15.0, 20.0, 15.0, 22.0]
2025-08-07 07:49:22,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 35 minutes, 8 seconds)
2025-08-07 07:50:58,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:59,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 91.21186 ± 107.860
2025-08-07 07:50:59,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.40063, 10.994427, 96.201744, 142.904, 16.600252, 95.42539, 127.08928, 14.445635, 378.85712, 18.200125]
2025-08-07 07:50:59,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 18.0, 54.0, 108.0, 22.0, 75.0, 104.0, 17.0, 188.0, 18.0]
2025-08-07 07:50:59,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 33 minutes, 18 seconds)
2025-08-07 07:52:33,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:34,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 48.08222 ± 38.286
2025-08-07 07:52:34,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [24.597862, 97.651436, 13.656171, 17.57788, 8.702755, 14.748704, 83.801605, 92.18625, 103.61942, 24.28009]
2025-08-07 07:52:34,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 75.0, 20.0, 17.0, 11.0, 16.0, 52.0, 66.0, 69.0, 27.0]
2025-08-07 07:52:34,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 31 minutes, 25 seconds)
2025-08-07 07:54:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:10,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 96.31220 ± 86.399
2025-08-07 07:54:10,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.233755, 149.62303, 24.99545, 12.01197, 108.272934, 179.21564, 13.590584, 197.77328, 246.15459, 12.250717]
2025-08-07 07:54:10,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 128.0, 30.0, 15.0, 68.0, 147.0, 15.0, 142.0, 130.0, 15.0]
2025-08-07 07:54:10,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (96.31) for latency ExtremeClogL1U23
2025-08-07 07:54:11,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 1 second)
2025-08-07 07:55:46,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:46,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 45.45766 ± 51.298
2025-08-07 07:55:46,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [12.923083, 20.370686, 149.9457, 15.395897, 15.760295, 8.876402, 134.61179, 14.973725, 12.006035, 69.71308]
2025-08-07 07:55:46,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 111.0, 17.0, 24.0, 11.0, 121.0, 19.0, 18.0, 53.0]
2025-08-07 07:55:46,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 28 minutes, 11 seconds)
2025-08-07 07:57:22,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:22,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 81.96388 ± 94.368
2025-08-07 07:57:22,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.399449, 19.882158, 93.9045, 217.8721, 16.453743, 291.2404, 14.114678, 115.49974, 22.788456, 16.483553]
2025-08-07 07:57:22,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 19.0, 56.0, 142.0, 21.0, 171.0, 23.0, 76.0, 23.0, 19.0]
2025-08-07 07:57:22,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 26 minutes, 22 seconds)
2025-08-07 07:58:57,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:58:58,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 47.87348 ± 51.185
2025-08-07 07:58:58,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.064545, 14.638968, 59.51724, 77.911514, 12.405153, 12.181001, 12.536221, 13.038331, 77.926674, 179.5152]
2025-08-07 07:58:58,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 38.0, 51.0, 17.0, 21.0, 16.0, 21.0, 50.0, 97.0]
2025-08-07 07:58:58,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 24 minutes, 37 seconds)
2025-08-07 08:00:33,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:00:34,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 92.57610 ± 70.568
2025-08-07 08:00:34,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.413132, 171.9443, 19.630684, 60.749985, 129.068, 196.85832, 145.3544, 16.88283, 156.76816, 9.091227]
2025-08-07 08:00:34,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 89.0, 25.0, 45.0, 65.0, 126.0, 114.0, 23.0, 104.0, 13.0]
2025-08-07 08:00:34,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2025-08-07 08:02:10,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 91.51676 ± 89.631
2025-08-07 08:02:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [104.17963, 75.93462, 321.21307, 18.488491, 145.19366, 103.148476, 14.950946, 108.242935, 10.089952, 13.725842]
2025-08-07 08:02:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 50.0, 139.0, 21.0, 111.0, 69.0, 18.0, 78.0, 14.0, 16.0]
2025-08-07 08:02:11,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 21 minutes, 36 seconds)
2025-08-07 08:03:46,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:47,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 96.21443 ± 77.356
2025-08-07 08:03:47,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [239.74796, 16.26949, 95.90695, 76.45577, 160.04356, 12.275107, 21.267096, 166.72563, 14.03988, 159.41287]
2025-08-07 08:03:47,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 22.0, 103.0, 48.0, 125.0, 15.0, 22.0, 90.0, 23.0, 136.0]
2025-08-07 08:03:47,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 20 minutes, 11 seconds)
2025-08-07 08:05:22,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:05:23,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 52.64708 ± 60.525
2025-08-07 08:05:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [22.721228, 66.2844, 21.253166, 13.448838, 14.985861, 9.110257, 25.186127, 157.93385, 180.84897, 14.698145]
2025-08-07 08:05:23,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 52.0, 22.0, 24.0, 21.0, 13.0, 23.0, 77.0, 128.0, 18.0]
2025-08-07 08:05:23,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 18 minutes, 28 seconds)
2025-08-07 08:06:58,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:59,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 72.36539 ± 63.165
2025-08-07 08:06:59,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [18.11447, 147.21245, 113.17018, 14.622845, 38.50678, 22.752228, 152.75009, 25.130558, 176.2094, 15.184939]
2025-08-07 08:06:59,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 111.0, 72.0, 15.0, 33.0, 21.0, 98.0, 23.0, 157.0, 21.0]
2025-08-07 08:06:59,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 17 minutes)
2025-08-07 08:08:35,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:35,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 37.47432 ± 63.807
2025-08-07 08:08:35,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [8.332514, 20.181011, 9.619264, 10.554272, 18.623192, 19.042612, 23.129774, 12.323157, 24.757185, 228.18025]
2025-08-07 08:08:35,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 19.0, 11.0, 19.0, 21.0, 26.0, 23.0, 20.0, 21.0, 136.0]
2025-08-07 08:08:35,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 15 minutes, 22 seconds)
2025-08-07 08:10:10,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 115.19049 ± 98.415
2025-08-07 08:10:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [90.77042, 15.319621, 12.330673, 19.617268, 22.16672, 173.50003, 199.54636, 323.87277, 177.21985, 117.56117]
2025-08-07 08:10:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 16.0, 14.0, 21.0, 22.0, 119.0, 112.0, 162.0, 99.0, 91.0]
2025-08-07 08:10:11,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (115.19) for latency ExtremeClogL1U23
2025-08-07 08:10:11,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 13 minutes, 36 seconds)
2025-08-07 08:11:47,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:48,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 30.08277 ± 36.526
2025-08-07 08:11:48,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [12.137593, 138.57622, 9.947797, 16.372416, 26.89982, 16.832838, 13.339956, 20.473282, 23.090353, 23.15741]
2025-08-07 08:11:48,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 91.0, 12.0, 18.0, 23.0, 17.0, 18.0, 20.0, 22.0, 21.0]
2025-08-07 08:11:48,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 12 minutes, 3 seconds)
2025-08-07 08:13:22,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:22,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 47.02789 ± 45.522
2025-08-07 08:13:22,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [101.98662, 10.217637, 10.895054, 98.04756, 11.809696, 72.87884, 11.170371, 9.589692, 14.348623, 129.33478]
2025-08-07 08:13:22,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 13.0, 24.0, 54.0, 13.0, 55.0, 16.0, 15.0, 21.0, 90.0]
2025-08-07 08:13:22,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 10 minutes, 22 seconds)
2025-08-07 08:14:58,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:59,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 78.75188 ± 71.094
2025-08-07 08:14:59,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [140.31783, 179.05795, 103.51566, 16.821266, 92.095474, 10.412668, 13.420877, 14.352649, 16.085438, 201.43889]
2025-08-07 08:14:59,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 118.0, 86.0, 18.0, 74.0, 18.0, 22.0, 19.0, 18.0, 106.0]
2025-08-07 08:14:59,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 8 minutes, 46 seconds)
2025-08-07 08:16:34,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:35,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 94.65785 ± 64.344
2025-08-07 08:16:35,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [131.29742, 107.14552, 136.36104, 94.641136, 93.79451, 23.220978, 12.606788, 12.276387, 100.928795, 234.30598]
2025-08-07 08:16:35,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [139.0, 60.0, 101.0, 69.0, 63.0, 24.0, 15.0, 19.0, 76.0, 132.0]
2025-08-07 08:16:35,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 7 minutes, 12 seconds)
2025-08-07 08:18:10,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:11,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 95.54895 ± 73.212
2025-08-07 08:18:11,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.564484, 190.3605, 106.28758, 10.308348, 96.57368, 171.25491, 173.13911, 167.60303, 13.913833, 12.48405]
2025-08-07 08:18:11,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 150.0, 60.0, 16.0, 72.0, 164.0, 111.0, 97.0, 16.0, 18.0]
2025-08-07 08:18:12,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 5 minutes, 43 seconds)
2025-08-07 08:19:47,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:48,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 55.59885 ± 54.995
2025-08-07 08:19:48,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [10.461437, 17.875124, 85.39808, 113.00598, 12.441187, 172.27168, 14.528096, 13.886811, 15.128767, 100.991325]
2025-08-07 08:19:48,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 24.0, 67.0, 74.0, 19.0, 97.0, 20.0, 21.0, 18.0, 74.0]
2025-08-07 08:19:48,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 59 seconds)
2025-08-07 08:21:23,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:23,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 60.89557 ± 52.241
2025-08-07 08:21:23,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.191614, 9.323871, 76.77654, 10.67023, 112.28692, 105.176216, 22.897062, 19.57985, 170.83551, 65.217926]
2025-08-07 08:21:23,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 13.0, 61.0, 14.0, 107.0, 85.0, 25.0, 24.0, 84.0, 44.0]
2025-08-07 08:21:24,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 2 minutes, 32 seconds)
2025-08-07 08:22:59,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:00,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 73.76369 ± 63.925
2025-08-07 08:23:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [120.88109, 13.514313, 14.613548, 152.91609, 189.18495, 14.514643, 78.218056, 11.7684765, 23.371355, 118.65434]
2025-08-07 08:23:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 21.0, 17.0, 96.0, 98.0, 18.0, 65.0, 13.0, 24.0, 89.0]
2025-08-07 08:23:00,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 56 seconds)
2025-08-07 08:24:35,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:36,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 73.36148 ± 83.563
2025-08-07 08:24:36,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [14.762678, 155.82396, 13.7101965, 103.36836, 15.1679535, 142.9951, 9.168588, 11.850431, 7.308071, 259.45947]
2025-08-07 08:24:36,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 104.0, 18.0, 61.0, 17.0, 91.0, 12.0, 14.0, 10.0, 115.0]
2025-08-07 08:24:36,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 59 minutes, 18 seconds)
2025-08-07 08:26:12,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:12,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 48.57284 ± 50.076
2025-08-07 08:26:12,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [130.0896, 19.034637, 20.670263, 11.713144, 133.3854, 12.027675, 17.330112, 18.186308, 109.86893, 13.4223385]
2025-08-07 08:26:12,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 20.0, 20.0, 18.0, 85.0, 19.0, 23.0, 20.0, 66.0, 17.0]
2025-08-07 08:26:12,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 40 seconds)
2025-08-07 08:27:47,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:48,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 79.11720 ± 92.110
2025-08-07 08:27:48,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.313757, 297.3416, 12.875103, 148.36732, 160.67012, 99.6592, 13.467462, 12.374935, 10.179026, 19.923529]
2025-08-07 08:27:48,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 139.0, 22.0, 92.0, 97.0, 78.0, 17.0, 19.0, 17.0, 25.0]
2025-08-07 08:27:48,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 2 seconds)
2025-08-07 08:29:24,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:24,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 72.34897 ± 57.462
2025-08-07 08:29:24,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [160.04117, 104.39456, 15.2828865, 111.00727, 20.330145, 20.095165, 145.26015, 19.920712, 9.284889, 117.87272]
2025-08-07 08:29:24,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 81.0, 16.0, 80.0, 21.0, 22.0, 89.0, 20.0, 16.0, 93.0]
2025-08-07 08:29:24,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 54 minutes, 30 seconds)
2025-08-07 08:31:00,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:01,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 60.01144 ± 82.893
2025-08-07 08:31:01,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [13.100208, 100.5784, 90.36133, 25.911142, 12.069011, 17.281448, 290.35355, 15.481204, 18.447784, 16.530317]
2025-08-07 08:31:01,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 74.0, 59.0, 25.0, 17.0, 20.0, 115.0, 16.0, 19.0, 17.0]
2025-08-07 08:31:01,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 52 minutes, 53 seconds)
2025-08-07 08:32:36,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:32:37,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 85.20833 ± 90.776
2025-08-07 08:32:37,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [235.51628, 9.665497, 237.85474, 152.60507, 18.444416, 25.086264, 13.270879, 13.044372, 134.6235, 11.972142]
2025-08-07 08:32:37,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 12.0, 123.0, 88.0, 21.0, 23.0, 18.0, 19.0, 112.0, 13.0]
2025-08-07 08:32:37,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 51 minutes, 17 seconds)
2025-08-07 08:34:12,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:12,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 38.68461 ± 39.368
2025-08-07 08:34:12,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [8.425486, 15.729829, 14.19667, 121.133934, 14.6959, 87.801186, 14.661347, 15.814156, 82.018974, 12.368643]
2025-08-07 08:34:12,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 17.0, 17.0, 97.0, 21.0, 56.0, 16.0, 19.0, 58.0, 15.0]
2025-08-07 08:34:12,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 49 minutes, 36 seconds)
2025-08-07 08:35:49,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:50,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 60.24375 ± 73.371
2025-08-07 08:35:50,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [151.3128, 11.380528, 179.40672, 20.131124, 183.99103, 11.466329, 8.049192, 14.897228, 11.192922, 10.609625]
2025-08-07 08:35:50,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 17.0, 134.0, 20.0, 97.0, 16.0, 12.0, 18.0, 18.0, 20.0]
2025-08-07 08:35:50,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 9 seconds)
2025-08-07 08:37:24,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:25,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 47.24129 ± 49.870
2025-08-07 08:37:25,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.3553, 11.371826, 16.030903, 133.47253, 11.503381, 16.960888, 16.082375, 98.137375, 134.45506, 19.043304]
2025-08-07 08:37:25,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 16.0, 19.0, 90.0, 18.0, 24.0, 19.0, 59.0, 102.0, 24.0]
2025-08-07 08:37:25,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 46 minutes, 26 seconds)
2025-08-07 08:39:01,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:01,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 60.40057 ± 79.551
2025-08-07 08:39:01,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.971859, 11.791613, 19.168118, 18.384716, 18.497877, 270.0737, 104.222176, 15.901738, 9.176155, 118.8178]
2025-08-07 08:39:01,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 15.0, 20.0, 22.0, 19.0, 153.0, 103.0, 17.0, 11.0, 92.0]
2025-08-07 08:39:01,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 51 seconds)
2025-08-07 08:40:37,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:38,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 28.50250 ± 41.407
2025-08-07 08:40:38,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [17.288017, 12.242462, 9.985676, 10.098597, 152.15204, 11.030767, 23.109646, 16.587132, 14.160444, 18.370272]
2025-08-07 08:40:38,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 14.0, 14.0, 13.0, 90.0, 17.0, 23.0, 21.0, 18.0, 22.0]
2025-08-07 08:40:38,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 43 minutes, 15 seconds)
2025-08-07 08:42:12,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:42:12,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 59.53452 ± 66.338
2025-08-07 08:42:12,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [22.90515, 168.55254, 13.5019455, 14.626077, 76.39941, 200.17886, 60.972393, 12.382962, 14.803103, 11.022786]
2025-08-07 08:42:12,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 90.0, 23.0, 16.0, 56.0, 126.0, 45.0, 21.0, 16.0, 15.0]
2025-08-07 08:42:13,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 37 seconds)
2025-08-07 08:43:49,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:49,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 36.48310 ± 33.860
2025-08-07 08:43:49,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.134573, 96.41966, 22.076738, 12.79691, 12.668183, 12.816254, 12.427177, 19.54503, 100.29047, 60.656063]
2025-08-07 08:43:49,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 72.0, 21.0, 23.0, 18.0, 17.0, 15.0, 25.0, 66.0, 37.0]
2025-08-07 08:43:49,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 57 seconds)
2025-08-07 08:45:24,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:25,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 68.32354 ± 50.600
2025-08-07 08:45:25,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.879728, 110.183304, 89.61485, 8.1842985, 38.159164, 18.819458, 141.61069, 114.05342, 128.78131, 13.949202]
2025-08-07 08:45:25,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 78.0, 63.0, 13.0, 32.0, 23.0, 105.0, 96.0, 95.0, 16.0]
2025-08-07 08:45:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 24 seconds)
2025-08-07 08:47:00,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:00,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 36.62797 ± 39.939
2025-08-07 08:47:00,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [79.68034, 12.576467, 18.789309, 18.582304, 44.850433, 23.622282, 8.783169, 8.97053, 11.492137, 138.93274]
2025-08-07 08:47:00,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 17.0, 18.0, 25.0, 39.0, 24.0, 16.0, 14.0, 19.0, 80.0]
2025-08-07 08:47:00,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 43 seconds)
2025-08-07 08:48:38,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:39,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 87.26096 ± 69.420
2025-08-07 08:48:39,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [7.713434, 115.878136, 116.321785, 124.813065, 230.88574, 109.25019, 12.885731, 9.7785015, 125.97341, 19.109589]
2025-08-07 08:48:39,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 90.0, 84.0, 64.0, 186.0, 74.0, 24.0, 13.0, 86.0, 22.0]
2025-08-07 08:48:39,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 18 seconds)
2025-08-07 08:50:13,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:14,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 69.12099 ± 70.636
2025-08-07 08:50:14,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [6.304319, 15.7838125, 17.811344, 8.631481, 208.24686, 19.305065, 71.973, 85.72385, 190.30643, 67.12376]
2025-08-07 08:50:14,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [9.0, 22.0, 20.0, 14.0, 131.0, 25.0, 52.0, 55.0, 116.0, 40.0]
2025-08-07 08:50:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 41 seconds)
2025-08-07 08:51:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:50,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 108.92757 ± 66.629
2025-08-07 08:51:50,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [134.12073, 99.01305, 10.952756, 17.953693, 64.2474, 134.82831, 82.97306, 160.1204, 246.5927, 138.47357]
2025-08-07 08:51:50,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 64.0, 13.0, 19.0, 39.0, 97.0, 49.0, 105.0, 127.0, 97.0]
2025-08-07 08:51:50,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 3 seconds)
2025-08-07 08:53:25,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:26,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 83.32887 ± 65.987
2025-08-07 08:53:26,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [145.81548, 19.597578, 91.37016, 17.31902, 13.333561, 90.8888, 9.179159, 143.42056, 88.411194, 213.95317]
2025-08-07 08:53:26,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 22.0, 72.0, 18.0, 18.0, 104.0, 18.0, 131.0, 65.0, 118.0]
2025-08-07 08:53:26,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 28 seconds)
2025-08-07 08:55:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 88.00710 ± 88.969
2025-08-07 08:55:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [8.384507, 10.897373, 158.75249, 19.190456, 273.35464, 176.3806, 12.055882, 143.49849, 63.979897, 13.576623]
2025-08-07 08:55:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 13.0, 139.0, 20.0, 174.0, 103.0, 17.0, 110.0, 46.0, 20.0]
2025-08-07 08:55:03,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 57 seconds)
2025-08-07 08:56:39,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:40,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 100.32667 ± 93.717
2025-08-07 08:56:40,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.221045, 17.229979, 129.16194, 15.892783, 108.480804, 115.539116, 101.87137, 138.25795, 340.4096, 25.202116]
2025-08-07 08:56:40,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 22.0, 107.0, 17.0, 106.0, 60.0, 66.0, 82.0, 129.0, 23.0]
2025-08-07 08:56:40,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 15 seconds)
2025-08-07 08:58:16,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:17,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 125.00621 ± 129.823
2025-08-07 08:58:17,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [134.93196, 198.43231, 13.340651, 463.07062, 129.14104, 13.562758, 140.06624, 18.457281, 14.2032995, 124.85613]
2025-08-07 08:58:17,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 117.0, 19.0, 217.0, 94.0, 17.0, 75.0, 18.0, 20.0, 84.0]
2025-08-07 08:58:17,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (125.01) for latency ExtremeClogL1U23
2025-08-07 08:58:17,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 46 seconds)
2025-08-07 08:59:51,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:59:52,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 106.22550 ± 91.112
2025-08-07 08:59:52,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [208.32162, 223.12666, 201.85637, 15.862529, 45.794228, 15.620872, 14.022417, 217.70116, 107.65253, 12.296494]
2025-08-07 08:59:52,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [163.0, 98.0, 91.0, 16.0, 44.0, 17.0, 25.0, 121.0, 73.0, 18.0]
2025-08-07 08:59:52,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 6 seconds)
2025-08-07 09:01:28,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 115.89718 ± 85.834
2025-08-07 09:01:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [102.20823, 150.06258, 265.07236, 9.442374, 12.313745, 123.82232, 104.34932, 110.92533, 255.67834, 25.097298]
2025-08-07 09:01:29,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 95.0, 158.0, 11.0, 17.0, 105.0, 73.0, 85.0, 148.0, 24.0]
2025-08-07 09:01:29,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 31 seconds)
2025-08-07 09:03:04,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:05,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 57.86107 ± 51.320
2025-08-07 09:03:05,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [26.44486, 116.56595, 119.6999, 126.710625, 10.530443, 118.891914, 13.468003, 17.002457, 14.757066, 14.539507]
2025-08-07 09:03:05,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 73.0, 91.0, 116.0, 15.0, 90.0, 20.0, 20.0, 21.0, 17.0]
2025-08-07 09:03:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 52 seconds)
2025-08-07 09:04:40,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:41,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 66.63819 ± 73.011
2025-08-07 09:04:41,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.602173, 15.363416, 18.889387, 96.77761, 149.85014, 13.908348, 12.510489, 233.75566, 99.90487, 13.819851]
2025-08-07 09:04:41,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 22.0, 22.0, 60.0, 95.0, 21.0, 13.0, 113.0, 79.0, 19.0]
2025-08-07 09:04:41,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 13 seconds)
2025-08-07 09:06:17,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:17,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 21.80377 ± 19.922
2025-08-07 09:06:17,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [11.767008, 17.321146, 18.705025, 12.600223, 20.334734, 12.649561, 10.105925, 80.59655, 20.70961, 13.247937]
2025-08-07 09:06:17,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 21.0, 23.0, 16.0, 25.0, 15.0, 14.0, 55.0, 20.0, 19.0]
2025-08-07 09:06:17,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 35 seconds)
2025-08-07 09:07:52,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:53,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 135.23593 ± 100.371
2025-08-07 09:07:53,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.971735, 145.28964, 69.714165, 100.69911, 102.21628, 238.88515, 367.3581, 128.20804, 12.499333, 167.51773]
2025-08-07 09:07:53,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 94.0, 51.0, 71.0, 72.0, 146.0, 135.0, 74.0, 17.0, 103.0]
2025-08-07 09:07:53,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1226 [INFO]: New best (135.24) for latency ExtremeClogL1U23
2025-08-07 09:07:54,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 2 seconds)
2025-08-07 09:09:29,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:09:29,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 67.25438 ± 65.981
2025-08-07 09:09:29,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.637327, 15.599685, 135.68625, 17.519331, 14.090325, 106.761696, 10.12564, 189.03442, 147.32805, 16.761026]
2025-08-07 09:09:29,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 93.0, 23.0, 21.0, 76.0, 12.0, 105.0, 94.0, 19.0]
2025-08-07 09:09:29,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 24 seconds)
2025-08-07 09:11:05,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:05,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 75.59184 ± 83.296
2025-08-07 09:11:05,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [26.651863, 217.7327, 13.170017, 237.64233, 12.292084, 17.40644, 24.446846, 14.311814, 64.36471, 127.899574]
2025-08-07 09:11:05,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 105.0, 18.0, 106.0, 16.0, 19.0, 22.0, 20.0, 39.0, 77.0]
2025-08-07 09:11:05,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 48 seconds)
2025-08-07 09:12:40,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:41,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 80.19362 ± 46.112
2025-08-07 09:12:41,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [19.192965, 82.78662, 93.46637, 10.287993, 17.092367, 109.30932, 145.54916, 86.87614, 132.1664, 105.20888]
2025-08-07 09:12:41,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 63.0, 72.0, 19.0, 24.0, 69.0, 95.0, 61.0, 107.0, 66.0]
2025-08-07 09:12:41,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 12 seconds)
2025-08-07 09:14:16,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:17,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 87.27455 ± 72.246
2025-08-07 09:14:17,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [106.35785, 122.17888, 64.27746, 162.64117, 160.40298, 11.390752, 10.716566, 13.863166, 212.8578, 8.058808]
2025-08-07 09:14:17,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 77.0, 40.0, 90.0, 125.0, 15.0, 19.0, 15.0, 112.0, 14.0]
2025-08-07 09:14:17,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 35 seconds)
2025-08-07 09:15:53,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:53,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 68.59527 ± 71.788
2025-08-07 09:15:53,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [15.56949, 18.488731, 194.09848, 122.869064, 195.51036, 15.624129, 20.053703, 11.962192, 14.200112, 77.57641]
2025-08-07 09:15:53,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 21.0, 167.0, 82.0, 120.0, 18.0, 20.0, 14.0, 17.0, 48.0]
2025-08-07 09:15:53,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 59 seconds)
2025-08-07 09:17:30,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 100.49245 ± 73.195
2025-08-07 09:17:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [76.29996, 11.899253, 90.28762, 8.450971, 145.9133, 134.8644, 97.22242, 210.11032, 217.10666, 12.769671]
2025-08-07 09:17:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 14.0, 52.0, 16.0, 70.0, 84.0, 71.0, 100.0, 134.0, 19.0]
2025-08-07 09:17:31,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 25 seconds)
2025-08-07 09:19:06,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:07,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 101.18777 ± 71.936
2025-08-07 09:19:07,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [131.45468, 21.250114, 20.50532, 103.79254, 178.97374, 182.0249, 191.98764, 19.430897, 150.56091, 11.896955]
2025-08-07 09:19:07,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 22.0, 28.0, 111.0, 113.0, 88.0, 134.0, 22.0, 128.0, 19.0]
2025-08-07 09:19:07,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 48 seconds)
2025-08-07 09:20:41,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:42,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 63.47126 ± 49.163
2025-08-07 09:20:42,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [145.11516, 83.212616, 94.616486, 9.6948395, 10.383474, 58.39253, 12.403696, 64.113525, 139.03268, 17.747627]
2025-08-07 09:20:42,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 63.0, 71.0, 12.0, 12.0, 37.0, 19.0, 39.0, 112.0, 23.0]
2025-08-07 09:20:42,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 12 seconds)
2025-08-07 09:22:18,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:22:19,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 76.08501 ± 63.507
2025-08-07 09:22:19,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [119.3996, 180.98506, 127.11442, 153.21454, 100.59849, 23.661236, 21.532028, 6.929997, 15.801471, 11.613357]
2025-08-07 09:22:19,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 127.0, 89.0, 113.0, 69.0, 22.0, 20.0, 10.0, 24.0, 18.0]
2025-08-07 09:22:19,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 36 seconds)
2025-08-07 09:23:53,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:54,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1221 [DEBUG]: Total Reward: 104.11251 ± 107.870
2025-08-07 09:23:54,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1222 [DEBUG]: All rewards: [16.935617, 27.192944, 9.480099, 10.211809, 6.5187926, 104.884224, 335.59818, 233.4679, 122.90963, 173.92598]
2025-08-07 09:23:54,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 27.0, 12.0, 14.0, 9.0, 86.0, 202.0, 177.0, 87.0, 94.0]
2025-08-07 09:23:55,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-hopper):1251 [DEBUG]: Training session finished
