2025-08-07 10:08:02,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc8/noiseperc25-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:08:02,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc8/noiseperc25-hopper/MM1Queue_a033_s075-bpql-mem16
2025-08-07 10:08:02,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'MM1Queue_a033_s075': <latency_env.delayed_mdp.MM1QueueDelay object at 0x146ecbfd3710>}
2025-08-07 10:08:02,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 10:08:02,985 baseline-bpql-noiseperc25-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 16 != 24
2025-08-07 10:08:02,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1133 [INFO]: Creating new trainer
2025-08-07 10:08:03,000 baseline-bpql-noiseperc25-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=59, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 10:08:03,000 baseline-bpql-noiseperc25-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 10:08:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 10:08:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 10:09:32,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:09:32,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 37.53423 ± 20.252
2025-08-07 10:09:32,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [16.753641, 53.592453, 50.5638, 47.009052, 31.04424, 12.279278, 11.179967, 59.917397, 22.881035, 70.121445]
2025-08-07 10:09:32,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 37.0, 34.0, 35.0, 37.0, 14.0, 15.0, 46.0, 36.0, 55.0]
2025-08-07 10:09:32,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (37.53) for latency MM1Queue_a033_s075
2025-08-07 10:09:32,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 26 minutes, 23 seconds)
2025-08-07 10:11:07,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:11:08,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 61.75248 ± 45.342
2025-08-07 10:11:08,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [143.72, 12.421518, 74.99619, 21.142277, 95.86811, 10.562402, 68.65553, 116.419945, 64.47158, 9.267257]
2025-08-07 10:11:08,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 15.0, 72.0, 21.0, 79.0, 14.0, 55.0, 75.0, 43.0, 14.0]
2025-08-07 10:11:08,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (61.75) for latency MM1Queue_a033_s075
2025-08-07 10:11:08,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 30 minutes, 53 seconds)
2025-08-07 10:12:44,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:12:45,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 39.81180 ± 24.644
2025-08-07 10:12:45,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.352146, 7.251349, 15.093761, 50.79076, 67.04383, 69.12127, 37.980156, 79.0209, 16.537224, 39.926563]
2025-08-07 10:12:45,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 10.0, 18.0, 37.0, 55.0, 87.0, 43.0, 67.0, 18.0, 32.0]
2025-08-07 10:12:45,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 31 minutes, 38 seconds)
2025-08-07 10:14:20,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:14:20,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.55824 ± 31.899
2025-08-07 10:14:20,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.128874, 70.696846, 101.24266, 20.345467, 12.669546, 12.847544, 13.842158, 65.80544, 10.16902, 9.834773]
2025-08-07 10:14:20,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 62.0, 64.0, 19.0, 16.0, 15.0, 18.0, 43.0, 16.0, 14.0]
2025-08-07 10:14:20,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 30 minutes, 39 seconds)
2025-08-07 10:15:56,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:15:56,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 79.60099 ± 39.013
2025-08-07 10:15:56,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [89.983505, 126.15634, 7.641808, 116.0816, 96.114136, 96.50523, 76.65001, 69.103775, 9.38825, 108.38519]
2025-08-07 10:15:56,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 120.0, 11.0, 77.0, 61.0, 75.0, 64.0, 48.0, 12.0, 62.0]
2025-08-07 10:15:56,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (79.60) for latency MM1Queue_a033_s075
2025-08-07 10:15:56,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 29 minutes, 50 seconds)
2025-08-07 10:17:32,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:17:33,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 89.39130 ± 72.061
2025-08-07 10:17:33,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [143.18619, 161.3352, 8.210049, 125.73966, 14.480843, 221.50629, 75.054726, 118.755905, 6.214915, 19.429264]
2025-08-07 10:17:33,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 102.0, 11.0, 91.0, 17.0, 149.0, 66.0, 94.0, 12.0, 20.0]
2025-08-07 10:17:33,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (89.39) for latency MM1Queue_a033_s075
2025-08-07 10:17:33,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 30 minutes, 45 seconds)
2025-08-07 10:19:09,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:19:10,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 61.10765 ± 72.826
2025-08-07 10:19:10,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [258.3329, 84.8159, 10.010455, 88.97349, 46.74052, 75.191124, 7.0426397, 12.689093, 15.886826, 11.393562]
2025-08-07 10:19:10,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 74.0, 14.0, 62.0, 44.0, 66.0, 12.0, 16.0, 19.0, 15.0]
2025-08-07 10:19:10,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 29 minutes, 17 seconds)
2025-08-07 10:20:46,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:20:47,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 69.26745 ± 76.383
2025-08-07 10:20:47,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [74.808525, 14.069145, 53.362762, 188.53539, 8.418988, 230.44606, 93.0245, 9.692477, 8.363147, 11.9535055]
2025-08-07 10:20:47,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [46.0, 18.0, 45.0, 133.0, 11.0, 128.0, 62.0, 16.0, 15.0, 15.0]
2025-08-07 10:20:47,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 27 minutes, 53 seconds)
2025-08-07 10:22:22,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:22:23,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 63.20150 ± 45.522
2025-08-07 10:22:23,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.616413, 71.58878, 15.008971, 104.93598, 76.1845, 123.69897, 76.98944, 7.8240886, 129.8838, 11.284017]
2025-08-07 10:22:23,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 42.0, 17.0, 86.0, 57.0, 85.0, 45.0, 16.0, 73.0, 16.0]
2025-08-07 10:22:23,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 26 minutes, 24 seconds)
2025-08-07 10:23:58,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:23:58,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 54.42216 ± 57.478
2025-08-07 10:23:58,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [128.92046, 7.978446, 91.22833, 165.21811, 8.467526, 14.921777, 6.6597676, 14.061098, 98.74305, 8.023063]
2025-08-07 10:23:58,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 10.0, 87.0, 104.0, 18.0, 15.0, 15.0, 16.0, 71.0, 17.0]
2025-08-07 10:23:58,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 24 minutes, 36 seconds)
2025-08-07 10:25:35,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:25:36,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 68.05898 ± 49.011
2025-08-07 10:25:36,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [72.37468, 115.8115, 129.02711, 9.230614, 140.46292, 49.75584, 39.627464, 104.842995, 9.484174, 9.972474]
2025-08-07 10:25:36,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 61.0, 68.0, 14.0, 99.0, 68.0, 33.0, 57.0, 12.0, 13.0]
2025-08-07 10:25:36,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 23 minutes, 12 seconds)
2025-08-07 10:27:11,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:27:11,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 57.77139 ± 48.592
2025-08-07 10:27:11,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.431644, 50.01277, 48.77556, 14.602852, 10.730874, 10.774874, 115.92894, 53.591366, 112.8224, 150.04265]
2025-08-07 10:27:11,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 40.0, 53.0, 23.0, 15.0, 15.0, 86.0, 43.0, 62.0, 75.0]
2025-08-07 10:27:11,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 21 minutes, 21 seconds)
2025-08-07 10:28:48,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:28:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 50.25422 ± 35.967
2025-08-07 10:28:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [69.651245, 8.564575, 109.26143, 77.47936, 9.195239, 6.4757414, 14.566985, 51.266876, 87.470024, 68.610756]
2025-08-07 10:28:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 12.0, 93.0, 69.0, 11.0, 10.0, 17.0, 31.0, 90.0, 67.0]
2025-08-07 10:28:48,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 19 minutes, 39 seconds)
2025-08-07 10:30:24,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:30:24,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 64.12201 ± 46.944
2025-08-07 10:30:24,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [133.92813, 9.072955, 63.23062, 130.61316, 85.45628, 8.476317, 39.045437, 113.67787, 46.02317, 11.69619]
2025-08-07 10:30:24,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 11.0, 44.0, 95.0, 62.0, 11.0, 32.0, 61.0, 49.0, 14.0]
2025-08-07 10:30:24,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 18 minutes, 5 seconds)
2025-08-07 10:32:02,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:32:03,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 116.31834 ± 111.668
2025-08-07 10:32:03,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [77.63349, 75.614845, 230.3558, 9.417896, 148.07863, 90.5481, 388.87585, 13.824868, 13.141106, 115.69282]
2025-08-07 10:32:03,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [45.0, 77.0, 165.0, 15.0, 111.0, 74.0, 206.0, 16.0, 15.0, 82.0]
2025-08-07 10:32:03,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (116.32) for latency MM1Queue_a033_s075
2025-08-07 10:32:03,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 17 minutes, 10 seconds)
2025-08-07 10:33:37,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:33:38,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 82.11223 ± 75.443
2025-08-07 10:33:38,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [214.82608, 57.68653, 67.061386, 231.12132, 6.067696, 72.26981, 10.059035, 11.662822, 66.61746, 83.75009]
2025-08-07 10:33:38,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 40.0, 42.0, 104.0, 12.0, 45.0, 15.0, 15.0, 39.0, 63.0]
2025-08-07 10:33:38,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 14 minutes, 58 seconds)
2025-08-07 10:35:14,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:35:15,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 61.36313 ± 52.344
2025-08-07 10:35:15,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [129.91527, 12.956899, 85.62403, 98.026764, 9.572227, 12.422714, 9.092634, 145.91414, 13.90906, 96.19755]
2025-08-07 10:35:15,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 15.0, 73.0, 63.0, 15.0, 18.0, 11.0, 79.0, 16.0, 80.0]
2025-08-07 10:35:15,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 13 minutes, 47 seconds)
2025-08-07 10:36:50,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:36:51,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 90.37953 ± 73.415
2025-08-07 10:36:51,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [175.65332, 12.895875, 10.483117, 155.54732, 98.09891, 123.0738, 216.02562, 92.91711, 10.862897, 8.237316]
2025-08-07 10:36:51,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 15.0, 16.0, 111.0, 82.0, 93.0, 121.0, 74.0, 14.0, 12.0]
2025-08-07 10:36:51,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 11 minutes, 56 seconds)
2025-08-07 10:38:28,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:38:29,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 112.48491 ± 67.917
2025-08-07 10:38:29,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.911581, 140.21701, 132.39816, 263.7339, 122.06163, 94.58454, 108.59561, 6.692076, 142.87169, 97.78299]
2025-08-07 10:38:29,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 78.0, 67.0, 170.0, 67.0, 73.0, 82.0, 11.0, 98.0, 68.0]
2025-08-07 10:38:29,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 10 minutes, 50 seconds)
2025-08-07 10:40:04,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:40:05,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 107.53065 ± 65.918
2025-08-07 10:40:05,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.472768, 9.792477, 128.88297, 84.76034, 56.6901, 182.52547, 167.87593, 160.81113, 197.90472, 72.59057]
2025-08-07 10:40:05,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 13.0, 67.0, 70.0, 51.0, 84.0, 86.0, 131.0, 151.0, 47.0]
2025-08-07 10:40:05,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 8 minutes, 37 seconds)
2025-08-07 10:41:41,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:41:42,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 112.73704 ± 87.377
2025-08-07 10:41:42,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [118.25193, 109.30975, 105.6518, 95.51782, 5.852255, 61.376553, 355.50067, 65.24841, 88.04952, 122.611626]
2025-08-07 10:41:42,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 78.0, 87.0, 52.0, 9.0, 36.0, 158.0, 48.0, 53.0, 86.0]
2025-08-07 10:41:42,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 7 minutes, 31 seconds)
2025-08-07 10:43:18,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:43:19,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 86.90948 ± 73.320
2025-08-07 10:43:19,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.424202, 92.30582, 116.14917, 141.26247, 115.53096, 116.36223, 7.757618, 11.429146, 9.0116205, 244.86166]
2025-08-07 10:43:19,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 66.0, 67.0, 83.0, 66.0, 77.0, 14.0, 16.0, 18.0, 111.0]
2025-08-07 10:43:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 5 minutes, 47 seconds)
2025-08-07 10:44:55,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:44:56,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 108.25847 ± 113.681
2025-08-07 10:44:56,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.477472, 254.93855, 9.244247, 137.36824, 87.65241, 9.829183, 10.19887, 11.245584, 242.53842, 309.09167]
2025-08-07 10:44:56,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 171.0, 12.0, 72.0, 52.0, 14.0, 13.0, 14.0, 132.0, 140.0]
2025-08-07 10:44:56,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 4 minutes, 20 seconds)
2025-08-07 10:46:32,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:46:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 119.77525 ± 119.030
2025-08-07 10:46:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.53527, 9.837, 8.0410795, 310.67877, 257.3284, 160.67831, 63.385746, 70.01272, 295.36127, 13.893862]
2025-08-07 10:46:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 14.0, 12.0, 175.0, 117.0, 93.0, 48.0, 44.0, 161.0, 16.0]
2025-08-07 10:46:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (119.78) for latency MM1Queue_a033_s075
2025-08-07 10:46:33,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 2 minutes, 37 seconds)
2025-08-07 10:48:09,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:48:10,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 97.59377 ± 83.625
2025-08-07 10:48:10,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [76.097046, 15.007268, 134.7981, 128.0139, 15.6259575, 206.28798, 260.4463, 117.04605, 14.137157, 8.477921]
2025-08-07 10:48:10,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 19.0, 69.0, 92.0, 17.0, 104.0, 144.0, 77.0, 16.0, 14.0]
2025-08-07 10:48:10,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 1 minute, 9 seconds)
2025-08-07 10:49:46,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:49:47,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 88.74164 ± 89.020
2025-08-07 10:49:47,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [60.84216, 8.827451, 68.78648, 9.4010515, 140.48148, 124.60395, 6.4089737, 297.708, 159.18243, 11.1743765]
2025-08-07 10:49:47,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 14.0, 60.0, 11.0, 80.0, 108.0, 9.0, 154.0, 93.0, 17.0]
2025-08-07 10:49:47,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 59 minutes, 37 seconds)
2025-08-07 10:51:23,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:51:23,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 80.67335 ± 93.471
2025-08-07 10:51:23,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [94.28269, 13.882871, 265.78537, 7.7156925, 10.657714, 19.130962, 16.911377, 9.840803, 140.11502, 228.41093]
2025-08-07 10:51:23,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 16.0, 177.0, 12.0, 13.0, 32.0, 17.0, 13.0, 96.0, 133.0]
2025-08-07 10:51:23,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 57 minutes, 55 seconds)
2025-08-07 10:53:00,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:53:00,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 69.83634 ± 62.707
2025-08-07 10:53:00,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [6.0604496, 10.036874, 67.69053, 77.92337, 72.882164, 109.64137, 210.15865, 8.227574, 124.97157, 10.770867]
2025-08-07 10:53:00,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 15.0, 40.0, 51.0, 47.0, 98.0, 133.0, 12.0, 66.0, 14.0]
2025-08-07 10:53:00,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 56 minutes, 20 seconds)
2025-08-07 10:54:37,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:54:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 136.99844 ± 110.685
2025-08-07 10:54:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.609147, 11.709031, 130.86368, 8.147024, 93.75297, 200.07774, 214.94038, 223.08229, 370.41858, 103.38354]
2025-08-07 10:54:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 18.0, 120.0, 13.0, 65.0, 147.0, 135.0, 116.0, 201.0, 61.0]
2025-08-07 10:54:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (137.00) for latency MM1Queue_a033_s075
2025-08-07 10:54:38,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 54 minutes, 45 seconds)
2025-08-07 10:56:14,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:56:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 65.89030 ± 51.897
2025-08-07 10:56:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.821007, 10.240383, 84.48668, 9.635103, 74.19173, 129.93793, 163.09302, 10.689735, 78.00954, 87.797966]
2025-08-07 10:56:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 141.0, 12.0, 55.0, 100.0, 90.0, 13.0, 54.0, 60.0]
2025-08-07 10:56:14,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 53 minutes, 6 seconds)
2025-08-07 10:57:51,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:57:51,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 65.74548 ± 60.969
2025-08-07 10:57:51,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [130.80473, 59.668888, 15.743804, 169.47556, 16.660452, 157.00323, 9.858799, 73.64236, 10.553194, 14.04371]
2025-08-07 10:57:51,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 43.0, 18.0, 106.0, 17.0, 112.0, 12.0, 44.0, 13.0, 19.0]
2025-08-07 10:57:51,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 51 minutes, 19 seconds)
2025-08-07 10:59:28,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 10:59:29,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 96.06284 ± 68.371
2025-08-07 10:59:29,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.422264, 190.21013, 152.46278, 201.30247, 71.68954, 124.68239, 85.49673, 103.74324, 10.432108, 10.186775]
2025-08-07 10:59:29,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 134.0, 91.0, 125.0, 42.0, 83.0, 57.0, 79.0, 17.0, 13.0]
2025-08-07 10:59:29,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 50 minutes, 6 seconds)
2025-08-07 11:01:05,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:01:06,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 73.49213 ± 52.279
2025-08-07 11:01:06,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.6684475, 129.30893, 130.7032, 89.04637, 126.778496, 11.672025, 85.73615, 124.805534, 14.209042, 13.993108]
2025-08-07 11:01:06,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 78.0, 89.0, 62.0, 74.0, 14.0, 50.0, 87.0, 16.0, 15.0]
2025-08-07 11:01:06,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 48 minutes, 28 seconds)
2025-08-07 11:02:42,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:02:42,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 103.92754 ± 58.949
2025-08-07 11:02:42,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [82.7274, 142.92023, 199.66, 156.67885, 9.289712, 13.43727, 152.64534, 75.671455, 87.08191, 119.16323]
2025-08-07 11:02:42,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [47.0, 70.0, 116.0, 96.0, 16.0, 16.0, 99.0, 54.0, 52.0, 96.0]
2025-08-07 11:02:42,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 46 minutes, 36 seconds)
2025-08-07 11:04:19,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:04:19,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 66.80669 ± 43.219
2025-08-07 11:04:19,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [126.56763, 126.06256, 7.222222, 12.13862, 62.472, 72.46886, 9.70696, 64.71239, 80.48318, 106.232414]
2025-08-07 11:04:19,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 67.0, 13.0, 16.0, 39.0, 57.0, 14.0, 46.0, 55.0, 67.0]
2025-08-07 11:04:19,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 45 minutes, 3 seconds)
2025-08-07 11:05:56,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:05:57,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 108.55180 ± 86.710
2025-08-07 11:05:57,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [186.19629, 79.336266, 133.06902, 125.923775, 142.50244, 300.61795, 9.237642, 11.672521, 10.444473, 86.51773]
2025-08-07 11:05:57,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 52.0, 78.0, 77.0, 134.0, 154.0, 11.0, 15.0, 14.0, 60.0]
2025-08-07 11:05:57,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 39 seconds)
2025-08-07 11:07:32,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:07:33,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 120.88979 ± 82.185
2025-08-07 11:07:33,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [217.67834, 9.356817, 86.63756, 200.02396, 149.06741, 229.58391, 12.460818, 10.557189, 161.29759, 132.23442]
2025-08-07 11:07:33,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 13.0, 55.0, 124.0, 92.0, 184.0, 16.0, 14.0, 97.0, 84.0]
2025-08-07 11:07:33,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 41 minutes, 39 seconds)
2025-08-07 11:09:10,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:09:11,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 113.89032 ± 108.294
2025-08-07 11:09:11,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.374268, 14.532784, 84.6802, 106.460785, 92.39265, 159.4509, 121.09843, 132.95035, 15.934953, 402.02795]
2025-08-07 11:09:11,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 17.0, 63.0, 81.0, 55.0, 108.0, 67.0, 107.0, 16.0, 176.0]
2025-08-07 11:09:11,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 11 seconds)
2025-08-07 11:10:47,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:10:48,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 93.44878 ± 96.390
2025-08-07 11:10:48,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.7895317, 195.57352, 8.805156, 13.78201, 136.96014, 19.449455, 80.62177, 68.383766, 324.22708, 78.89542]
2025-08-07 11:10:48,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 93.0, 12.0, 16.0, 69.0, 23.0, 51.0, 41.0, 137.0, 62.0]
2025-08-07 11:10:48,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 38 minutes, 39 seconds)
2025-08-07 11:12:24,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:12:24,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 96.97253 ± 58.849
2025-08-07 11:12:24,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.5841255, 45.30645, 132.02927, 8.535984, 58.312355, 145.01427, 191.89287, 138.86964, 112.98017, 125.20028]
2025-08-07 11:12:24,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 28.0, 70.0, 12.0, 42.0, 71.0, 92.0, 91.0, 89.0, 79.0]
2025-08-07 11:12:24,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 2 seconds)
2025-08-07 11:14:00,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:14:01,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 102.97742 ± 73.678
2025-08-07 11:14:01,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [239.40765, 107.94731, 10.061758, 172.01707, 119.85703, 75.236176, 12.494932, 11.136805, 170.02988, 111.58539]
2025-08-07 11:14:01,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 69.0, 16.0, 121.0, 78.0, 53.0, 16.0, 14.0, 122.0, 63.0]
2025-08-07 11:14:01,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 35 minutes, 13 seconds)
2025-08-07 11:15:38,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:15:38,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 55.92076 ± 95.142
2025-08-07 11:15:38,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [6.5250177, 9.090083, 12.048598, 221.56447, 6.8262734, 7.9125876, 268.4655, 7.1258807, 11.115038, 8.534208]
2025-08-07 11:15:38,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 12.0, 17.0, 136.0, 13.0, 12.0, 131.0, 10.0, 12.0, 16.0]
2025-08-07 11:15:38,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 33 minutes, 44 seconds)
2025-08-07 11:17:12,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:17:13,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 97.12859 ± 68.796
2025-08-07 11:17:13,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [164.54066, 89.43986, 9.391001, 139.87253, 82.967476, 201.45221, 74.81129, 10.950297, 14.714205, 183.14635]
2025-08-07 11:17:13,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 51.0, 12.0, 70.0, 53.0, 96.0, 47.0, 14.0, 16.0, 96.0]
2025-08-07 11:17:13,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 31 minutes, 38 seconds)
2025-08-07 11:18:48,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:18:49,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 125.44912 ± 78.049
2025-08-07 11:18:49,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.06216, 107.67693, 86.4584, 138.95064, 204.71107, 85.79903, 212.23915, 256.36816, 140.07388, 13.151678]
2025-08-07 11:18:49,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 59.0, 63.0, 82.0, 117.0, 51.0, 150.0, 133.0, 88.0, 14.0]
2025-08-07 11:18:49,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 29 minutes, 53 seconds)
2025-08-07 11:20:25,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:20:25,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 75.97362 ± 84.717
2025-08-07 11:20:25,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.504665, 247.20686, 181.59456, 120.38106, 141.80421, 13.128498, 9.901554, 9.798346, 13.68889, 11.727514]
2025-08-07 11:20:25,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 137.0, 116.0, 83.0, 87.0, 15.0, 14.0, 16.0, 17.0, 16.0]
2025-08-07 11:20:25,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 28 minutes, 7 seconds)
2025-08-07 11:22:00,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:22:00,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 109.33478 ± 61.036
2025-08-07 11:22:00,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [146.64197, 9.296507, 185.19754, 96.33532, 194.08868, 98.619, 11.879946, 102.00829, 160.78711, 88.49344]
2025-08-07 11:22:00,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 17.0, 103.0, 58.0, 96.0, 83.0, 15.0, 75.0, 108.0, 72.0]
2025-08-07 11:22:01,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 26 minutes, 16 seconds)
2025-08-07 11:23:34,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:23:35,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 107.49693 ± 67.859
2025-08-07 11:23:35,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [182.78784, 44.784576, 115.411804, 13.562662, 182.30586, 140.25333, 95.6153, 13.750504, 211.82266, 74.67484]
2025-08-07 11:23:35,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 27.0, 82.0, 17.0, 96.0, 79.0, 72.0, 15.0, 127.0, 47.0]
2025-08-07 11:23:35,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 24 minutes, 16 seconds)
2025-08-07 11:25:10,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:25:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 93.43783 ± 86.214
2025-08-07 11:25:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.954663, 8.057778, 82.92378, 263.45886, 73.589195, 99.00694, 177.83322, 13.701397, 10.823058, 195.02945]
2025-08-07 11:25:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 10.0, 56.0, 126.0, 63.0, 54.0, 81.0, 17.0, 15.0, 119.0]
2025-08-07 11:25:11,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 22 minutes, 50 seconds)
2025-08-07 11:26:45,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:26:46,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 64.10307 ± 49.946
2025-08-07 11:26:46,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.481247, 158.34682, 102.61051, 5.6149526, 93.8197, 95.61065, 11.229012, 62.03671, 9.360686, 90.92034]
2025-08-07 11:26:46,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 85.0, 67.0, 12.0, 87.0, 65.0, 13.0, 55.0, 13.0, 57.0]
2025-08-07 11:26:46,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 21 minutes, 2 seconds)
2025-08-07 11:28:19,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:28:20,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 103.07253 ± 74.988
2025-08-07 11:28:20,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.27781, 144.55809, 109.48693, 248.76361, 174.25603, 122.824585, 12.417149, 136.27205, 10.419174, 56.44982]
2025-08-07 11:28:20,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 112.0, 77.0, 164.0, 138.0, 93.0, 15.0, 68.0, 16.0, 35.0]
2025-08-07 11:28:20,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 19 minutes, 11 seconds)
2025-08-07 11:29:55,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:29:56,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 79.81866 ± 76.051
2025-08-07 11:29:56,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [270.35712, 69.565384, 11.392696, 11.692213, 9.245228, 91.5867, 87.297066, 112.645004, 119.80109, 14.604127]
2025-08-07 11:29:56,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 41.0, 13.0, 16.0, 12.0, 52.0, 52.0, 62.0, 73.0, 16.0]
2025-08-07 11:29:56,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 17 minutes, 37 seconds)
2025-08-07 11:31:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:31:31,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 141.88553 ± 94.104
2025-08-07 11:31:31,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.620586, 148.09329, 217.22443, 352.4051, 227.02081, 87.14709, 119.75536, 68.63888, 83.495766, 106.453964]
2025-08-07 11:31:31,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 100.0, 129.0, 187.0, 106.0, 59.0, 101.0, 52.0, 52.0, 57.0]
2025-08-07 11:31:31,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (141.89) for latency MM1Queue_a033_s075
2025-08-07 11:31:31,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 16 minutes, 7 seconds)
2025-08-07 11:33:05,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:33:06,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 80.45450 ± 75.211
2025-08-07 11:33:06,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.418896, 10.414626, 214.67375, 15.664637, 88.534325, 12.976604, 180.00491, 134.43625, 9.301163, 127.11983]
2025-08-07 11:33:06,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 15.0, 128.0, 17.0, 51.0, 18.0, 109.0, 114.0, 12.0, 87.0]
2025-08-07 11:33:06,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 14 minutes, 20 seconds)
2025-08-07 11:34:40,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:34:41,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 150.36606 ± 59.028
2025-08-07 11:34:41,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [156.28027, 138.86043, 71.453094, 212.63658, 103.78354, 176.64209, 267.40915, 133.39711, 176.68095, 66.51737]
2025-08-07 11:34:41,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 79.0, 59.0, 103.0, 80.0, 112.0, 135.0, 67.0, 83.0, 42.0]
2025-08-07 11:34:41,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (150.37) for latency MM1Queue_a033_s075
2025-08-07 11:34:41,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 12 minutes, 50 seconds)
2025-08-07 11:36:15,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:36:15,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 75.57082 ± 59.671
2025-08-07 11:36:15,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.9699917, 82.17193, 8.893382, 80.79829, 172.66727, 155.67767, 13.305063, 106.61959, 11.090908, 116.51407]
2025-08-07 11:36:15,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 46.0, 13.0, 75.0, 126.0, 123.0, 15.0, 60.0, 16.0, 80.0]
2025-08-07 11:36:16,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 11 minutes, 16 seconds)
2025-08-07 11:37:49,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:37:50,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 83.85202 ± 89.509
2025-08-07 11:37:50,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.746133, 77.85584, 13.563789, 122.48439, 9.806119, 130.3722, 10.651528, 86.02246, 318.0731, 56.944656]
2025-08-07 11:37:50,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 49.0, 15.0, 87.0, 14.0, 96.0, 15.0, 54.0, 147.0, 46.0]
2025-08-07 11:37:50,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 9 minutes, 29 seconds)
2025-08-07 11:39:25,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:39:25,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 85.17833 ± 42.000
2025-08-07 11:39:25,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [95.836655, 81.64598, 137.18947, 132.6514, 12.290672, 106.04408, 95.60894, 64.60904, 11.535057, 114.37201]
2025-08-07 11:39:25,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 80.0, 68.0, 110.0, 14.0, 63.0, 69.0, 44.0, 14.0, 76.0]
2025-08-07 11:39:25,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 8 minutes)
2025-08-07 11:40:59,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:41:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 121.48645 ± 59.448
2025-08-07 11:41:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.376163, 114.031654, 14.483934, 185.04155, 113.76983, 144.22098, 193.96747, 151.6832, 131.38054, 153.90912]
2025-08-07 11:41:00,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 61.0, 15.0, 110.0, 91.0, 87.0, 113.0, 77.0, 83.0, 117.0]
2025-08-07 11:41:00,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 6 minutes, 22 seconds)
2025-08-07 11:42:34,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:42:35,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 87.99419 ± 72.358
2025-08-07 11:42:35,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [169.57283, 11.731684, 200.70576, 143.29771, 10.309657, 134.65373, 138.45439, 8.106206, 49.050198, 14.059692]
2025-08-07 11:42:35,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 16.0, 122.0, 78.0, 15.0, 88.0, 98.0, 10.0, 30.0, 15.0]
2025-08-07 11:42:35,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 4 minutes, 45 seconds)
2025-08-07 11:44:09,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:44:10,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 137.85208 ± 87.827
2025-08-07 11:44:10,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [187.33607, 167.36409, 7.5097747, 180.09317, 12.665386, 319.3265, 122.36259, 88.90635, 103.74045, 189.21642]
2025-08-07 11:44:10,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 102.0, 14.0, 97.0, 14.0, 128.0, 71.0, 68.0, 72.0, 115.0]
2025-08-07 11:44:10,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 3 minutes, 18 seconds)
2025-08-07 11:45:44,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:45:45,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 97.45769 ± 85.501
2025-08-07 11:45:45,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [209.91689, 184.09393, 211.29192, 8.895187, 83.5524, 58.17629, 186.24318, 11.194022, 9.593107, 11.620045]
2025-08-07 11:45:45,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 108.0, 204.0, 13.0, 49.0, 35.0, 113.0, 15.0, 16.0, 14.0]
2025-08-07 11:45:45,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 1 minute, 49 seconds)
2025-08-07 11:47:18,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:47:19,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 95.64771 ± 74.300
2025-08-07 11:47:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.748625, 106.050964, 181.44638, 14.691842, 178.80196, 135.55705, 96.38213, 14.941497, 205.32553, 14.531004]
2025-08-07 11:47:19,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 75.0, 105.0, 17.0, 116.0, 74.0, 52.0, 16.0, 132.0, 14.0]
2025-08-07 11:47:19,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 59 minutes, 57 seconds)
2025-08-07 11:48:53,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:48:54,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 123.36748 ± 79.603
2025-08-07 11:48:54,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [78.89105, 95.166016, 123.13852, 264.50775, 14.842511, 168.07678, 10.45848, 234.32182, 91.85312, 152.41872]
2025-08-07 11:48:54,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 94.0, 72.0, 139.0, 16.0, 86.0, 14.0, 138.0, 56.0, 86.0]
2025-08-07 11:48:54,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 58 minutes, 28 seconds)
2025-08-07 11:50:28,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:50:29,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 156.89499 ± 138.997
2025-08-07 11:50:29,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [423.7505, 381.55853, 11.715946, 110.348526, 17.23156, 15.006737, 151.62173, 94.31034, 140.1155, 223.2907]
2025-08-07 11:50:29,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [205.0, 146.0, 14.0, 61.0, 17.0, 16.0, 120.0, 76.0, 114.0, 139.0]
2025-08-07 11:50:29,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (156.89) for latency MM1Queue_a033_s075
2025-08-07 11:50:29,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 56 minutes, 55 seconds)
2025-08-07 11:52:03,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:52:04,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 138.31097 ± 141.806
2025-08-07 11:52:04,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [233.13387, 11.601717, 9.608922, 164.72748, 9.449757, 202.43909, 381.86255, 351.5819, 9.572296, 9.132189]
2025-08-07 11:52:04,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 16.0, 15.0, 103.0, 12.0, 111.0, 167.0, 190.0, 12.0, 16.0]
2025-08-07 11:52:04,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 55 minutes, 17 seconds)
2025-08-07 11:53:38,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:53:38,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 88.03158 ± 58.556
2025-08-07 11:53:38,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [64.53878, 68.18783, 118.29775, 13.575591, 140.80746, 14.418146, 9.409698, 163.38005, 126.694786, 161.0057]
2025-08-07 11:53:38,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [40.0, 40.0, 63.0, 15.0, 78.0, 18.0, 11.0, 131.0, 90.0, 85.0]
2025-08-07 11:53:38,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 53 minutes, 36 seconds)
2025-08-07 11:55:12,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:55:13,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 155.46053 ± 78.552
2025-08-07 11:55:13,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [239.62292, 253.03899, 143.54675, 174.88466, 251.78134, 174.45396, 81.894905, 61.27594, 162.21999, 11.885783]
2025-08-07 11:55:13,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 151.0, 88.0, 90.0, 105.0, 80.0, 65.0, 46.0, 90.0, 18.0]
2025-08-07 11:55:13,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 52 minutes, 12 seconds)
2025-08-07 11:56:48,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:56:49,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 97.22449 ± 92.544
2025-08-07 11:56:49,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [74.40806, 135.70647, 95.46135, 15.479645, 93.08823, 7.889794, 10.108335, 270.11578, 254.6297, 15.35748]
2025-08-07 11:56:49,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 91.0, 81.0, 17.0, 75.0, 13.0, 14.0, 112.0, 127.0, 17.0]
2025-08-07 11:56:49,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 50 minutes, 41 seconds)
2025-08-07 11:58:22,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:58:23,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 138.53220 ± 112.897
2025-08-07 11:58:23,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.96058, 360.49457, 196.85625, 183.43147, 99.12435, 12.93934, 11.056707, 123.242004, 283.72397, 105.492645]
2025-08-07 11:58:23,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 137.0, 87.0, 123.0, 64.0, 17.0, 15.0, 82.0, 155.0, 80.0]
2025-08-07 11:58:23,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 48 minutes, 58 seconds)
2025-08-07 11:59:57,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 11:59:58,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 102.78770 ± 102.498
2025-08-07 11:59:58,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.017516, 20.759003, 139.11945, 131.54955, 357.01547, 14.617364, 177.58762, 11.275832, 91.60738, 76.32797]
2025-08-07 11:59:58,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 19.0, 89.0, 100.0, 136.0, 16.0, 97.0, 15.0, 68.0, 45.0]
2025-08-07 11:59:58,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 47 minutes, 21 seconds)
2025-08-07 12:01:32,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:01:33,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 136.99449 ± 53.899
2025-08-07 12:01:33,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [166.33078, 131.62349, 163.74443, 217.03775, 168.45587, 161.75764, 129.20023, 11.676551, 143.37184, 76.74645]
2025-08-07 12:01:33,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 69.0, 125.0, 92.0, 101.0, 98.0, 72.0, 16.0, 96.0, 54.0]
2025-08-07 12:01:33,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 45 minutes, 52 seconds)
2025-08-07 12:03:07,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:03:08,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 119.45748 ± 66.899
2025-08-07 12:03:08,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [93.885925, 107.074104, 75.31437, 83.379814, 16.890753, 197.77626, 87.97294, 151.40248, 267.75552, 113.12263]
2025-08-07 12:03:08,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 86.0, 53.0, 50.0, 18.0, 94.0, 97.0, 79.0, 165.0, 81.0]
2025-08-07 12:03:08,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 17 seconds)
2025-08-07 12:04:43,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:04:43,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 103.13236 ± 127.213
2025-08-07 12:04:43,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [6.916902, 146.81345, 85.80915, 127.502205, 75.85654, 7.8703957, 454.92892, 99.22107, 12.162948, 14.242]
2025-08-07 12:04:43,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 73.0, 55.0, 82.0, 50.0, 10.0, 189.0, 73.0, 13.0, 18.0]
2025-08-07 12:04:43,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 40 seconds)
2025-08-07 12:06:17,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:06:17,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 78.77405 ± 62.457
2025-08-07 12:06:17,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [109.68571, 96.46949, 121.81683, 113.606255, 8.832076, 9.360522, 9.882504, 201.34422, 11.667325, 105.075584]
2025-08-07 12:06:17,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 57.0, 102.0, 70.0, 15.0, 12.0, 12.0, 129.0, 14.0, 84.0]
2025-08-07 12:06:17,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 6 seconds)
2025-08-07 12:07:52,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:07:53,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 115.63245 ± 81.945
2025-08-07 12:07:53,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [180.56812, 110.13172, 9.548061, 129.04613, 132.8816, 162.82466, 9.005514, 275.6939, 8.607179, 138.01764]
2025-08-07 12:07:53,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 65.0, 12.0, 78.0, 111.0, 121.0, 12.0, 110.0, 16.0, 110.0]
2025-08-07 12:07:53,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 33 seconds)
2025-08-07 12:09:28,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:09:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 93.56879 ± 106.428
2025-08-07 12:09:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.000775, 153.71263, 363.26166, 16.125303, 12.934368, 86.42445, 145.48946, 125.67877, 15.016506, 10.044065]
2025-08-07 12:09:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [10.0, 113.0, 168.0, 16.0, 17.0, 63.0, 72.0, 77.0, 16.0, 14.0]
2025-08-07 12:09:28,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 2 seconds)
2025-08-07 12:11:02,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:11:03,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 111.74387 ± 98.595
2025-08-07 12:11:03,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [102.67975, 103.28446, 180.29262, 10.512872, 10.92245, 283.69434, 249.57611, 7.3718743, 10.362421, 158.74179]
2025-08-07 12:11:03,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 60.0, 128.0, 13.0, 15.0, 118.0, 146.0, 13.0, 14.0, 113.0]
2025-08-07 12:11:03,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 36 minutes, 26 seconds)
2025-08-07 12:12:37,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:12:37,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 98.06760 ± 76.129
2025-08-07 12:12:37,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.59502, 214.07843, 8.97009, 6.9791603, 177.90883, 13.025237, 151.68779, 117.9699, 141.0349, 139.42664]
2025-08-07 12:12:37,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 144.0, 16.0, 11.0, 85.0, 17.0, 79.0, 80.0, 70.0, 86.0]
2025-08-07 12:12:37,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 34 minutes, 46 seconds)
2025-08-07 12:14:12,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:14:12,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 63.34705 ± 56.817
2025-08-07 12:14:12,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [141.63852, 95.2461, 72.60635, 12.860251, 9.412304, 128.86462, 9.660534, 144.49034, 9.32541, 9.365967]
2025-08-07 12:14:12,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 54.0, 42.0, 18.0, 16.0, 82.0, 15.0, 84.0, 11.0, 12.0]
2025-08-07 12:14:12,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 13 seconds)
2025-08-07 12:15:48,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:15:48,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 100.86934 ± 87.935
2025-08-07 12:15:48,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [126.62951, 8.014732, 164.5008, 253.6842, 6.7507005, 154.44514, 10.084834, 10.063454, 207.16751, 67.352615]
2025-08-07 12:15:48,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 10.0, 77.0, 125.0, 12.0, 112.0, 14.0, 12.0, 127.0, 40.0]
2025-08-07 12:15:48,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 31 minutes, 43 seconds)
2025-08-07 12:17:22,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:17:22,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 96.44374 ± 97.543
2025-08-07 12:17:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.033056, 152.02417, 10.095121, 336.35132, 152.61858, 9.511592, 80.13924, 90.522224, 116.97125, 7.170919]
2025-08-07 12:17:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 106.0, 14.0, 144.0, 97.0, 15.0, 47.0, 51.0, 104.0, 11.0]
2025-08-07 12:17:22,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 1 second)
2025-08-07 12:18:56,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:18:57,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 107.67149 ± 73.478
2025-08-07 12:18:57,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [134.01935, 14.32439, 216.8061, 12.128521, 183.05476, 188.73761, 11.974192, 86.87369, 142.75575, 86.04057]
2025-08-07 12:18:57,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 17.0, 105.0, 17.0, 103.0, 121.0, 14.0, 61.0, 74.0, 54.0]
2025-08-07 12:18:57,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 25 seconds)
2025-08-07 12:20:31,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:20:32,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 98.67131 ± 69.035
2025-08-07 12:20:32,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [201.8194, 55.982285, 113.36734, 128.89233, 130.04097, 144.45699, 183.33777, 6.6630063, 13.002367, 9.150637]
2025-08-07 12:20:32,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 50.0, 100.0, 82.0, 110.0, 99.0, 102.0, 11.0, 16.0, 11.0]
2025-08-07 12:20:32,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 26 minutes, 53 seconds)
2025-08-07 12:22:06,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:22:07,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 97.02995 ± 78.521
2025-08-07 12:22:07,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [99.95188, 9.0955, 146.89087, 11.473493, 135.81122, 110.806335, 10.923391, 229.8326, 200.31587, 15.198432]
2025-08-07 12:22:07,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 11.0, 122.0, 14.0, 67.0, 78.0, 14.0, 135.0, 121.0, 17.0]
2025-08-07 12:22:07,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 19 seconds)
2025-08-07 12:23:42,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:23:43,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 122.02441 ± 102.774
2025-08-07 12:23:43,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [105.14941, 15.243184, 13.434797, 150.873, 394.98505, 70.584175, 110.67347, 170.30489, 84.853226, 104.14286]
2025-08-07 12:23:43,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 17.0, 15.0, 75.0, 155.0, 61.0, 82.0, 79.0, 63.0, 72.0]
2025-08-07 12:23:43,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 23 minutes, 44 seconds)
2025-08-07 12:25:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:25:18,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 141.88600 ± 103.715
2025-08-07 12:25:18,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [287.92502, 218.79344, 11.858804, 10.0624075, 177.86366, 125.9545, 278.21338, 84.34762, 213.90302, 9.9381]
2025-08-07 12:25:18,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [147.0, 119.0, 16.0, 14.0, 114.0, 74.0, 133.0, 47.0, 119.0, 14.0]
2025-08-07 12:25:18,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 11 seconds)
2025-08-07 12:26:53,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:26:53,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 81.36377 ± 60.489
2025-08-07 12:26:53,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.954795, 178.13615, 9.214674, 97.59835, 34.24519, 169.85275, 76.70365, 111.15695, 113.60748, 9.167646]
2025-08-07 12:26:53,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 112.0, 13.0, 62.0, 29.0, 88.0, 86.0, 93.0, 77.0, 14.0]
2025-08-07 12:26:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 38 seconds)
2025-08-07 12:28:27,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:28:28,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 106.28442 ± 61.644
2025-08-07 12:28:28,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [197.32103, 110.04807, 101.09613, 124.441216, 105.44716, 210.41191, 104.20785, 11.195499, 10.824447, 87.85078]
2025-08-07 12:28:28,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [152.0, 104.0, 55.0, 92.0, 60.0, 97.0, 80.0, 15.0, 20.0, 53.0]
2025-08-07 12:28:28,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 2 seconds)
2025-08-07 12:30:03,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:30:04,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 112.18297 ± 104.883
2025-08-07 12:30:04,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [143.7949, 217.8099, 9.699861, 16.1451, 8.041949, 225.67381, 10.480134, 212.32132, 11.270184, 266.5925]
2025-08-07 12:30:04,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 139.0, 14.0, 17.0, 11.0, 97.0, 17.0, 111.0, 15.0, 125.0]
2025-08-07 12:30:04,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 29 seconds)
2025-08-07 12:31:37,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:31:38,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 117.57397 ± 75.178
2025-08-07 12:31:38,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [254.87543, 177.7413, 22.374205, 107.37704, 86.44006, 153.14102, 203.55447, 100.49251, 60.17738, 9.566329]
2025-08-07 12:31:38,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 103.0, 22.0, 76.0, 64.0, 88.0, 89.0, 67.0, 40.0, 11.0]
2025-08-07 12:31:38,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 15 minutes, 49 seconds)
2025-08-07 12:33:10,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:33:11,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 196.95969 ± 105.226
2025-08-07 12:33:11,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [86.249695, 307.90814, 90.57934, 363.40497, 162.55516, 246.7813, 178.53479, 298.23636, 218.42953, 16.917454]
2025-08-07 12:33:11,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 156.0, 51.0, 199.0, 93.0, 145.0, 83.0, 126.0, 108.0, 17.0]
2025-08-07 12:33:11,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (196.96) for latency MM1Queue_a033_s075
2025-08-07 12:33:11,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 12 seconds)
2025-08-07 12:34:45,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:34:46,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 121.64864 ± 95.947
2025-08-07 12:34:46,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [228.63414, 7.1053605, 18.54036, 120.28435, 17.330938, 239.68042, 9.401922, 244.94257, 180.07227, 150.49416]
2025-08-07 12:34:46,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 11.0, 21.0, 70.0, 18.0, 109.0, 12.0, 135.0, 105.0, 98.0]
2025-08-07 12:34:46,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 35 seconds)
2025-08-07 12:36:18,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:36:19,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 100.16431 ± 65.187
2025-08-07 12:36:19,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.6518135, 14.653593, 97.97396, 118.20691, 237.4225, 158.271, 149.54881, 64.634125, 78.21135, 69.069]
2025-08-07 12:36:19,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 18.0, 85.0, 73.0, 132.0, 82.0, 72.0, 46.0, 46.0, 44.0]
2025-08-07 12:36:19,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 10 minutes, 59 seconds)
2025-08-07 12:37:51,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:37:52,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 137.09628 ± 84.741
2025-08-07 12:37:52,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [129.77756, 205.96396, 11.857, 207.76657, 70.89278, 208.13501, 85.44374, 270.0426, 168.73367, 12.350014]
2025-08-07 12:37:52,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 105.0, 13.0, 144.0, 84.0, 126.0, 55.0, 187.0, 100.0, 14.0]
2025-08-07 12:37:52,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 21 seconds)
2025-08-07 12:39:24,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:39:25,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 178.43008 ± 125.051
2025-08-07 12:39:25,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [234.69786, 505.37073, 230.37903, 58.435467, 195.93059, 154.22357, 72.71471, 75.80418, 156.77023, 99.974365]
2025-08-07 12:39:25,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 166.0, 129.0, 45.0, 113.0, 74.0, 44.0, 46.0, 82.0, 77.0]
2025-08-07 12:39:25,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 47 seconds)
2025-08-07 12:40:58,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:40:58,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 107.48210 ± 148.564
2025-08-07 12:40:58,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.119647, 10.456251, 374.55905, 6.1220527, 9.185209, 110.35421, 10.216071, 418.81506, 62.373924, 63.619602]
2025-08-07 12:40:58,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 12.0, 163.0, 8.0, 12.0, 60.0, 12.0, 151.0, 59.0, 46.0]
2025-08-07 12:40:58,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 13 seconds)
2025-08-07 12:42:30,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:42:31,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 71.87675 ± 64.712
2025-08-07 12:42:31,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [98.61503, 10.114174, 159.14587, 122.66523, 185.92416, 19.267607, 8.521153, 11.832551, 89.51364, 13.168161]
2025-08-07 12:42:31,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 13.0, 108.0, 103.0, 120.0, 18.0, 13.0, 15.0, 66.0, 16.0]
2025-08-07 12:42:31,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 39 seconds)
2025-08-07 12:44:03,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:44:04,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 139.62442 ± 110.314
2025-08-07 12:44:04,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [104.44904, 54.06539, 74.674126, 14.032082, 233.51454, 173.58145, 127.35598, 382.56256, 9.02598, 222.98312]
2025-08-07 12:44:04,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 32.0, 64.0, 16.0, 119.0, 107.0, 99.0, 134.0, 12.0, 122.0]
2025-08-07 12:44:04,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 6 seconds)
2025-08-07 12:45:37,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:45:37,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 70.10331 ± 74.023
2025-08-07 12:45:37,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [199.29402, 14.783012, 72.390274, 61.819523, 13.158709, 214.54071, 10.778367, 11.418161, 11.40524, 91.44512]
2025-08-07 12:45:37,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 17.0, 56.0, 38.0, 14.0, 130.0, 13.0, 13.0, 17.0, 56.0]
2025-08-07 12:45:37,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 33 seconds)
2025-08-07 12:47:10,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency MM1Queue_a033_s075...
2025-08-07 12:47:10,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 122.99628 ± 134.254
2025-08-07 12:47:10,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.445042, 127.75646, 358.15274, 9.975683, 113.512566, 12.031534, 389.3465, 70.60428, 9.528428, 130.6095]
2025-08-07 12:47:10,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 77.0, 175.0, 19.0, 63.0, 14.0, 165.0, 42.0, 20.0, 85.0]
2025-08-07 12:47:10,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1251 [DEBUG]: Training session finished
