2025-08-07 00:48:26,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:26,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:26,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x151f48f47e90>}
2025-08-07 00:48:26,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:26,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1133 [INFO]: Creating new trainer
2025-08-07 00:48:26,031 baseline-bpql-noiseperc15-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 00:48:26,032 baseline-bpql-noiseperc15-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:27,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:27,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:58,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 35.97508 ± 20.512
2025-08-07 00:49:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [70.785995, 10.424966, 46.098404, 57.134586, 20.609547, 11.002619, 30.167175, 53.306847, 46.388325, 13.832358]
2025-08-07 00:49:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 14.0, 52.0, 52.0, 20.0, 13.0, 47.0, 68.0, 49.0, 28.0]
2025-08-07 00:49:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (35.98) for latency ExtremeSparseL4U32
2025-08-07 00:49:59,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 31 minutes, 28 seconds)
2025-08-07 00:51:36,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:37,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 50.73653 ± 58.525
2025-08-07 00:51:37,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [207.03339, 88.782936, 16.94556, 20.167973, 13.705247, 82.66672, 18.203451, 20.821434, 17.840616, 21.19806]
2025-08-07 00:51:37,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 52.0, 27.0, 21.0, 20.0, 53.0, 21.0, 24.0, 19.0, 27.0]
2025-08-07 00:51:37,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (50.74) for latency ExtremeSparseL4U32
2025-08-07 00:51:37,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 34 minutes, 57 seconds)
2025-08-07 00:53:15,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:16,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 45.22831 ± 31.840
2025-08-07 00:53:16,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [89.92426, 14.5469, 79.69328, 16.992977, 71.959694, 29.64097, 15.108451, 31.753017, 12.00674, 90.65679]
2025-08-07 00:53:16,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 16.0, 61.0, 18.0, 62.0, 31.0, 18.0, 29.0, 24.0, 67.0]
2025-08-07 00:53:16,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 35 minutes, 29 seconds)
2025-08-07 00:54:53,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:54:54,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 28.03155 ± 12.349
2025-08-07 00:54:54,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [40.01252, 13.162792, 48.463184, 29.515299, 20.188894, 20.588041, 45.90712, 16.928026, 14.843792, 30.705854]
2025-08-07 00:54:54,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [57.0, 17.0, 50.0, 28.0, 24.0, 22.0, 52.0, 28.0, 17.0, 32.0]
2025-08-07 00:54:54,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 34 minutes, 45 seconds)
2025-08-07 00:56:32,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:32,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 29.68550 ± 17.830
2025-08-07 00:56:32,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [48.396248, 13.54263, 30.367588, 28.995205, 73.56611, 14.766211, 26.47387, 10.690763, 23.470346, 26.586052]
2025-08-07 00:56:32,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [66.0, 20.0, 28.0, 29.0, 50.0, 18.0, 25.0, 13.0, 25.0, 24.0]
2025-08-07 00:56:32,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 33 minutes, 39 seconds)
2025-08-07 00:58:10,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:11,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 43.04606 ± 32.727
2025-08-07 00:58:11,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [85.38121, 21.086079, 13.046287, 20.317188, 26.894217, 26.114393, 17.48837, 86.15238, 104.498985, 29.481464]
2025-08-07 00:58:11,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [66.0, 31.0, 18.0, 23.0, 29.0, 25.0, 21.0, 80.0, 79.0, 39.0]
2025-08-07 00:58:11,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 34 minutes, 10 seconds)
2025-08-07 00:59:49,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:59:50,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 38.57499 ± 24.271
2025-08-07 00:59:50,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.525415, 61.25898, 32.753006, 26.47922, 76.24377, 18.443007, 21.988655, 84.0585, 31.565262, 15.4341545]
2025-08-07 00:59:50,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 56.0, 32.0, 24.0, 66.0, 27.0, 26.0, 65.0, 48.0, 18.0]
2025-08-07 00:59:50,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 32 minutes, 48 seconds)
2025-08-07 01:01:28,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:29,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 39.24105 ± 40.243
2025-08-07 01:01:29,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.534266, 34.17239, 21.15765, 19.958332, 15.928633, 18.663248, 20.110249, 57.851036, 154.5736, 31.461054]
2025-08-07 01:01:29,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 31.0, 27.0, 23.0, 17.0, 29.0, 25.0, 100.0, 90.0, 27.0]
2025-08-07 01:01:29,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 31 minutes, 12 seconds)
2025-08-07 01:03:07,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:08,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 41.86943 ± 32.438
2025-08-07 01:03:08,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [74.832085, 64.967896, 19.668673, 119.96407, 21.78814, 15.222383, 18.915113, 33.396015, 28.679232, 21.26064]
2025-08-07 01:03:08,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 51.0, 20.0, 83.0, 33.0, 18.0, 19.0, 32.0, 27.0, 29.0]
2025-08-07 01:03:08,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 29 minutes, 45 seconds)
2025-08-07 01:04:46,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:47,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 48.08096 ± 38.849
2025-08-07 01:04:47,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [26.535088, 17.957518, 88.372635, 98.93285, 29.505377, 33.457123, 13.336905, 28.229626, 16.637568, 127.84492]
2025-08-07 01:04:47,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 68.0, 88.0, 31.0, 28.0, 17.0, 31.0, 21.0, 100.0]
2025-08-07 01:04:47,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 28 minutes, 20 seconds)
2025-08-07 01:06:25,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:26,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 60.52308 ± 47.684
2025-08-07 01:06:26,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [37.451508, 31.147621, 46.194027, 128.34224, 31.274788, 160.99791, 13.73966, 48.30757, 93.642784, 14.132634]
2025-08-07 01:06:26,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 30.0, 45.0, 99.0, 32.0, 116.0, 17.0, 53.0, 78.0, 16.0]
2025-08-07 01:06:26,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (60.52) for latency ExtremeSparseL4U32
2025-08-07 01:06:26,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 26 minutes, 45 seconds)
2025-08-07 01:08:04,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:05,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 39.36003 ± 38.307
2025-08-07 01:08:05,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [44.3971, 21.159756, 52.867146, 25.62285, 19.619728, 23.805056, 22.995356, 9.653532, 24.791592, 148.68816]
2025-08-07 01:08:05,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 22.0, 62.0, 33.0, 32.0, 31.0, 26.0, 13.0, 23.0, 88.0]
2025-08-07 01:08:05,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 25 minutes, 7 seconds)
2025-08-07 01:09:43,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:09:44,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 60.13064 ± 45.788
2025-08-07 01:09:44,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [150.13441, 20.770384, 11.934685, 55.84445, 50.74227, 17.426907, 18.489346, 53.951363, 103.46928, 118.54329]
2025-08-07 01:09:44,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 24.0, 15.0, 64.0, 63.0, 27.0, 24.0, 58.0, 80.0, 90.0]
2025-08-07 01:09:44,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 23 minutes, 41 seconds)
2025-08-07 01:11:22,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:11:22,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 31.00135 ± 20.769
2025-08-07 01:11:22,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [22.707314, 72.56501, 30.254969, 13.5789, 17.195803, 12.149364, 22.186306, 63.17705, 44.49014, 11.708632]
2025-08-07 01:11:22,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 70.0, 31.0, 20.0, 19.0, 19.0, 21.0, 39.0, 60.0, 14.0]
2025-08-07 01:11:22,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 21 minutes, 50 seconds)
2025-08-07 01:13:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:13:01,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 45.36068 ± 34.949
2025-08-07 01:13:01,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [19.866869, 12.312908, 97.683754, 14.908697, 56.626743, 22.545464, 18.333967, 75.85478, 109.55851, 25.915066]
2025-08-07 01:13:01,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 17.0, 84.0, 23.0, 60.0, 27.0, 18.0, 59.0, 96.0, 25.0]
2025-08-07 01:13:01,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 20 minutes, 1 second)
2025-08-07 01:14:40,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:40,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 62.63998 ± 49.835
2025-08-07 01:14:40,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [82.59558, 14.264213, 15.061672, 103.405624, 13.363482, 125.16479, 18.06055, 136.99829, 105.037895, 12.447757]
2025-08-07 01:14:40,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [49.0, 18.0, 22.0, 91.0, 18.0, 75.0, 18.0, 85.0, 74.0, 20.0]
2025-08-07 01:14:40,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (62.64) for latency ExtremeSparseL4U32
2025-08-07 01:14:40,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 18 minutes, 30 seconds)
2025-08-07 01:16:19,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:16:19,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 48.81145 ± 39.932
2025-08-07 01:16:19,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [19.215591, 28.363943, 98.72564, 19.037718, 31.489923, 23.286577, 23.926807, 130.71046, 18.785435, 94.57241]
2025-08-07 01:16:19,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 63.0, 71.0, 24.0, 27.0, 29.0, 32.0, 85.0, 25.0, 53.0]
2025-08-07 01:16:19,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 16 minutes, 51 seconds)
2025-08-07 01:17:57,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:58,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 53.61586 ± 52.491
2025-08-07 01:17:58,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [24.53805, 86.1135, 20.876328, 19.480122, 61.659615, 86.047905, 12.050858, 14.970861, 188.1333, 22.288008]
2025-08-07 01:17:58,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 66.0, 22.0, 23.0, 61.0, 55.0, 18.0, 21.0, 119.0, 29.0]
2025-08-07 01:17:58,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 14 minutes, 56 seconds)
2025-08-07 01:19:36,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:37,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 71.88005 ± 74.040
2025-08-07 01:19:37,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [33.22863, 141.44336, 102.28055, 18.275501, 104.15642, 17.285618, 14.783207, 24.168695, 13.269166, 249.9094]
2025-08-07 01:19:37,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 77.0, 63.0, 22.0, 65.0, 18.0, 18.0, 29.0, 21.0, 124.0]
2025-08-07 01:19:37,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (71.88) for latency ExtremeSparseL4U32
2025-08-07 01:19:37,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 13 minutes, 31 seconds)
2025-08-07 01:21:17,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:17,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 59.08268 ± 55.142
2025-08-07 01:21:17,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [185.3031, 24.025587, 22.543175, 23.97538, 34.578247, 120.53019, 23.225782, 22.342922, 109.20134, 25.101156]
2025-08-07 01:21:17,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 30.0, 30.0, 26.0, 45.0, 85.0, 28.0, 25.0, 77.0, 26.0]
2025-08-07 01:21:17,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 12 minutes, 20 seconds)
2025-08-07 01:22:55,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:22:55,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 51.61414 ± 53.349
2025-08-07 01:22:55,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.885757, 17.927107, 23.09257, 19.233704, 28.076097, 16.211384, 166.0702, 139.84976, 17.37301, 71.42175]
2025-08-07 01:22:55,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 24.0, 31.0, 24.0, 31.0, 24.0, 102.0, 93.0, 21.0, 65.0]
2025-08-07 01:22:55,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 10 minutes, 20 seconds)
2025-08-07 01:24:34,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:24:35,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 46.18227 ± 58.114
2025-08-07 01:24:35,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.147541, 11.182545, 18.856792, 10.328553, 16.572546, 62.951035, 20.10348, 210.26424, 68.29912, 27.116785]
2025-08-07 01:24:35,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 13.0, 23.0, 13.0, 26.0, 62.0, 22.0, 144.0, 50.0, 31.0]
2025-08-07 01:24:35,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 8 minutes, 49 seconds)
2025-08-07 01:26:12,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 64.64523 ± 65.221
2025-08-07 01:26:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [38.362167, 27.197414, 21.487368, 13.085288, 136.62645, 190.57007, 18.473032, 17.780247, 23.901598, 158.96857]
2025-08-07 01:26:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 30.0, 29.0, 15.0, 92.0, 99.0, 28.0, 25.0, 27.0, 83.0]
2025-08-07 01:26:13,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 7 minutes, 7 seconds)
2025-08-07 01:27:52,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:52,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 56.13906 ± 39.834
2025-08-07 01:27:52,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [114.65557, 77.53201, 18.450525, 18.869522, 19.902025, 95.731606, 14.719028, 97.778244, 89.35954, 14.392529]
2025-08-07 01:27:52,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 96.0, 19.0, 21.0, 21.0, 60.0, 18.0, 75.0, 67.0, 19.0]
2025-08-07 01:27:52,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 5 minutes, 27 seconds)
2025-08-07 01:29:31,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:32,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 36.49782 ± 25.019
2025-08-07 01:29:32,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.766518, 61.15166, 21.233458, 29.712557, 62.606644, 18.610828, 22.068087, 92.257324, 11.954847, 28.616268]
2025-08-07 01:29:32,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 57.0, 27.0, 28.0, 53.0, 22.0, 28.0, 59.0, 18.0, 29.0]
2025-08-07 01:29:32,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 3 minutes, 34 seconds)
2025-08-07 01:31:10,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:11,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 72.44596 ± 90.797
2025-08-07 01:31:11,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.541661, 72.70355, 15.612499, 16.445778, 167.1832, 83.737335, 22.114147, 16.09868, 10.9270315, 305.09567]
2025-08-07 01:31:11,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 73.0, 22.0, 23.0, 100.0, 61.0, 26.0, 21.0, 13.0, 168.0]
2025-08-07 01:31:11,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (72.45) for latency ExtremeSparseL4U32
2025-08-07 01:31:11,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 2 minutes, 16 seconds)
2025-08-07 01:32:50,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:50,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 47.85558 ± 50.460
2025-08-07 01:32:50,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [19.417896, 82.53413, 13.709693, 13.929608, 25.57751, 11.378956, 84.94588, 13.84121, 36.360455, 176.86043]
2025-08-07 01:32:50,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 62.0, 20.0, 19.0, 33.0, 13.0, 54.0, 16.0, 51.0, 129.0]
2025-08-07 01:32:50,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 37 seconds)
2025-08-07 01:34:29,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:30,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 76.40250 ± 62.972
2025-08-07 01:34:30,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [21.881989, 14.391076, 18.53089, 21.594984, 208.36865, 106.85402, 144.9719, 28.174044, 103.77485, 95.48259]
2025-08-07 01:34:30,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 18.0, 25.0, 27.0, 195.0, 66.0, 116.0, 32.0, 67.0, 70.0]
2025-08-07 01:34:30,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (76.40) for latency ExtremeSparseL4U32
2025-08-07 01:34:30,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 59 minutes, 13 seconds)
2025-08-07 01:36:09,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:10,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 59.94896 ± 39.416
2025-08-07 01:36:10,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [66.328384, 69.4642, 130.81026, 106.37047, 16.614014, 19.040628, 56.578537, 25.638885, 12.936696, 95.707466]
2025-08-07 01:36:10,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 61.0, 103.0, 67.0, 49.0, 24.0, 52.0, 30.0, 18.0, 81.0]
2025-08-07 01:36:10,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 57 minutes, 45 seconds)
2025-08-07 01:37:48,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:37:49,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 61.56597 ± 54.493
2025-08-07 01:37:49,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [23.0228, 90.10856, 23.484835, 83.35584, 16.392582, 16.89298, 29.80977, 196.92003, 99.65793, 36.014393]
2025-08-07 01:37:49,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 83.0, 29.0, 62.0, 22.0, 20.0, 25.0, 113.0, 66.0, 32.0]
2025-08-07 01:37:49,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 56 minutes, 2 seconds)
2025-08-07 01:39:27,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:39:28,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 31.04422 ± 21.542
2025-08-07 01:39:28,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.629202, 18.20945, 22.145716, 30.894876, 90.55613, 40.688457, 14.73233, 21.251099, 36.87434, 18.460608]
2025-08-07 01:39:28,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 24.0, 24.0, 35.0, 54.0, 43.0, 17.0, 25.0, 46.0, 26.0]
2025-08-07 01:39:28,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 54 minutes, 17 seconds)
2025-08-07 01:41:07,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:41:07,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 59.58328 ± 44.387
2025-08-07 01:41:07,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [21.874752, 9.1695595, 125.23148, 73.08072, 121.892044, 95.76501, 92.27067, 20.562376, 19.556644, 16.429634]
2025-08-07 01:41:07,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 13.0, 77.0, 48.0, 79.0, 66.0, 68.0, 21.0, 22.0, 27.0]
2025-08-07 01:41:07,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 52 minutes, 38 seconds)
2025-08-07 01:42:44,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:45,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 42.57030 ± 49.780
2025-08-07 01:42:45,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [160.82677, 118.74873, 14.833714, 13.975363, 17.60821, 17.071215, 20.804298, 32.303497, 12.854102, 16.677153]
2025-08-07 01:42:45,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 70.0, 19.0, 22.0, 28.0, 28.0, 23.0, 52.0, 21.0, 20.0]
2025-08-07 01:42:45,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 50 minutes, 29 seconds)
2025-08-07 01:44:23,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:44:23,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 43.79466 ± 57.424
2025-08-07 01:44:23,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.994339, 16.872902, 204.61925, 28.054766, 85.873505, 22.729542, 14.499576, 11.31, 15.994993, 20.99776]
2025-08-07 01:44:23,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 26.0, 109.0, 33.0, 64.0, 25.0, 20.0, 16.0, 28.0, 25.0]
2025-08-07 01:44:23,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes, 35 seconds)
2025-08-07 01:46:00,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:01,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 63.48597 ± 61.160
2025-08-07 01:46:01,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [31.365473, 79.526215, 12.354657, 192.45778, 103.25063, 11.098107, 24.041164, 145.7151, 17.187931, 17.862638]
2025-08-07 01:46:01,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 57.0, 15.0, 122.0, 60.0, 13.0, 28.0, 99.0, 23.0, 24.0]
2025-08-07 01:46:01,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 38 seconds)
2025-08-07 01:47:38,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:39,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 81.21819 ± 68.840
2025-08-07 01:47:39,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [27.08762, 235.60628, 18.597157, 29.501307, 159.27817, 37.23825, 113.95652, 92.71523, 86.06351, 12.137807]
2025-08-07 01:47:39,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 134.0, 22.0, 34.0, 81.0, 32.0, 67.0, 66.0, 64.0, 20.0]
2025-08-07 01:47:39,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (81.22) for latency ExtremeSparseL4U32
2025-08-07 01:47:39,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 44 minutes, 43 seconds)
2025-08-07 01:49:16,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:17,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 67.14684 ± 61.528
2025-08-07 01:49:17,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.990328, 12.567212, 25.107172, 121.094185, 189.00056, 148.80576, 24.959417, 21.590975, 21.519924, 87.8328]
2025-08-07 01:49:17,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 27.0, 69.0, 118.0, 101.0, 25.0, 26.0, 30.0, 68.0]
2025-08-07 01:49:17,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 42 seconds)
2025-08-07 01:50:53,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:50:54,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 59.03039 ± 58.103
2025-08-07 01:50:54,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [21.102142, 17.785353, 129.89885, 23.364658, 196.09154, 68.69585, 13.779476, 29.259134, 77.78299, 12.543857]
2025-08-07 01:50:54,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 74.0, 26.0, 127.0, 73.0, 15.0, 28.0, 51.0, 20.0]
2025-08-07 01:50:54,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 7 seconds)
2025-08-07 01:52:32,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:32,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 55.97742 ± 42.500
2025-08-07 01:52:32,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [120.337395, 14.045235, 112.54461, 17.487179, 20.16345, 29.696764, 40.555252, 12.674055, 86.94977, 105.32044]
2025-08-07 01:52:32,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 16.0, 111.0, 19.0, 21.0, 27.0, 64.0, 20.0, 56.0, 67.0]
2025-08-07 01:52:32,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 39 minutes, 26 seconds)
2025-08-07 01:54:09,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:10,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 59.51784 ± 64.169
2025-08-07 01:54:10,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [215.48267, 31.595474, 20.071838, 36.26095, 24.814821, 12.751197, 133.51204, 13.716006, 15.157891, 91.8155]
2025-08-07 01:54:10,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [126.0, 29.0, 23.0, 37.0, 26.0, 18.0, 85.0, 28.0, 18.0, 67.0]
2025-08-07 01:54:10,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 37 minutes, 41 seconds)
2025-08-07 01:55:46,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:47,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 35.43802 ± 27.959
2025-08-07 01:55:47,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [27.869946, 26.537075, 92.304405, 24.153013, 12.971637, 22.72748, 89.126, 16.662838, 19.974216, 22.053616]
2025-08-07 01:55:47,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 32.0, 74.0, 30.0, 17.0, 23.0, 76.0, 22.0, 27.0, 26.0]
2025-08-07 01:55:47,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 35 minutes, 57 seconds)
2025-08-07 01:57:24,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:24,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 69.90593 ± 97.529
2025-08-07 01:57:24,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.7983465, 20.986282, 14.255314, 10.480073, 47.722557, 12.920803, 19.250296, 333.1502, 154.9081, 72.58735]
2025-08-07 01:57:24,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 26.0, 15.0, 15.0, 62.0, 17.0, 25.0, 155.0, 89.0, 60.0]
2025-08-07 01:57:24,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 34 minutes, 18 seconds)
2025-08-07 01:59:01,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 45.41230 ± 38.446
2025-08-07 01:59:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [110.068985, 37.275703, 24.573303, 10.979859, 120.23477, 71.019226, 25.895882, 16.134596, 22.44117, 15.499486]
2025-08-07 01:59:01,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 30.0, 29.0, 18.0, 72.0, 64.0, 32.0, 20.0, 25.0, 17.0]
2025-08-07 01:59:01,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes, 35 seconds)
2025-08-07 02:00:38,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:00:38,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 53.44923 ± 54.888
2025-08-07 02:00:38,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.315589, 21.777323, 19.392992, 13.98402, 103.98177, 19.705194, 38.861885, 113.9362, 177.8829, 13.654436]
2025-08-07 02:00:38,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 26.0, 24.0, 18.0, 77.0, 20.0, 41.0, 68.0, 115.0, 23.0]
2025-08-07 02:00:38,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 41 seconds)
2025-08-07 02:02:15,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:16,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 61.47628 ± 58.353
2025-08-07 02:02:16,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [23.596287, 28.775473, 27.487064, 191.60439, 81.96519, 67.4085, 14.245522, 21.110533, 14.058314, 144.51158]
2025-08-07 02:02:16,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 33.0, 112.0, 61.0, 45.0, 18.0, 24.0, 22.0, 110.0]
2025-08-07 02:02:16,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 8 seconds)
2025-08-07 02:03:52,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:53,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 47.59700 ± 51.179
2025-08-07 02:03:53,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [21.636839, 26.799524, 16.830229, 28.271177, 114.62391, 33.233063, 176.29395, 12.926283, 28.198563, 17.156443]
2025-08-07 02:03:53,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 18.0, 26.0, 72.0, 31.0, 87.0, 22.0, 28.0, 30.0]
2025-08-07 02:03:53,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 30 seconds)
2025-08-07 02:05:30,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:05:31,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 75.97248 ± 50.066
2025-08-07 02:05:31,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [87.33212, 13.669841, 19.375614, 111.744804, 136.15729, 20.749235, 13.802749, 109.03491, 110.838486, 137.0197]
2025-08-07 02:05:31,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 23.0, 24.0, 71.0, 101.0, 20.0, 19.0, 69.0, 75.0, 82.0]
2025-08-07 02:05:31,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 25 minutes, 58 seconds)
2025-08-07 02:07:07,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:07,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 45.43231 ± 50.588
2025-08-07 02:07:07,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [104.678635, 14.240813, 12.286992, 15.04968, 35.814724, 51.815628, 11.644509, 172.89604, 23.1662, 12.729906]
2025-08-07 02:07:07,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 17.0, 28.0, 20.0, 55.0, 81.0, 19.0, 115.0, 26.0, 15.0]
2025-08-07 02:07:07,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 15 seconds)
2025-08-07 02:08:44,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:45,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 62.28917 ± 93.375
2025-08-07 02:08:45,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [27.457075, 19.751171, 11.875198, 18.30896, 311.46304, 22.415808, 11.344877, 14.198784, 25.162525, 160.91428]
2025-08-07 02:08:45,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 33.0, 23.0, 176.0, 25.0, 29.0, 18.0, 32.0, 91.0]
2025-08-07 02:08:45,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 22 minutes, 40 seconds)
2025-08-07 02:10:21,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:22,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 74.10884 ± 55.100
2025-08-07 02:10:22,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.307336, 63.052597, 163.8042, 23.279234, 91.321655, 156.41162, 98.88157, 13.61318, 103.94131, 14.475768]
2025-08-07 02:10:22,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 82.0, 110.0, 24.0, 73.0, 86.0, 63.0, 15.0, 92.0, 21.0]
2025-08-07 02:10:22,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes)
2025-08-07 02:11:58,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:59,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 50.74633 ± 49.064
2025-08-07 02:11:59,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [108.90905, 138.66725, 20.478382, 9.716528, 20.127731, 23.964146, 29.025549, 20.52802, 126.00178, 10.044918]
2025-08-07 02:11:59,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 95.0, 31.0, 12.0, 25.0, 23.0, 61.0, 23.0, 92.0, 12.0]
2025-08-07 02:11:59,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 22 seconds)
2025-08-07 02:13:35,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:36,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 44.98584 ± 40.505
2025-08-07 02:13:36,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [135.36684, 103.38177, 21.09786, 13.6485615, 23.321568, 26.091074, 67.904594, 21.0037, 18.788635, 19.25375]
2025-08-07 02:13:36,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 65.0, 26.0, 20.0, 26.0, 28.0, 55.0, 25.0, 22.0, 25.0]
2025-08-07 02:13:36,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 17 minutes, 33 seconds)
2025-08-07 02:15:13,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:13,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 62.36273 ± 62.490
2025-08-07 02:15:13,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [133.14194, 201.67767, 16.700457, 24.411428, 112.10193, 69.60958, 14.652235, 10.498055, 27.310549, 13.5234165]
2025-08-07 02:15:13,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 115.0, 19.0, 31.0, 72.0, 80.0, 15.0, 13.0, 33.0, 23.0]
2025-08-07 02:15:13,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 16 minutes, 8 seconds)
2025-08-07 02:16:50,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:50,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 24.76519 ± 23.569
2025-08-07 02:16:50,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.567915, 18.896914, 22.193552, 15.629419, 21.628857, 17.986883, 94.762955, 10.777854, 14.515953, 13.69161]
2025-08-07 02:16:50,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 32.0, 26.0, 17.0, 25.0, 23.0, 86.0, 14.0, 25.0, 20.0]
2025-08-07 02:16:50,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 14 minutes, 25 seconds)
2025-08-07 02:18:26,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:27,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 67.74053 ± 61.063
2025-08-07 02:18:27,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.531242, 15.799559, 15.851995, 138.36018, 177.94699, 22.002516, 20.785774, 117.560356, 127.483536, 23.083185]
2025-08-07 02:18:27,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 26.0, 78.0, 110.0, 26.0, 22.0, 74.0, 82.0, 24.0]
2025-08-07 02:18:27,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 12 minutes, 46 seconds)
2025-08-07 02:20:04,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:20:04,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 37.23701 ± 36.978
2025-08-07 02:20:04,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [34.075275, 26.406952, 138.01549, 14.255203, 25.096977, 16.578253, 23.842602, 12.051474, 67.905045, 14.142823]
2025-08-07 02:20:04,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 79.0, 25.0, 29.0, 24.0, 29.0, 18.0, 66.0, 16.0]
2025-08-07 02:20:04,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 11 minutes, 10 seconds)
2025-08-07 02:21:40,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:41,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 56.22158 ± 50.992
2025-08-07 02:21:41,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [155.61528, 35.55371, 90.2979, 20.39786, 17.424545, 17.657501, 24.083607, 25.592342, 143.89455, 31.698534]
2025-08-07 02:21:41,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [103.0, 28.0, 60.0, 20.0, 23.0, 28.0, 26.0, 32.0, 87.0, 31.0]
2025-08-07 02:21:41,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 9 minutes, 33 seconds)
2025-08-07 02:23:18,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:19,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 71.04611 ± 61.337
2025-08-07 02:23:19,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [15.44863, 146.17865, 75.88494, 15.646507, 92.61037, 16.969723, 32.577034, 14.978896, 98.666046, 201.50034]
2025-08-07 02:23:19,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 96.0, 50.0, 28.0, 69.0, 29.0, 30.0, 27.0, 64.0, 109.0]
2025-08-07 02:23:19,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 7 minutes, 57 seconds)
2025-08-07 02:24:55,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:56,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 75.32593 ± 55.014
2025-08-07 02:24:56,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.02206, 78.30674, 17.018185, 112.48795, 26.199717, 158.13358, 79.270096, 19.154339, 171.62497, 77.0416]
2025-08-07 02:24:56,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 52.0, 20.0, 92.0, 30.0, 114.0, 72.0, 31.0, 102.0, 52.0]
2025-08-07 02:24:56,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 27 seconds)
2025-08-07 02:26:32,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:33,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 82.20539 ± 73.895
2025-08-07 02:26:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [104.10396, 154.58488, 19.149454, 18.082798, 19.52098, 166.30476, 232.79092, 26.123528, 56.975975, 24.416658]
2025-08-07 02:26:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 92.0, 26.0, 20.0, 29.0, 114.0, 166.0, 27.0, 51.0, 26.0]
2025-08-07 02:26:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (82.21) for latency ExtremeSparseL4U32
2025-08-07 02:26:33,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 4 minutes, 50 seconds)
2025-08-07 02:28:10,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:28:11,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 45.29370 ± 47.278
2025-08-07 02:28:11,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.513906, 161.01497, 13.51076, 19.052155, 81.33523, 89.87443, 17.990965, 10.115261, 14.763756, 26.765633]
2025-08-07 02:28:11,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 111.0, 18.0, 26.0, 79.0, 66.0, 18.0, 17.0, 31.0, 32.0]
2025-08-07 02:28:11,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 3 minutes, 18 seconds)
2025-08-07 02:29:47,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:48,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 35.34363 ± 31.867
2025-08-07 02:29:48,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [11.340556, 88.27804, 21.767479, 20.973137, 22.801893, 23.054285, 15.506934, 24.802076, 107.74853, 17.16339]
2025-08-07 02:29:48,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 61.0, 22.0, 33.0, 27.0, 28.0, 20.0, 33.0, 73.0, 29.0]
2025-08-07 02:29:48,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 1 minute, 40 seconds)
2025-08-07 02:31:25,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:26,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 72.59088 ± 59.326
2025-08-07 02:31:26,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [16.217321, 17.695732, 16.818527, 27.842411, 137.8935, 78.803635, 104.062485, 186.92859, 122.31056, 17.336048]
2025-08-07 02:31:26,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 22.0, 21.0, 31.0, 92.0, 53.0, 76.0, 115.0, 80.0, 21.0]
2025-08-07 02:31:26,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 6 seconds)
2025-08-07 02:33:02,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:33:03,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 82.31911 ± 67.865
2025-08-07 02:33:03,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [185.38112, 195.3982, 11.515774, 122.620316, 21.113066, 15.939924, 124.666565, 31.250328, 91.12434, 24.181515]
2025-08-07 02:33:03,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [128.0, 111.0, 15.0, 100.0, 21.0, 24.0, 110.0, 29.0, 73.0, 32.0]
2025-08-07 02:33:03,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (82.32) for latency ExtremeSparseL4U32
2025-08-07 02:33:03,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 58 minutes, 27 seconds)
2025-08-07 02:34:40,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:41,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 50.83146 ± 58.236
2025-08-07 02:34:41,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [21.284828, 22.307943, 18.461584, 26.226742, 18.938227, 15.475508, 26.717558, 183.26, 26.781898, 148.86026]
2025-08-07 02:34:41,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 33.0, 32.0, 26.0, 20.0, 26.0, 26.0, 117.0, 28.0, 90.0]
2025-08-07 02:34:41,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 56 minutes, 52 seconds)
2025-08-07 02:36:17,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:18,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 55.05421 ± 57.057
2025-08-07 02:36:18,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [23.266474, 13.502642, 114.499084, 25.598234, 26.564854, 12.064051, 158.25902, 13.171684, 148.59082, 15.025297]
2025-08-07 02:36:18,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 17.0, 67.0, 26.0, 29.0, 15.0, 126.0, 20.0, 99.0, 17.0]
2025-08-07 02:36:18,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 9 seconds)
2025-08-07 02:37:56,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:57,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 93.83751 ± 81.138
2025-08-07 02:37:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [81.27005, 277.03098, 11.762657, 19.898243, 78.07276, 23.131365, 151.17082, 173.88115, 98.65352, 23.503511]
2025-08-07 02:37:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 174.0, 17.0, 22.0, 51.0, 30.0, 125.0, 104.0, 82.0, 28.0]
2025-08-07 02:37:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (93.84) for latency ExtremeSparseL4U32
2025-08-07 02:37:57,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 53 minutes, 45 seconds)
2025-08-07 02:39:33,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:39:34,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 105.94869 ± 70.994
2025-08-07 02:39:34,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [218.2537, 183.13841, 173.50685, 103.847374, 16.429762, 14.469155, 83.95989, 9.832667, 118.44526, 137.60388]
2025-08-07 02:39:34,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 97.0, 124.0, 74.0, 22.0, 28.0, 58.0, 15.0, 88.0, 129.0]
2025-08-07 02:39:34,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (105.95) for latency ExtremeSparseL4U32
2025-08-07 02:39:34,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes)
2025-08-07 02:41:11,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:12,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 60.70934 ± 85.086
2025-08-07 02:41:12,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [29.947721, 19.983683, 13.835409, 15.934387, 15.402327, 13.690231, 208.77982, 250.47041, 19.144228, 19.905132]
2025-08-07 02:41:12,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 30.0, 17.0, 21.0, 29.0, 17.0, 131.0, 155.0, 29.0, 31.0]
2025-08-07 02:41:12,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 26 seconds)
2025-08-07 02:42:48,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:49,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 50.70759 ± 51.790
2025-08-07 02:42:49,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [17.038363, 13.295154, 147.65425, 34.13042, 14.775017, 23.232162, 85.137436, 13.975112, 143.89313, 13.944819]
2025-08-07 02:42:49,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 87.0, 29.0, 18.0, 31.0, 95.0, 19.0, 91.0, 18.0]
2025-08-07 02:42:49,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 49 seconds)
2025-08-07 02:44:26,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:27,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 67.08331 ± 78.766
2025-08-07 02:44:27,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [20.4013, 11.891687, 15.549589, 18.704952, 122.928734, 212.41039, 20.04291, 12.473137, 25.597134, 210.83325]
2025-08-07 02:44:27,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 16.0, 23.0, 25.0, 101.0, 114.0, 19.0, 19.0, 26.0, 103.0]
2025-08-07 02:44:27,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes, 15 seconds)
2025-08-07 02:46:03,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:04,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 78.55913 ± 100.648
2025-08-07 02:46:04,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.513483, 21.983685, 53.938198, 17.759537, 19.673546, 26.46796, 89.87638, 342.25375, 21.116884, 180.0079]
2025-08-07 02:46:04,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 23.0, 45.0, 22.0, 30.0, 33.0, 51.0, 159.0, 21.0, 110.0]
2025-08-07 02:46:04,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 45 minutes, 29 seconds)
2025-08-07 02:47:41,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:42,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 62.98806 ± 74.255
2025-08-07 02:47:42,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [212.44336, 14.08845, 14.053728, 199.607, 36.534912, 12.146461, 22.046827, 10.971431, 81.58898, 26.399395]
2025-08-07 02:47:42,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 19.0, 15.0, 160.0, 39.0, 16.0, 24.0, 14.0, 63.0, 29.0]
2025-08-07 02:47:42,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 43 minutes, 53 seconds)
2025-08-07 02:49:19,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 56.65709 ± 83.651
2025-08-07 02:49:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [15.265156, 24.866257, 17.342781, 15.346771, 121.508965, 17.154922, 290.1307, 18.610176, 19.130337, 27.214848]
2025-08-07 02:49:20,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 27.0, 25.0, 21.0, 95.0, 23.0, 148.0, 21.0, 30.0, 26.0]
2025-08-07 02:49:20,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 20 seconds)
2025-08-07 02:50:58,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:59,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 74.80818 ± 81.419
2025-08-07 02:50:59,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [26.751633, 272.00027, 99.22395, 12.957153, 90.67916, 166.3597, 17.750057, 23.206495, 16.235872, 22.917519]
2025-08-07 02:50:59,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 141.0, 63.0, 23.0, 66.0, 122.0, 24.0, 22.0, 19.0, 23.0]
2025-08-07 02:50:59,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 47 seconds)
2025-08-07 02:52:34,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:35,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 45.57123 ± 39.335
2025-08-07 02:52:35,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [26.40195, 11.198078, 19.12879, 107.484085, 12.145972, 27.733406, 117.39929, 17.899948, 28.890852, 87.4299]
2025-08-07 02:52:35,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 17.0, 25.0, 89.0, 17.0, 31.0, 82.0, 21.0, 30.0, 60.0]
2025-08-07 02:52:35,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 2 seconds)
2025-08-07 02:54:11,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:54:11,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 37.94358 ± 41.540
2025-08-07 02:54:11,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [107.575195, 16.599466, 22.655258, 13.61925, 24.819687, 17.997305, 20.724281, 132.11838, 13.862493, 9.464476]
2025-08-07 02:54:11,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 17.0, 26.0, 19.0, 30.0, 25.0, 31.0, 72.0, 18.0, 14.0]
2025-08-07 02:54:11,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 21 seconds)
2025-08-07 02:55:48,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 108.04799 ± 81.679
2025-08-07 02:55:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [89.05136, 236.82558, 69.41106, 19.58246, 92.91213, 234.7247, 28.387123, 101.26565, 198.37679, 9.943012]
2025-08-07 02:55:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 132.0, 46.0, 24.0, 67.0, 130.0, 24.0, 79.0, 99.0, 15.0]
2025-08-07 02:55:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (108.05) for latency ExtremeSparseL4U32
2025-08-07 02:55:49,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 46 seconds)
2025-08-07 02:57:26,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:27,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 105.31116 ± 109.937
2025-08-07 02:57:27,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.350483, 15.673746, 19.039259, 218.70328, 351.19943, 151.32208, 16.587868, 177.42383, 22.426723, 68.38491]
2025-08-07 02:57:27,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 30.0, 111.0, 180.0, 91.0, 21.0, 102.0, 32.0, 49.0]
2025-08-07 02:57:27,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 5 seconds)
2025-08-07 02:59:04,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:05,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 47.05795 ± 44.830
2025-08-07 02:59:05,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [35.53901, 16.5837, 130.48999, 19.748093, 19.554691, 141.40923, 23.663425, 31.042824, 27.528038, 25.020445]
2025-08-07 02:59:05,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 20.0, 121.0, 20.0, 21.0, 74.0, 28.0, 30.0, 30.0, 32.0]
2025-08-07 02:59:05,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 25 seconds)
2025-08-07 03:00:41,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:00:42,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 79.55849 ± 71.522
2025-08-07 03:00:42,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [30.842134, 16.233839, 13.953652, 16.149345, 161.4505, 178.72064, 148.1535, 19.59391, 33.526894, 176.96048]
2025-08-07 03:00:42,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 32.0, 20.0, 19.0, 102.0, 114.0, 93.0, 22.0, 29.0, 101.0]
2025-08-07 03:00:42,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 51 seconds)
2025-08-07 03:02:19,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:02:19,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 69.22047 ± 80.754
2025-08-07 03:02:19,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [177.92915, 32.330444, 109.893524, 12.688586, 19.825483, 25.71151, 21.002462, 17.268219, 256.99567, 18.559692]
2025-08-07 03:02:19,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 31.0, 83.0, 17.0, 27.0, 26.0, 25.0, 29.0, 130.0, 29.0]
2025-08-07 03:02:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 17 seconds)
2025-08-07 03:03:56,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:56,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 57.16754 ± 59.589
2025-08-07 03:03:56,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [190.87596, 110.60074, 15.407018, 128.73975, 17.727676, 18.304373, 20.181295, 21.488548, 27.73037, 20.619736]
2025-08-07 03:03:56,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 80.0, 23.0, 80.0, 25.0, 23.0, 28.0, 26.0, 31.0, 29.0]
2025-08-07 03:03:57,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 36 seconds)
2025-08-07 03:05:33,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:34,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 62.49424 ± 79.100
2025-08-07 03:05:34,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [14.09653, 30.417603, 14.675953, 17.123049, 27.662008, 265.93723, 37.848866, 159.3676, 28.252111, 29.561419]
2025-08-07 03:05:34,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 29.0, 17.0, 25.0, 31.0, 157.0, 33.0, 106.0, 27.0, 28.0]
2025-08-07 03:05:34,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 57 seconds)
2025-08-07 03:07:11,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:11,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 33.34773 ± 28.963
2025-08-07 03:07:11,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [86.23186, 19.10161, 94.02223, 19.458439, 11.441496, 12.352344, 28.715254, 28.21554, 19.13927, 14.799249]
2025-08-07 03:07:11,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 24.0, 75.0, 21.0, 31.0, 21.0, 30.0, 32.0, 25.0, 20.0]
2025-08-07 03:07:11,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 19 seconds)
2025-08-07 03:08:49,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 57.85808 ± 62.678
2025-08-07 03:08:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [13.818909, 206.40195, 75.789154, 14.798772, 136.80508, 19.508963, 64.23496, 19.726562, 16.594728, 10.901761]
2025-08-07 03:08:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 137.0, 54.0, 25.0, 83.0, 20.0, 45.0, 29.0, 20.0, 21.0]
2025-08-07 03:08:50,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 45 seconds)
2025-08-07 03:10:26,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 120.02714 ± 137.440
2025-08-07 03:10:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.525904, 14.01542, 320.56537, 247.95465, 22.769602, 155.49382, 14.181681, 377.90457, 9.228582, 19.631666]
2025-08-07 03:10:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 16.0, 152.0, 136.0, 29.0, 103.0, 16.0, 199.0, 11.0, 26.0]
2025-08-07 03:10:27,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1226 [INFO]: New best (120.03) for latency ExtremeSparseL4U32
2025-08-07 03:10:27,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 8 seconds)
2025-08-07 03:12:04,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:12:04,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 67.28069 ± 80.209
2025-08-07 03:12:04,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [18.783232, 13.183637, 74.421196, 29.299711, 24.58796, 31.398966, 21.256023, 13.065695, 241.36664, 205.44386]
2025-08-07 03:12:04,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 26.0, 54.0, 29.0, 28.0, 31.0, 25.0, 25.0, 120.0, 114.0]
2025-08-07 03:12:04,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 31 seconds)
2025-08-07 03:13:41,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:42,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 94.86111 ± 87.743
2025-08-07 03:13:42,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [143.21645, 18.94961, 73.00712, 169.22638, 32.576126, 201.41714, 264.16748, 13.482233, 18.643198, 13.925349]
2025-08-07 03:13:42,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 24.0, 50.0, 104.0, 28.0, 131.0, 143.0, 25.0, 22.0, 29.0]
2025-08-07 03:13:42,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 53 seconds)
2025-08-07 03:15:19,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:20,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 64.56552 ± 74.479
2025-08-07 03:15:20,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [27.523146, 11.735065, 109.04038, 23.125877, 137.63884, 12.65231, 24.826326, 28.508215, 250.82495, 19.780107]
2025-08-07 03:15:20,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 13.0, 100.0, 24.0, 115.0, 22.0, 30.0, 33.0, 194.0, 30.0]
2025-08-07 03:15:20,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 17 seconds)
2025-08-07 03:16:57,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:16:58,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 111.29734 ± 97.673
2025-08-07 03:16:58,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [185.39693, 14.068353, 153.58919, 20.904593, 13.015749, 256.63458, 242.98088, 29.05052, 186.10376, 11.228768]
2025-08-07 03:16:58,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 16.0, 98.0, 32.0, 20.0, 145.0, 119.0, 31.0, 131.0, 14.0]
2025-08-07 03:16:58,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 38 seconds)
2025-08-07 03:18:35,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:18:35,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 58.17612 ± 48.829
2025-08-07 03:18:35,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [129.82939, 22.41264, 15.500104, 17.0546, 29.164862, 132.89795, 14.235324, 110.96585, 17.753025, 91.947464]
2025-08-07 03:18:35,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 29.0, 24.0, 27.0, 30.0, 92.0, 15.0, 73.0, 31.0, 67.0]
2025-08-07 03:18:35,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 1 second)
2025-08-07 03:20:12,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:13,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 71.00130 ± 61.792
2025-08-07 03:20:13,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [130.22467, 8.694967, 17.84647, 15.861739, 110.37339, 77.50877, 14.75841, 19.985865, 193.75562, 121.003174]
2025-08-07 03:20:13,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 12.0, 22.0, 17.0, 74.0, 60.0, 21.0, 21.0, 123.0, 98.0]
2025-08-07 03:20:13,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 24 seconds)
2025-08-07 03:21:51,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:52,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 68.43177 ± 58.919
2025-08-07 03:21:52,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [12.299866, 27.431705, 124.11639, 24.893534, 23.159525, 112.65946, 138.34393, 17.817493, 176.64536, 26.950422]
2025-08-07 03:21:52,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 28.0, 80.0, 25.0, 21.0, 88.0, 85.0, 26.0, 104.0, 28.0]
2025-08-07 03:21:52,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 47 seconds)
2025-08-07 03:23:28,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:28,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 57.26440 ± 68.791
2025-08-07 03:23:28,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [27.540686, 13.168606, 22.35709, 13.658933, 237.01297, 119.46521, 23.170658, 83.18282, 11.764342, 21.322685]
2025-08-07 03:23:28,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 15.0, 24.0, 16.0, 115.0, 72.0, 28.0, 53.0, 15.0, 24.0]
2025-08-07 03:23:28,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 8 seconds)
2025-08-07 03:25:06,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:06,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 31.52568 ± 34.785
2025-08-07 03:25:06,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [15.742095, 16.680248, 17.071623, 23.502392, 135.31647, 26.863865, 19.641489, 19.17576, 16.626478, 24.636435]
2025-08-07 03:25:06,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 23.0, 26.0, 95.0, 32.0, 19.0, 24.0, 21.0, 27.0]
2025-08-07 03:25:06,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 30 seconds)
2025-08-07 03:26:43,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:44,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 72.82048 ± 102.610
2025-08-07 03:26:44,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [21.836624, 356.37845, 17.128069, 13.2242365, 120.99804, 118.344925, 15.250311, 28.683397, 16.934845, 19.425817]
2025-08-07 03:26:44,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 181.0, 23.0, 20.0, 87.0, 73.0, 21.0, 32.0, 19.0, 21.0]
2025-08-07 03:26:44,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 53 seconds)
2025-08-07 03:28:21,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:22,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 102.29689 ± 81.354
2025-08-07 03:28:22,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [148.38057, 16.708834, 10.648844, 98.68612, 19.763968, 114.810715, 17.942074, 138.67598, 250.72693, 206.6249]
2025-08-07 03:28:22,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 27.0, 13.0, 77.0, 31.0, 76.0, 24.0, 92.0, 137.0, 139.0]
2025-08-07 03:28:22,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 15 seconds)
2025-08-07 03:29:59,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:00,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 116.90149 ± 91.609
2025-08-07 03:30:00,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [206.19798, 14.15714, 15.156281, 12.279101, 69.36621, 152.16173, 61.933598, 218.53174, 280.49783, 138.73326]
2025-08-07 03:30:00,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 20.0, 23.0, 19.0, 48.0, 82.0, 55.0, 152.0, 147.0, 101.0]
2025-08-07 03:30:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 37 seconds)
2025-08-07 03:31:39,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:39,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1221 [DEBUG]: Total Reward: 53.85471 ± 58.231
2025-08-07 03:31:39,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1222 [DEBUG]: All rewards: [22.560738, 9.750449, 131.33775, 142.6964, 21.105795, 10.288716, 13.143934, 16.651617, 152.85718, 18.154459]
2025-08-07 03:31:39,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 14.0, 88.0, 88.0, 24.0, 15.0, 15.0, 25.0, 88.0, 21.0]
2025-08-07 03:31:39,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-hopper):1251 [DEBUG]: Training session finished
