2025-08-07 01:16:44,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 01:16:44,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 01:16:44,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1481e2163450>}
2025-08-07 01:16:44,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 01:16:44,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1133 [INFO]: Creating new trainer
2025-08-07 01:16:44,832 baseline-bpql-noiseperc25-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 01:16:44,833 baseline-bpql-noiseperc25-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 01:16:45,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 01:16:45,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 01:18:21,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:22,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 19.27706 ± 22.568
2025-08-07 01:18:22,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [12.476537, 10.543343, 86.30603, 8.824411, 14.720692, 10.669506, 11.841127, 6.057203, 12.703512, 18.628288]
2025-08-07 01:18:22,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 13.0, 70.0, 15.0, 26.0, 11.0, 13.0, 9.0, 16.0, 24.0]
2025-08-07 01:18:22,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (19.28) for latency ExtremeSparseL4U32
2025-08-07 01:18:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 38 minutes, 59 seconds)
2025-08-07 01:20:04,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:20:05,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 27.20432 ± 14.133
2025-08-07 01:20:05,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [31.609072, 23.006653, 13.374123, 32.149105, 14.264403, 33.26301, 19.443167, 34.625496, 9.670821, 60.63735]
2025-08-07 01:20:05,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 54.0, 22.0, 51.0, 19.0, 37.0, 20.0, 45.0, 17.0, 45.0]
2025-08-07 01:20:05,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (27.20) for latency ExtremeSparseL4U32
2025-08-07 01:20:05,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 42 minutes, 55 seconds)
2025-08-07 01:21:47,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:47,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 19.46194 ± 5.805
2025-08-07 01:21:47,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [21.070724, 22.842718, 21.298445, 18.14644, 13.815666, 13.896295, 32.829483, 17.03241, 11.521401, 22.165756]
2025-08-07 01:21:47,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 27.0, 23.0, 23.0, 28.0, 15.0, 27.0, 20.0, 18.0, 25.0]
2025-08-07 01:21:47,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 42 minutes, 46 seconds)
2025-08-07 01:23:30,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:30,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 20.94985 ± 9.047
2025-08-07 01:23:30,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [26.081182, 14.007053, 32.149685, 18.95941, 19.496414, 11.252275, 37.648792, 9.531947, 27.53457, 12.837175]
2025-08-07 01:23:30,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 23.0, 32.0, 18.0, 24.0, 17.0, 30.0, 40.0, 29.0, 29.0]
2025-08-07 01:23:30,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 42 minutes, 4 seconds)
2025-08-07 01:25:12,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:12,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 17.67354 ± 9.402
2025-08-07 01:25:12,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.375638, 38.19403, 11.870883, 7.8723874, 16.261671, 12.52332, 12.047402, 16.618177, 22.610022, 30.36188]
2025-08-07 01:25:12,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 59.0, 22.0, 14.0, 24.0, 24.0, 14.0, 23.0, 28.0, 24.0]
2025-08-07 01:25:12,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 40 minutes, 34 seconds)
2025-08-07 01:26:54,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:54,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 15.12398 ± 6.805
2025-08-07 01:26:54,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [21.888481, 21.066385, 9.389473, 24.118898, 8.767034, 12.057834, 25.50959, 8.878282, 7.098553, 12.465251]
2025-08-07 01:26:54,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 22.0, 13.0, 25.0, 14.0, 18.0, 26.0, 14.0, 14.0, 17.0]
2025-08-07 01:26:54,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 40 minutes, 34 seconds)
2025-08-07 01:28:35,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:28:36,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 31.73555 ± 20.313
2025-08-07 01:28:36,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [38.800472, 36.21461, 32.316517, 78.596855, 11.677731, 12.893014, 49.093132, 10.403028, 34.083584, 13.27655]
2025-08-07 01:28:36,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 32.0, 57.0, 19.0, 31.0, 43.0, 20.0, 29.0, 24.0]
2025-08-07 01:28:36,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (31.74) for latency ExtremeSparseL4U32
2025-08-07 01:28:36,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 38 minutes, 23 seconds)
2025-08-07 01:30:19,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:19,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 17.02007 ± 4.942
2025-08-07 01:30:19,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.981503, 11.552197, 21.825384, 15.606913, 26.904716, 17.769573, 11.368917, 20.131233, 10.71095, 19.349266]
2025-08-07 01:30:19,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 14.0, 25.0, 19.0, 29.0, 24.0, 14.0, 24.0, 25.0, 23.0]
2025-08-07 01:30:19,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 37 minutes, 3 seconds)
2025-08-07 01:32:03,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:03,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 37.39616 ± 55.761
2025-08-07 01:32:03,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.932654, 9.662463, 76.45662, 12.486527, 194.60341, 11.034356, 10.998764, 16.533033, 14.620555, 12.6331835]
2025-08-07 01:32:03,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 12.0, 57.0, 21.0, 111.0, 15.0, 17.0, 25.0, 16.0, 24.0]
2025-08-07 01:32:03,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (37.40) for latency ExtremeSparseL4U32
2025-08-07 01:32:03,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 35 minutes, 35 seconds)
2025-08-07 01:33:46,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:46,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 19.28214 ± 12.490
2025-08-07 01:33:46,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [27.125124, 53.532024, 13.593948, 14.237053, 16.05553, 9.449178, 13.740722, 14.106533, 9.401295, 21.57998]
2025-08-07 01:33:46,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 41.0, 22.0, 23.0, 17.0, 12.0, 19.0, 26.0, 12.0, 23.0]
2025-08-07 01:33:46,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 34 minutes, 9 seconds)
2025-08-07 01:35:31,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:35:32,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 25.45369 ± 35.565
2025-08-07 01:35:32,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.093154, 13.572306, 11.79458, 12.389475, 10.5128355, 12.478535, 8.499449, 11.860368, 27.16571, 131.1705]
2025-08-07 01:35:32,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 13.0, 19.0, 21.0, 14.0, 19.0, 13.0, 14.0, 24.0, 115.0]
2025-08-07 01:35:32,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 33 minutes, 37 seconds)
2025-08-07 01:37:15,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:37:15,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 29.56844 ± 32.582
2025-08-07 01:37:15,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.061766, 10.355213, 15.583927, 23.245281, 11.545879, 60.930523, 18.95757, 116.97787, 11.677311, 6.3490696]
2025-08-07 01:37:15,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 21.0, 38.0, 15.0, 44.0, 20.0, 80.0, 19.0, 9.0]
2025-08-07 01:37:15,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 32 minutes, 26 seconds)
2025-08-07 01:38:58,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:59,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 24.66697 ± 14.450
2025-08-07 01:38:59,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.092373, 11.44062, 27.125374, 19.476114, 38.254436, 19.987318, 62.254463, 21.288897, 15.115901, 14.634219]
2025-08-07 01:38:59,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 26.0, 27.0, 26.0, 32.0, 19.0, 82.0, 24.0, 17.0, 20.0]
2025-08-07 01:38:59,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 30 minutes, 34 seconds)
2025-08-07 01:40:39,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:40,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 17.95836 ± 12.263
2025-08-07 01:40:40,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [16.556171, 22.8706, 15.187034, 15.934854, 5.587632, 12.228791, 52.382706, 10.0506935, 16.286613, 12.498506]
2025-08-07 01:40:40,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 20.0, 24.0, 10.0, 17.0, 44.0, 17.0, 34.0, 15.0]
2025-08-07 01:40:40,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 28 minutes, 3 seconds)
2025-08-07 01:42:23,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:24,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 38.67865 ± 44.655
2025-08-07 01:42:24,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.35304, 25.589113, 24.852716, 16.704197, 21.641277, 7.5942616, 68.14629, 164.11736, 26.552757, 11.23553]
2025-08-07 01:42:24,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 33.0, 28.0, 21.0, 11.0, 41.0, 115.0, 26.0, 17.0]
2025-08-07 01:42:24,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (38.68) for latency ExtremeSparseL4U32
2025-08-07 01:42:24,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 26 minutes, 44 seconds)
2025-08-07 01:44:08,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:44:08,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 25.87229 ± 33.306
2025-08-07 01:44:08,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.619934, 124.65832, 10.808446, 19.128399, 20.62977, 9.845214, 12.228364, 18.04929, 23.746159, 12.009027]
2025-08-07 01:44:08,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 87.0, 15.0, 21.0, 29.0, 18.0, 14.0, 26.0, 22.0, 14.0]
2025-08-07 01:44:08,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 24 minutes, 33 seconds)
2025-08-07 01:45:52,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:53,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 28.27981 ± 39.298
2025-08-07 01:45:53,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.6012, 12.25046, 22.119156, 19.896671, 10.261779, 8.437478, 7.8942966, 28.645475, 144.40613, 8.285446]
2025-08-07 01:45:53,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 16.0, 21.0, 28.0, 12.0, 22.0, 12.0, 25.0, 78.0, 11.0]
2025-08-07 01:45:53,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 23 minutes, 5 seconds)
2025-08-07 01:47:35,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:35,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 19.45885 ± 7.466
2025-08-07 01:47:35,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.756786, 9.078257, 10.573733, 26.421076, 29.858086, 21.654778, 13.781822, 14.250374, 16.85959, 31.354029]
2025-08-07 01:47:35,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 17.0, 25.0, 31.0, 25.0, 20.0, 24.0, 17.0, 41.0]
2025-08-07 01:47:35,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 21 minutes, 12 seconds)
2025-08-07 01:49:18,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:18,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 29.57578 ± 32.160
2025-08-07 01:49:18,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [118.47624, 15.864139, 16.171164, 40.43745, 13.252477, 13.342075, 46.88701, 14.567005, 8.430273, 8.3299265]
2025-08-07 01:49:18,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 18.0, 20.0, 53.0, 31.0, 16.0, 67.0, 20.0, 10.0, 14.0]
2025-08-07 01:49:18,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 19 minutes, 58 seconds)
2025-08-07 01:51:01,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:01,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 27.44182 ± 27.302
2025-08-07 01:51:01,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [23.491827, 25.90525, 11.928007, 82.370926, 17.16223, 7.78762, 7.3851757, 78.954926, 8.696217, 10.736008]
2025-08-07 01:51:01,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 31.0, 15.0, 62.0, 17.0, 11.0, 14.0, 54.0, 11.0, 15.0]
2025-08-07 01:51:01,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 17 minutes, 59 seconds)
2025-08-07 01:52:45,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:46,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 12.62646 ± 4.965
2025-08-07 01:52:46,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.586883, 24.208406, 15.452554, 8.517729, 15.575629, 12.255146, 13.828612, 8.366725, 7.263189, 7.209699]
2025-08-07 01:52:46,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [15.0, 21.0, 43.0, 15.0, 16.0, 29.0, 18.0, 10.0, 25.0, 12.0]
2025-08-07 01:52:46,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 16 minutes, 14 seconds)
2025-08-07 01:54:30,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:30,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 33.34915 ± 36.209
2025-08-07 01:54:30,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.631747, 116.49103, 34.09355, 12.221755, 13.013545, 90.914314, 14.602884, 11.133975, 17.104359, 12.284353]
2025-08-07 01:54:30,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 84.0, 32.0, 16.0, 18.0, 68.0, 27.0, 15.0, 18.0, 17.0]
2025-08-07 01:54:30,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 14 minutes, 36 seconds)
2025-08-07 01:56:13,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:56:14,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 30.33378 ± 33.965
2025-08-07 01:56:14,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [21.92732, 121.36385, 6.998482, 12.237832, 13.637773, 17.2511, 18.655697, 8.68744, 18.794958, 63.78339]
2025-08-07 01:56:14,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 72.0, 10.0, 17.0, 18.0, 19.0, 25.0, 12.0, 26.0, 73.0]
2025-08-07 01:56:14,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 13 minutes, 5 seconds)
2025-08-07 01:57:57,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:57,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 23.24269 ± 23.717
2025-08-07 01:57:57,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [26.758608, 92.688255, 15.513788, 20.986, 14.6556425, 7.946243, 13.055574, 15.789209, 8.783249, 16.250378]
2025-08-07 01:57:57,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 53.0, 26.0, 22.0, 19.0, 12.0, 15.0, 20.0, 12.0, 18.0]
2025-08-07 01:57:57,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 11 minutes, 29 seconds)
2025-08-07 01:59:41,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:41,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 14.32615 ± 5.355
2025-08-07 01:59:41,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.6271935, 17.532587, 12.796493, 8.205264, 10.656402, 24.464445, 14.4177265, 9.191456, 20.186481, 18.183434]
2025-08-07 01:59:41,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [10.0, 28.0, 14.0, 12.0, 21.0, 45.0, 21.0, 14.0, 30.0, 22.0]
2025-08-07 01:59:41,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 9 minutes, 59 seconds)
2025-08-07 02:01:23,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:23,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 24.04322 ± 25.325
2025-08-07 02:01:23,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.371466, 9.214982, 15.158984, 14.86028, 15.571126, 26.764462, 98.690315, 12.604476, 15.179766, 21.016397]
2025-08-07 02:01:23,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 12.0, 17.0, 16.0, 19.0, 34.0, 65.0, 17.0, 18.0, 24.0]
2025-08-07 02:01:23,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 7 minutes, 44 seconds)
2025-08-07 02:03:06,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:07,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 28.94257 ± 26.898
2025-08-07 02:03:07,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [33.057774, 15.377741, 10.612196, 19.817621, 12.644056, 12.733812, 56.710148, 15.482401, 99.01622, 13.973754]
2025-08-07 02:03:07,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 21.0, 15.0, 19.0, 16.0, 23.0, 49.0, 19.0, 62.0, 26.0]
2025-08-07 02:03:07,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 5 minutes, 40 seconds)
2025-08-07 02:04:51,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:51,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 23.52627 ± 22.729
2025-08-07 02:04:51,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [18.025818, 10.998095, 24.117863, 11.082757, 16.958467, 16.41668, 90.11427, 12.5509, 24.678934, 10.318929]
2025-08-07 02:04:51,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 42.0, 22.0, 18.0, 19.0, 26.0, 100.0, 21.0, 22.0, 14.0]
2025-08-07 02:04:51,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 4 minutes, 9 seconds)
2025-08-07 02:06:32,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 15.87273 ± 6.213
2025-08-07 02:06:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.5401945, 20.008371, 9.250876, 10.725681, 15.195144, 13.975754, 18.469076, 11.103899, 30.221684, 20.236599]
2025-08-07 02:06:33,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 27.0, 13.0, 17.0, 17.0, 19.0, 27.0, 13.0, 30.0, 18.0]
2025-08-07 02:06:33,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 2 minutes)
2025-08-07 02:08:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:18,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 39.95967 ± 45.063
2025-08-07 02:08:18,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.060177, 11.347277, 24.865772, 112.99219, 11.900212, 41.136124, 13.403507, 141.92711, 16.010565, 10.953773]
2025-08-07 02:08:18,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 14.0, 24.0, 72.0, 14.0, 32.0, 20.0, 97.0, 19.0, 15.0]
2025-08-07 02:08:18,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (39.96) for latency ExtremeSparseL4U32
2025-08-07 02:08:18,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 26 seconds)
2025-08-07 02:10:00,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:01,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 14.68996 ± 5.242
2025-08-07 02:10:01,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [16.423092, 16.412947, 11.04798, 14.097329, 10.914272, 12.086937, 7.08069, 25.912006, 21.156967, 11.7673645]
2025-08-07 02:10:01,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 17.0, 14.0, 21.0, 14.0, 16.0, 11.0, 23.0, 26.0, 16.0]
2025-08-07 02:10:01,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 58 minutes, 57 seconds)
2025-08-07 02:11:42,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:42,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 33.63195 ± 35.865
2025-08-07 02:11:42,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.9009976, 13.901815, 105.297554, 13.659559, 10.649376, 27.881311, 20.64589, 25.173904, 8.118158, 103.09089]
2025-08-07 02:11:42,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 19.0, 79.0, 26.0, 15.0, 27.0, 41.0, 25.0, 16.0, 95.0]
2025-08-07 02:11:42,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 56 minutes, 53 seconds)
2025-08-07 02:13:24,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:25,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 38.86067 ± 31.613
2025-08-07 02:13:25,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [34.486053, 21.461708, 24.662674, 17.291336, 17.72999, 56.645218, 15.984181, 82.25836, 8.940368, 109.14682]
2025-08-07 02:13:25,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 23.0, 21.0, 19.0, 19.0, 88.0, 17.0, 88.0, 13.0, 83.0]
2025-08-07 02:13:25,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 54 minutes, 47 seconds)
2025-08-07 02:15:09,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:10,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 44.16636 ± 88.876
2025-08-07 02:15:10,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.373575, 310.12564, 18.337425, 31.226366, 13.24721, 6.1483827, 13.592572, 11.696296, 10.165895, 13.750259]
2025-08-07 02:15:10,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 250.0, 47.0, 25.0, 23.0, 15.0, 17.0, 20.0, 17.0, 18.0]
2025-08-07 02:15:10,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (44.17) for latency ExtremeSparseL4U32
2025-08-07 02:15:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 53 minutes, 51 seconds)
2025-08-07 02:16:53,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:53,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 33.31606 ± 29.986
2025-08-07 02:16:53,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [38.947018, 32.614838, 113.98815, 12.213273, 19.182508, 16.582966, 55.293472, 11.908237, 16.164215, 16.265972]
2025-08-07 02:16:53,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 32.0, 127.0, 16.0, 22.0, 18.0, 41.0, 16.0, 25.0, 28.0]
2025-08-07 02:16:53,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 51 minutes, 43 seconds)
2025-08-07 02:18:36,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:37,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 40.25562 ± 39.846
2025-08-07 02:18:37,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [30.639372, 21.298018, 83.45476, 24.797123, 143.95813, 18.675505, 36.027378, 7.39774, 21.724571, 14.583609]
2025-08-07 02:18:37,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 94.0, 23.0, 118.0, 24.0, 33.0, 15.0, 27.0, 20.0]
2025-08-07 02:18:37,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 50 minutes, 10 seconds)
2025-08-07 02:20:21,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:20:21,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 35.94577 ± 32.815
2025-08-07 02:20:21,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.467773, 23.19827, 10.546984, 34.95907, 103.4528, 93.948814, 38.467846, 10.9818945, 9.937568, 19.496742]
2025-08-07 02:20:21,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 33.0, 21.0, 56.0, 71.0, 79.0, 30.0, 22.0, 16.0, 21.0]
2025-08-07 02:20:21,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 48 minutes, 57 seconds)
2025-08-07 02:22:06,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:22:06,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 29.55781 ± 26.643
2025-08-07 02:22:06,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [6.078508, 15.002106, 12.16707, 8.963739, 8.184224, 24.449984, 55.74916, 74.32713, 14.195788, 76.46039]
2025-08-07 02:22:06,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 15.0, 12.0, 15.0, 24.0, 51.0, 46.0, 20.0, 75.0]
2025-08-07 02:22:06,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 47 minutes, 40 seconds)
2025-08-07 02:23:51,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 40.95120 ± 48.525
2025-08-07 02:23:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.463121, 24.569523, 11.982189, 166.30771, 99.36453, 24.049479, 22.9394, 18.286877, 15.190429, 11.358772]
2025-08-07 02:23:51,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 25.0, 21.0, 107.0, 64.0, 36.0, 25.0, 24.0, 18.0, 12.0]
2025-08-07 02:23:51,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 45 minutes, 57 seconds)
2025-08-07 02:25:35,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:35,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 38.62443 ± 33.888
2025-08-07 02:25:35,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.631739, 17.201717, 25.031475, 80.41529, 16.418839, 19.164368, 11.603479, 111.87261, 72.06902, 17.83574]
2025-08-07 02:25:35,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 23.0, 23.0, 65.0, 26.0, 21.0, 13.0, 92.0, 46.0, 18.0]
2025-08-07 02:25:35,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 44 minutes, 23 seconds)
2025-08-07 02:27:18,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:18,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 25.78074 ± 22.577
2025-08-07 02:27:18,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [89.92064, 19.910788, 16.787199, 15.295315, 20.86124, 38.95538, 14.097389, 13.255459, 12.602683, 16.12134]
2025-08-07 02:27:18,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 22.0, 20.0, 29.0, 22.0, 35.0, 19.0, 31.0, 20.0, 18.0]
2025-08-07 02:27:18,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 42 minutes, 31 seconds)
2025-08-07 02:29:00,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 35.78127 ± 50.417
2025-08-07 02:29:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [29.561262, 13.914602, 11.946004, 10.737883, 15.842952, 9.973533, 54.369118, 15.520723, 13.85486, 182.0918]
2025-08-07 02:29:00,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 19.0, 31.0, 14.0, 19.0, 15.0, 62.0, 23.0, 22.0, 119.0]
2025-08-07 02:29:00,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 40 minutes, 21 seconds)
2025-08-07 02:30:43,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:43,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.49931 ± 40.090
2025-08-07 02:30:43,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [145.09387, 18.41861, 12.008944, 11.20938, 14.362226, 11.508999, 60.365837, 17.73714, 10.536245, 23.751871]
2025-08-07 02:30:43,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 26.0, 27.0, 14.0, 22.0, 16.0, 43.0, 22.0, 19.0, 24.0]
2025-08-07 02:30:43,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 38 minutes, 17 seconds)
2025-08-07 02:32:28,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:28,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 31.92026 ± 38.292
2025-08-07 02:32:28,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.6624155, 16.727976, 132.88818, 20.260809, 73.19502, 8.215389, 14.118546, 13.921269, 22.772896, 5.440085]
2025-08-07 02:32:28,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 41.0, 70.0, 27.0, 50.0, 12.0, 15.0, 16.0, 22.0, 9.0]
2025-08-07 02:32:28,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 36 minutes, 31 seconds)
2025-08-07 02:34:10,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:11,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 36.59510 ± 49.123
2025-08-07 02:34:11,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [73.49719, 11.594172, 21.634544, 12.037001, 173.6686, 12.08926, 21.648079, 9.665162, 20.119389, 9.99766]
2025-08-07 02:34:11,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 22.0, 19.0, 15.0, 108.0, 20.0, 30.0, 13.0, 21.0, 12.0]
2025-08-07 02:34:11,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 34 minutes, 30 seconds)
2025-08-07 02:35:55,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:56,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 44.16808 ± 53.533
2025-08-07 02:35:56,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [163.25964, 135.8033, 13.287921, 14.298687, 11.597741, 34.579315, 9.626342, 22.283876, 25.057339, 11.886617]
2025-08-07 02:35:56,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 72.0, 15.0, 22.0, 21.0, 28.0, 13.0, 26.0, 25.0, 14.0]
2025-08-07 02:35:56,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (44.17) for latency ExtremeSparseL4U32
2025-08-07 02:35:56,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 33 minutes, 7 seconds)
2025-08-07 02:37:41,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:42,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 27.11657 ± 30.515
2025-08-07 02:37:42,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.765522, 15.785636, 12.761176, 12.633884, 11.560259, 12.295767, 20.087189, 17.908457, 36.011738, 116.356064]
2025-08-07 02:37:42,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 22.0, 17.0, 18.0, 13.0, 19.0, 26.0, 23.0, 94.0, 98.0]
2025-08-07 02:37:42,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 32 minutes, 5 seconds)
2025-08-07 02:39:25,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:39:26,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 17.62243 ± 12.024
2025-08-07 02:39:26,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.718082, 7.597447, 12.17315, 9.422058, 18.303366, 17.832111, 22.11032, 50.97475, 9.574258, 10.518722]
2025-08-07 02:39:26,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 11.0, 14.0, 14.0, 20.0, 24.0, 21.0, 43.0, 11.0, 15.0]
2025-08-07 02:39:26,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 30 minutes, 31 seconds)
2025-08-07 02:41:09,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:10,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 26.91711 ± 24.516
2025-08-07 02:41:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [82.73166, 11.177033, 17.057339, 64.95766, 9.474715, 10.55219, 30.888885, 10.431563, 14.715529, 17.18454]
2025-08-07 02:41:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 14.0, 19.0, 43.0, 19.0, 14.0, 30.0, 14.0, 25.0, 20.0]
2025-08-07 02:41:10,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 28 minutes, 38 seconds)
2025-08-07 02:42:52,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:52,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 18.46617 ± 18.732
2025-08-07 02:42:52,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [16.142712, 9.846969, 73.28284, 21.003386, 13.97447, 7.031015, 15.395828, 8.66741, 7.8734746, 11.44356]
2025-08-07 02:42:52,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 42.0, 21.0, 16.0, 10.0, 22.0, 12.0, 27.0, 13.0]
2025-08-07 02:42:52,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 26 minutes, 50 seconds)
2025-08-07 02:44:37,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:37,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 27.64316 ± 41.274
2025-08-07 02:44:37,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [150.96097, 6.4778657, 13.697018, 13.509615, 14.55913, 12.489411, 20.576937, 10.603987, 18.843374, 14.713351]
2025-08-07 02:44:37,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 11.0, 19.0, 15.0, 26.0, 21.0, 27.0, 13.0, 26.0, 17.0]
2025-08-07 02:44:37,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 25 minutes, 10 seconds)
2025-08-07 02:46:20,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:20,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 21.99505 ± 22.913
2025-08-07 02:46:20,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.716825, 13.271011, 17.074978, 89.44074, 9.773253, 6.3829713, 15.266977, 17.215775, 23.67321, 12.13477]
2025-08-07 02:46:20,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 15.0, 23.0, 67.0, 13.0, 10.0, 16.0, 29.0, 26.0, 16.0]
2025-08-07 02:46:20,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 59 seconds)
2025-08-07 02:48:04,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:05,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 24.41507 ± 15.486
2025-08-07 02:48:05,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [52.155552, 9.185614, 15.951982, 6.763644, 44.067455, 10.912587, 17.332666, 26.13572, 43.264404, 18.38106]
2025-08-07 02:48:05,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 13.0, 17.0, 10.0, 36.0, 19.0, 20.0, 48.0, 40.0, 18.0]
2025-08-07 02:48:05,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 21 minutes, 21 seconds)
2025-08-07 02:49:48,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:48,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 24.76510 ± 23.795
2025-08-07 02:49:48,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [16.37277, 16.622791, 14.902812, 12.860002, 93.189835, 34.060246, 14.897079, 11.555856, 24.100447, 9.089206]
2025-08-07 02:49:48,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 17.0, 18.0, 22.0, 123.0, 32.0, 26.0, 17.0, 23.0, 15.0]
2025-08-07 02:49:48,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 19 minutes, 27 seconds)
2025-08-07 02:51:32,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:32,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 42.95992 ± 44.327
2025-08-07 02:51:32,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.400108, 23.378958, 117.916595, 16.554337, 10.520154, 6.311661, 125.36182, 24.635868, 81.33008, 9.189593]
2025-08-07 02:51:32,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 21.0, 128.0, 31.0, 12.0, 9.0, 69.0, 26.0, 61.0, 16.0]
2025-08-07 02:51:32,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 18 minutes, 5 seconds)
2025-08-07 02:53:17,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:17,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 21.51310 ± 19.773
2025-08-07 02:53:17,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.497418, 25.382868, 9.172784, 12.133489, 76.50642, 10.283673, 7.780601, 14.053629, 11.602084, 32.717983]
2025-08-07 02:53:17,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 25.0, 18.0, 20.0, 51.0, 15.0, 12.0, 17.0, 22.0, 30.0]
2025-08-07 02:53:17,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 16 minutes, 13 seconds)
2025-08-07 02:55:02,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:02,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 29.60878 ± 35.341
2025-08-07 02:55:02,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [21.777508, 42.925735, 11.025568, 10.960473, 16.70613, 9.769536, 19.826141, 13.899848, 132.07544, 17.12141]
2025-08-07 02:55:02,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 56.0, 18.0, 15.0, 24.0, 14.0, 19.0, 21.0, 90.0, 17.0]
2025-08-07 02:55:02,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 14 minutes, 47 seconds)
2025-08-07 02:56:46,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:56:47,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 37.21767 ± 50.293
2025-08-07 02:56:47,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [22.180801, 21.928442, 10.797632, 185.43884, 28.977829, 10.96019, 7.0074253, 33.694706, 15.243968, 35.94681]
2025-08-07 02:56:47,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 12.0, 146.0, 29.0, 20.0, 10.0, 29.0, 18.0, 61.0]
2025-08-07 02:56:47,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 13 minutes, 1 second)
2025-08-07 02:58:30,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:58:30,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 20.24604 ± 6.068
2025-08-07 02:58:30,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [24.808764, 13.278444, 32.85696, 19.9115, 18.363688, 24.608973, 19.433723, 10.684463, 15.8635645, 22.650331]
2025-08-07 02:58:30,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 17.0, 31.0, 20.0, 18.0, 27.0, 21.0, 13.0, 19.0, 22.0]
2025-08-07 02:58:30,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 11 minutes, 19 seconds)
2025-08-07 03:00:13,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:00:14,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 39.01815 ± 47.279
2025-08-07 03:00:14,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.965325, 33.0028, 7.9235168, 152.87393, 23.280087, 107.88521, 11.202078, 10.943747, 12.329759, 16.77495]
2025-08-07 03:00:14,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 31.0, 12.0, 115.0, 21.0, 192.0, 12.0, 14.0, 18.0, 27.0]
2025-08-07 03:00:14,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 9 minutes, 30 seconds)
2025-08-07 03:01:59,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:02:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 51.59649 ± 55.700
2025-08-07 03:02:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [19.712418, 143.54005, 154.21492, 16.904268, 9.777, 17.017994, 106.32222, 24.831661, 11.070011, 12.574398]
2025-08-07 03:02:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 217.0, 180.0, 21.0, 18.0, 23.0, 78.0, 24.0, 21.0, 19.0]
2025-08-07 03:02:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (51.60) for latency ExtremeSparseL4U32
2025-08-07 03:02:00,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 8 minutes)
2025-08-07 03:03:41,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:42,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 30.52149 ± 44.883
2025-08-07 03:03:42,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.198656, 14.730096, 12.013504, 8.976168, 11.927371, 20.006182, 20.361006, 21.178844, 11.268722, 164.55437]
2025-08-07 03:03:42,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 15.0, 28.0, 13.0, 21.0, 18.0, 20.0, 22.0, 15.0, 140.0]
2025-08-07 03:03:42,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 5 minutes, 50 seconds)
2025-08-07 03:05:24,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 47.63236 ± 47.975
2025-08-07 03:05:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [33.91826, 86.721504, 15.297924, 6.303132, 13.567476, 79.72875, 164.34842, 10.974867, 54.467094, 10.99617]
2025-08-07 03:05:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 114.0, 20.0, 18.0, 15.0, 81.0, 101.0, 13.0, 79.0, 17.0]
2025-08-07 03:05:25,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 57 seconds)
2025-08-07 03:07:10,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:10,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 43.65682 ± 54.675
2025-08-07 03:07:10,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.054427, 16.453543, 24.829351, 91.74639, 9.188252, 187.06819, 12.040489, 12.708988, 63.927547, 10.551057]
2025-08-07 03:07:10,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 24.0, 22.0, 58.0, 11.0, 101.0, 14.0, 15.0, 38.0, 12.0]
2025-08-07 03:07:10,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 2 minutes, 25 seconds)
2025-08-07 03:08:55,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:56,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 77.78809 ± 72.131
2025-08-07 03:08:56,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.999756, 109.08113, 49.22158, 264.9096, 26.252552, 22.628471, 14.481461, 118.11947, 72.82407, 84.36271]
2025-08-07 03:08:56,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 61.0, 66.0, 208.0, 27.0, 24.0, 19.0, 64.0, 45.0, 59.0]
2025-08-07 03:08:56,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1226 [INFO]: New best (77.79) for latency ExtremeSparseL4U32
2025-08-07 03:08:56,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 56 seconds)
2025-08-07 03:10:40,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:40,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 19.57233 ± 10.556
2025-08-07 03:10:40,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [15.489983, 10.695983, 9.752988, 21.499548, 24.85858, 9.221412, 45.21353, 10.991442, 21.412893, 26.586926]
2025-08-07 03:10:40,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [16.0, 16.0, 13.0, 23.0, 22.0, 24.0, 32.0, 13.0, 21.0, 28.0]
2025-08-07 03:10:40,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 58 minutes, 56 seconds)
2025-08-07 03:12:22,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:12:23,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 52.31341 ± 43.715
2025-08-07 03:12:23,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [129.37076, 127.947556, 68.29067, 10.48422, 8.109992, 13.308464, 33.230255, 64.64868, 53.55459, 14.188908]
2025-08-07 03:12:23,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 107.0, 40.0, 12.0, 10.0, 30.0, 29.0, 47.0, 64.0, 16.0]
2025-08-07 03:12:23,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 57 minutes, 18 seconds)
2025-08-07 03:14:03,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:14:04,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 25.47196 ± 18.938
2025-08-07 03:14:04,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.940712, 18.7838, 23.963951, 14.721105, 25.48051, 17.048895, 11.608144, 15.911935, 39.092564, 77.16803]
2025-08-07 03:14:04,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [14.0, 19.0, 22.0, 28.0, 23.0, 19.0, 14.0, 21.0, 33.0, 47.0]
2025-08-07 03:14:04,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 55 minutes, 20 seconds)
2025-08-07 03:15:46,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:47,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.30915 ± 27.306
2025-08-07 03:15:47,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [107.24433, 35.460648, 11.567535, 21.870134, 51.17881, 18.430996, 24.09994, 12.903656, 18.673582, 21.661867]
2025-08-07 03:15:47,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 31.0, 16.0, 22.0, 45.0, 20.0, 24.0, 15.0, 24.0, 20.0]
2025-08-07 03:15:47,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 53 minutes, 23 seconds)
2025-08-07 03:17:30,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:30,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 15.39927 ± 10.408
2025-08-07 03:17:30,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.194202, 42.12552, 14.771576, 9.938023, 9.35494, 19.070595, 12.593141, 23.92552, 4.1075816, 8.911628]
2025-08-07 03:17:30,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 59.0, 16.0, 11.0, 11.0, 31.0, 27.0, 27.0, 11.0, 14.0]
2025-08-07 03:17:30,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 25 seconds)
2025-08-07 03:19:14,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:15,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 44.19426 ± 37.524
2025-08-07 03:19:15,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.511362, 30.372074, 24.197065, 78.82372, 12.423239, 115.14688, 29.281363, 11.488699, 28.812807, 102.885345]
2025-08-07 03:19:15,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 27.0, 25.0, 74.0, 18.0, 132.0, 28.0, 22.0, 26.0, 79.0]
2025-08-07 03:19:15,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 44 seconds)
2025-08-07 03:20:58,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:58,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 22.08812 ± 27.774
2025-08-07 03:20:58,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [11.982079, 104.6078, 14.722998, 13.654508, 12.413454, 10.082411, 6.7422767, 15.716602, 9.314516, 21.64456]
2025-08-07 03:20:58,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [13.0, 134.0, 21.0, 19.0, 17.0, 13.0, 11.0, 19.0, 13.0, 25.0]
2025-08-07 03:20:58,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 48 minutes, 4 seconds)
2025-08-07 03:22:42,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:42,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 27.07025 ± 45.739
2025-08-07 03:22:42,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.754093, 8.760524, 14.2505665, 8.285511, 18.081314, 10.081676, 163.94623, 15.5294285, 13.679371, 8.333712]
2025-08-07 03:22:42,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 16.0, 29.0, 12.0, 20.0, 20.0, 80.0, 19.0, 15.0, 10.0]
2025-08-07 03:22:42,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 37 seconds)
2025-08-07 03:24:26,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:24:27,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 41.76682 ± 58.098
2025-08-07 03:24:27,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [35.46306, 23.560982, 213.4459, 33.304977, 21.36475, 14.156786, 12.584161, 9.234828, 14.454536, 40.098198]
2025-08-07 03:24:27,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 23.0, 111.0, 27.0, 26.0, 21.0, 22.0, 16.0, 18.0, 30.0]
2025-08-07 03:24:27,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 45 minutes, 5 seconds)
2025-08-07 03:26:11,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:11,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 56.21198 ± 76.753
2025-08-07 03:26:11,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [17.67575, 29.057625, 189.31032, 10.432549, 9.527452, 227.1875, 19.246365, 24.071974, 25.386261, 10.224041]
2025-08-07 03:26:11,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 36.0, 170.0, 20.0, 13.0, 120.0, 21.0, 25.0, 32.0, 15.0]
2025-08-07 03:26:11,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 43 minutes, 25 seconds)
2025-08-07 03:27:55,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:55,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 29.01661 ± 27.685
2025-08-07 03:27:55,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [69.95952, 93.714035, 13.582855, 11.345132, 12.025768, 10.657881, 12.230948, 26.850615, 29.222015, 10.577405]
2025-08-07 03:27:55,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [41.0, 59.0, 16.0, 19.0, 18.0, 13.0, 15.0, 24.0, 26.0, 13.0]
2025-08-07 03:27:55,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 41 minutes, 39 seconds)
2025-08-07 03:29:40,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:40,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 16.14556 ± 5.518
2025-08-07 03:29:40,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [19.486954, 14.2074795, 13.955354, 30.164341, 12.173395, 17.993265, 10.24814, 10.625665, 14.940982, 17.660046]
2025-08-07 03:29:40,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 18.0, 29.0, 16.0, 23.0, 15.0, 14.0, 31.0, 24.0]
2025-08-07 03:29:40,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 40 minutes, 2 seconds)
2025-08-07 03:31:24,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:25,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 42.99259 ± 35.902
2025-08-07 03:31:25,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.313324, 31.670355, 20.871223, 14.000805, 20.963774, 28.086308, 92.96819, 45.549095, 129.1774, 26.325365]
2025-08-07 03:31:25,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 30.0, 20.0, 19.0, 24.0, 26.0, 69.0, 48.0, 94.0, 23.0]
2025-08-07 03:31:25,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 38 minutes, 20 seconds)
2025-08-07 03:33:09,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:33:09,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 15.40832 ± 3.690
2025-08-07 03:33:09,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.321544, 18.672594, 20.660465, 12.905882, 10.657129, 17.360975, 9.995653, 11.9819765, 19.292889, 18.234102]
2025-08-07 03:33:09,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 17.0, 28.0, 18.0, 13.0, 19.0, 23.0, 25.0, 25.0, 32.0]
2025-08-07 03:33:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 32 seconds)
2025-08-07 03:34:52,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:52,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 23.35007 ± 18.088
2025-08-07 03:34:52,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [75.29675, 17.776636, 16.08085, 20.254065, 28.942698, 22.819479, 16.10552, 11.574623, 9.671197, 14.978883]
2025-08-07 03:34:52,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 24.0, 28.0, 25.0, 26.0, 28.0, 20.0, 22.0, 12.0, 16.0]
2025-08-07 03:34:52,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 43 seconds)
2025-08-07 03:36:36,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:36,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 33.71886 ± 50.128
2025-08-07 03:36:36,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [8.134622, 12.112956, 12.358242, 43.361675, 181.28432, 10.657001, 21.702293, 16.750467, 20.269566, 10.557469]
2025-08-07 03:36:36,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 14.0, 14.0, 33.0, 174.0, 17.0, 20.0, 31.0, 25.0, 13.0]
2025-08-07 03:36:36,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 59 seconds)
2025-08-07 03:38:19,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:20,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 54.71185 ± 65.961
2025-08-07 03:38:20,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.00284, 31.508453, 12.720253, 14.266328, 12.08676, 13.728188, 19.340363, 72.84513, 198.17787, 163.44226]
2025-08-07 03:38:20,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 32.0, 23.0, 23.0, 18.0, 21.0, 32.0, 50.0, 114.0, 86.0]
2025-08-07 03:38:20,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 31 minutes, 9 seconds)
2025-08-07 03:40:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:03,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 25.19120 ± 18.646
2025-08-07 03:40:03,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [22.402706, 9.255408, 10.613343, 31.587069, 31.695541, 76.70919, 19.736444, 19.169128, 17.159298, 13.583864]
2025-08-07 03:40:03,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 15.0, 22.0, 29.0, 31.0, 68.0, 32.0, 19.0, 23.0, 21.0]
2025-08-07 03:40:03,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 29 minutes, 23 seconds)
2025-08-07 03:41:46,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:47,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 43.15329 ± 58.700
2025-08-07 03:41:47,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [18.794283, 13.520499, 17.843779, 14.009358, 10.995479, 9.107184, 14.788671, 11.841373, 165.9431, 154.6892]
2025-08-07 03:41:47,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 15.0, 20.0, 15.0, 17.0, 13.0, 21.0, 26.0, 94.0, 77.0]
2025-08-07 03:41:47,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 36 seconds)
2025-08-07 03:43:31,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:32,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 44.90623 ± 36.631
2025-08-07 03:43:32,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [102.68382, 27.43657, 14.382816, 14.37367, 9.7082205, 83.38811, 12.052553, 93.887825, 16.318512, 74.83027]
2025-08-07 03:43:32,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [62.0, 31.0, 17.0, 19.0, 16.0, 96.0, 19.0, 68.0, 20.0, 76.0]
2025-08-07 03:43:32,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 58 seconds)
2025-08-07 03:45:15,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:45:16,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 20.20824 ± 16.701
2025-08-07 03:45:16,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [32.308243, 8.298321, 13.753143, 11.467695, 25.212608, 9.636157, 12.869395, 11.293228, 11.839528, 65.40407]
2025-08-07 03:45:16,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 11.0, 16.0, 16.0, 28.0, 26.0, 22.0, 15.0, 17.0, 59.0]
2025-08-07 03:45:16,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 24 minutes, 14 seconds)
2025-08-07 03:46:58,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:58,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 32.18540 ± 31.297
2025-08-07 03:46:58,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [102.18183, 20.53344, 11.809002, 20.2307, 11.833754, 84.7928, 11.516249, 15.7042265, 14.630412, 28.62163]
2025-08-07 03:46:58,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 28.0, 29.0, 22.0, 17.0, 63.0, 15.0, 19.0, 21.0, 27.0]
2025-08-07 03:46:58,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 28 seconds)
2025-08-07 03:48:42,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:43,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 43.58161 ± 43.041
2025-08-07 03:48:43,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [18.632393, 16.887028, 11.490083, 126.91929, 31.788216, 21.101067, 126.350975, 16.714151, 52.625782, 13.307067]
2025-08-07 03:48:43,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 28.0, 14.0, 65.0, 30.0, 19.0, 65.0, 26.0, 49.0, 21.0]
2025-08-07 03:48:43,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 46 seconds)
2025-08-07 03:50:27,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:27,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 43.40571 ± 52.695
2025-08-07 03:50:27,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [13.492677, 14.981555, 146.49559, 24.7237, 8.464707, 144.96707, 52.679497, 8.548242, 9.745349, 9.958741]
2025-08-07 03:50:27,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 85.0, 25.0, 12.0, 87.0, 47.0, 13.0, 14.0, 26.0]
2025-08-07 03:50:27,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 5 seconds)
2025-08-07 03:52:11,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:11,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 14.42476 ± 3.822
2025-08-07 03:52:11,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [10.353611, 17.10275, 12.383762, 11.353066, 11.844054, 12.834508, 20.188818, 22.167934, 14.101162, 11.917914]
2025-08-07 03:52:11,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [12.0, 17.0, 19.0, 17.0, 14.0, 15.0, 21.0, 22.0, 15.0, 19.0]
2025-08-07 03:52:12,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 19 seconds)
2025-08-07 03:53:55,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:56,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 21.63287 ± 10.695
2025-08-07 03:53:56,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [23.038221, 19.825018, 13.285987, 17.22573, 15.038255, 29.557428, 20.463636, 49.24798, 20.33956, 8.306886]
2025-08-07 03:53:56,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 27.0, 22.0, 23.0, 19.0, 35.0, 26.0, 54.0, 18.0, 15.0]
2025-08-07 03:53:56,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 36 seconds)
2025-08-07 03:55:39,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:55:40,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 41.29261 ± 55.256
2025-08-07 03:55:40,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [107.271545, 13.604989, 8.619943, 183.8487, 14.745, 6.4732213, 29.370344, 18.317987, 21.979666, 8.694661]
2025-08-07 03:55:40,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 16.0, 13.0, 84.0, 16.0, 21.0, 24.0, 21.0, 27.0, 11.0]
2025-08-07 03:55:40,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 54 seconds)
2025-08-07 03:57:23,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:57:23,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 35.57835 ± 48.306
2025-08-07 03:57:23,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [34.69395, 21.71792, 7.2339535, 18.247591, 27.40599, 15.651104, 178.78638, 13.239227, 24.072315, 14.734989]
2025-08-07 03:57:23,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [38.0, 24.0, 10.0, 24.0, 25.0, 29.0, 119.0, 17.0, 25.0, 30.0]
2025-08-07 03:57:24,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 8 seconds)
2025-08-07 03:59:06,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:07,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 14.06700 ± 6.343
2025-08-07 03:59:07,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [29.402586, 9.603114, 16.126534, 8.205562, 12.139516, 13.445621, 7.373284, 18.78024, 16.936604, 8.656897]
2025-08-07 03:59:07,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [40.0, 11.0, 17.0, 13.0, 15.0, 16.0, 12.0, 21.0, 22.0, 12.0]
2025-08-07 03:59:07,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 23 seconds)
2025-08-07 04:00:48,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 21.75195 ± 25.596
2025-08-07 04:00:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [9.561598, 6.2985973, 9.226858, 11.38447, 20.217993, 30.617176, 10.3841915, 7.591002, 95.65858, 16.578997]
2025-08-07 04:00:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [11.0, 13.0, 14.0, 20.0, 29.0, 25.0, 12.0, 12.0, 77.0, 17.0]
2025-08-07 04:00:48,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 36 seconds)
2025-08-07 04:02:30,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:31,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 57.66872 ± 68.598
2025-08-07 04:02:31,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [7.73559, 16.998384, 132.89305, 15.054932, 16.769596, 15.001488, 9.384137, 129.62434, 210.11162, 23.114]
2025-08-07 04:02:31,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 28.0, 121.0, 30.0, 17.0, 23.0, 16.0, 66.0, 108.0, 39.0]
2025-08-07 04:02:31,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 52 seconds)
2025-08-07 04:04:13,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:13,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 34.35337 ± 25.923
2025-08-07 04:04:13,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [20.554964, 21.814875, 15.620332, 36.227398, 11.472974, 17.281515, 65.68852, 31.861593, 98.52278, 24.488747]
2025-08-07 04:04:13,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 17.0, 31.0, 21.0, 24.0, 60.0, 29.0, 71.0, 28.0]
2025-08-07 04:04:13,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 8 seconds)
2025-08-07 04:05:55,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:05:55,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 29.37856 ± 30.987
2025-08-07 04:05:55,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [23.433558, 30.14039, 10.687499, 27.505955, 119.830635, 8.6160555, 14.95068, 25.259829, 21.817242, 11.543771]
2025-08-07 04:05:55,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 13.0, 27.0, 63.0, 11.0, 16.0, 29.0, 24.0, 14.0]
2025-08-07 04:05:55,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 24 seconds)
2025-08-07 04:07:37,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:07:37,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 24.70705 ± 32.055
2025-08-07 04:07:37,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [14.127514, 117.708664, 35.251453, 9.042166, 12.953746, 22.759085, 10.060685, 8.39584, 9.452605, 7.318729]
2025-08-07 04:07:37,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 86.0, 30.0, 15.0, 14.0, 22.0, 12.0, 11.0, 13.0, 10.0]
2025-08-07 04:07:37,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2025-08-07 04:09:21,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:09:21,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1221 [DEBUG]: Total Reward: 35.85273 ± 63.649
2025-08-07 04:09:21,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1222 [DEBUG]: All rewards: [226.51541, 20.702408, 14.6652975, 10.152501, 19.539383, 11.39226, 12.359133, 16.060087, 10.415324, 16.725475]
2025-08-07 04:09:21,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 20.0, 24.0, 17.0, 28.0, 13.0, 16.0, 19.0, 20.0, 25.0]
2025-08-07 04:09:21,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-hopper):1251 [DEBUG]: Training session finished
