2025-08-07 00:48:25,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:25,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:25,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14f2b651fa90>}
2025-08-07 00:48:25,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:25,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1133 [INFO]: Creating new trainer
2025-08-07 00:48:25,771 baseline-bpql-noiseperc10-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 00:48:25,771 baseline-bpql-noiseperc10-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:27,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:27,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 00:50:00,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:50:01,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 40.54405 ± 8.267
2025-08-07 00:50:01,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [25.641478, 39.055206, 39.09038, 30.633324, 36.749855, 47.65722, 45.898148, 37.116062, 51.696968, 51.90188]
2025-08-07 00:50:01,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 44.0, 38.0, 32.0, 43.0, 54.0, 51.0, 34.0, 53.0, 53.0]
2025-08-07 00:50:01,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (40.54) for latency ExtremeSparseL4U32
2025-08-07 00:50:01,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 35 minutes, 19 seconds)
2025-08-07 00:51:40,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:41,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 87.98789 ± 48.210
2025-08-07 00:51:41,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [17.838223, 113.235344, 73.613174, 153.9379, 25.007725, 98.53998, 62.214127, 150.01143, 140.14134, 45.339645]
2025-08-07 00:51:41,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 74.0, 64.0, 90.0, 28.0, 74.0, 57.0, 86.0, 100.0, 50.0]
2025-08-07 00:51:41,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (87.99) for latency ExtremeSparseL4U32
2025-08-07 00:51:41,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 39 minutes)
2025-08-07 00:53:21,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:22,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 63.59553 ± 59.780
2025-08-07 00:53:22,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.930162, 26.449018, 129.09192, 18.432062, 210.6795, 77.53787, 61.074665, 11.07915, 22.966011, 56.714977]
2025-08-07 00:53:22,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 31.0, 82.0, 20.0, 114.0, 70.0, 64.0, 24.0, 24.0, 57.0]
2025-08-07 00:53:22,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 39 minutes, 15 seconds)
2025-08-07 00:55:02,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:03,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 72.81276 ± 68.609
2025-08-07 00:55:03,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [20.53108, 188.91515, 23.482672, 207.53255, 24.150246, 92.42058, 24.271187, 27.75587, 96.844505, 22.223787]
2025-08-07 00:55:03,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 105.0, 25.0, 116.0, 32.0, 63.0, 39.0, 34.0, 65.0, 29.0]
2025-08-07 00:55:03,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 38 minutes, 27 seconds)
2025-08-07 00:56:42,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:43,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 30.54389 ± 28.397
2025-08-07 00:56:43,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.190186, 14.2988, 20.186113, 27.508162, 28.4602, 114.52765, 17.677671, 17.198765, 27.944296, 19.447052]
2025-08-07 00:56:43,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 30.0, 31.0, 29.0, 81.0, 21.0, 21.0, 31.0, 31.0]
2025-08-07 00:56:43,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 37 minutes, 4 seconds)
2025-08-07 00:58:23,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:24,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 45.56779 ± 23.005
2025-08-07 00:58:24,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [38.639076, 77.062195, 27.07835, 23.006983, 72.14158, 17.139673, 48.430546, 22.0318, 81.870155, 48.277557]
2025-08-07 00:58:24,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 59.0, 32.0, 23.0, 62.0, 22.0, 48.0, 26.0, 50.0, 51.0]
2025-08-07 00:58:24,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 37 minutes, 32 seconds)
2025-08-07 01:00:04,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 59.85668 ± 50.655
2025-08-07 01:00:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.477165, 14.789908, 103.08536, 26.325308, 20.914639, 36.825134, 28.05569, 52.132225, 157.91501, 139.04642]
2025-08-07 01:00:04,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 70.0, 26.0, 23.0, 46.0, 30.0, 52.0, 115.0, 100.0]
2025-08-07 01:00:04,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 35 minutes, 54 seconds)
2025-08-07 01:01:45,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:46,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 89.89175 ± 79.953
2025-08-07 01:01:46,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [264.2555, 23.708084, 44.735096, 165.62561, 143.93611, 25.610523, 16.038118, 25.921682, 45.94093, 143.14594]
2025-08-07 01:01:46,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 29.0, 45.0, 154.0, 105.0, 30.0, 26.0, 31.0, 50.0, 120.0]
2025-08-07 01:01:46,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (89.89) for latency ExtremeSparseL4U32
2025-08-07 01:01:46,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 34 minutes, 33 seconds)
2025-08-07 01:03:27,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:28,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 49.32442 ± 43.699
2025-08-07 01:03:28,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.942825, 29.691456, 19.063995, 32.56088, 50.79621, 118.667725, 23.991343, 149.23251, 21.216972, 26.080225]
2025-08-07 01:03:28,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 38.0, 20.0, 38.0, 52.0, 135.0, 28.0, 131.0, 27.0, 33.0]
2025-08-07 01:03:28,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 33 minutes, 8 seconds)
2025-08-07 01:05:08,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:05:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 132.02275 ± 86.017
2025-08-07 01:05:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [97.734024, 117.1799, 262.71225, 18.577972, 190.6169, 286.1004, 47.28418, 147.14809, 45.76118, 107.11254]
2025-08-07 01:05:09,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [66.0, 93.0, 143.0, 21.0, 120.0, 152.0, 57.0, 109.0, 49.0, 87.0]
2025-08-07 01:05:09,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (132.02) for latency ExtremeSparseL4U32
2025-08-07 01:05:09,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 31 minutes, 58 seconds)
2025-08-07 01:06:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:51,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 69.50534 ± 78.080
2025-08-07 01:06:51,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [30.717386, 282.73267, 20.095066, 17.967108, 121.79654, 20.040623, 50.85564, 79.88178, 57.23134, 13.735263]
2025-08-07 01:06:51,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 157.0, 32.0, 25.0, 82.0, 25.0, 55.0, 60.0, 52.0, 18.0]
2025-08-07 01:06:51,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 30 minutes, 32 seconds)
2025-08-07 01:08:32,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:33,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 75.58359 ± 53.044
2025-08-07 01:08:33,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [140.49171, 86.21223, 26.961285, 133.1077, 165.32405, 15.715643, 20.428211, 92.10219, 50.42416, 25.068718]
2025-08-07 01:08:33,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 75.0, 27.0, 102.0, 98.0, 21.0, 23.0, 57.0, 50.0, 25.0]
2025-08-07 01:08:33,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 29 minutes, 13 seconds)
2025-08-07 01:10:13,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:14,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 108.20332 ± 70.323
2025-08-07 01:10:14,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [24.723167, 219.48288, 95.04283, 219.71277, 115.52842, 29.142115, 122.725266, 19.730686, 79.50324, 156.44182]
2025-08-07 01:10:14,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 130.0, 64.0, 132.0, 86.0, 32.0, 100.0, 21.0, 57.0, 105.0]
2025-08-07 01:10:14,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 27 minutes, 17 seconds)
2025-08-07 01:11:55,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:11:56,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 93.00322 ± 68.616
2025-08-07 01:11:56,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [132.9134, 25.931675, 21.663744, 158.34799, 65.88514, 84.114944, 74.76566, 256.00763, 78.17245, 32.229538]
2025-08-07 01:11:56,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 25.0, 23.0, 94.0, 49.0, 72.0, 61.0, 155.0, 60.0, 30.0]
2025-08-07 01:11:56,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 25 minutes, 40 seconds)
2025-08-07 01:13:36,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:13:37,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 49.22066 ± 34.687
2025-08-07 01:13:37,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.14084, 116.753494, 19.968866, 81.32913, 22.861513, 76.71925, 84.381996, 22.515167, 20.155262, 28.38107]
2025-08-07 01:13:37,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 94.0, 31.0, 67.0, 31.0, 58.0, 63.0, 27.0, 28.0, 28.0]
2025-08-07 01:13:37,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 23 minutes, 43 seconds)
2025-08-07 01:15:17,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:18,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 69.34904 ± 57.487
2025-08-07 01:15:18,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [40.808434, 73.92951, 141.63869, 38.535057, 22.102137, 21.415987, 14.316884, 22.880682, 170.53973, 147.32335]
2025-08-07 01:15:18,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [43.0, 57.0, 90.0, 49.0, 24.0, 28.0, 20.0, 32.0, 103.0, 108.0]
2025-08-07 01:15:18,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 21 minutes, 57 seconds)
2025-08-07 01:17:00,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:00,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 52.20875 ± 43.823
2025-08-07 01:17:00,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [16.46679, 37.966072, 41.59677, 117.38806, 138.41203, 17.53544, 34.051075, 90.675835, 13.65961, 14.335867]
2025-08-07 01:17:00,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 32.0, 48.0, 89.0, 83.0, 22.0, 34.0, 66.0, 17.0, 20.0]
2025-08-07 01:17:00,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 20 minutes, 18 seconds)
2025-08-07 01:18:40,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 96.60165 ± 91.026
2025-08-07 01:18:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [104.82754, 100.320274, 88.32368, 13.3584, 18.941963, 90.163994, 195.01067, 315.96274, 15.74636, 23.360878]
2025-08-07 01:18:41,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 79.0, 67.0, 16.0, 24.0, 60.0, 108.0, 154.0, 22.0, 28.0]
2025-08-07 01:18:41,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 18 minutes, 32 seconds)
2025-08-07 01:20:22,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:20:23,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 51.59242 ± 44.327
2025-08-07 01:20:23,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [20.103357, 71.421165, 155.05801, 26.004435, 19.867008, 95.23419, 16.531328, 19.463385, 17.213102, 75.02817]
2025-08-07 01:20:23,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 84.0, 107.0, 29.0, 26.0, 65.0, 21.0, 26.0, 20.0, 67.0]
2025-08-07 01:20:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 16 minutes, 54 seconds)
2025-08-07 01:22:03,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:22:04,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 73.84527 ± 93.070
2025-08-07 01:22:04,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [82.79434, 18.223541, 15.128568, 81.47697, 81.43692, 56.421425, 15.331215, 24.909012, 22.243301, 340.48743]
2025-08-07 01:22:04,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 23.0, 21.0, 74.0, 69.0, 65.0, 17.0, 30.0, 28.0, 185.0]
2025-08-07 01:22:04,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 15 minutes, 22 seconds)
2025-08-07 01:23:45,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:46,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 67.71116 ± 73.142
2025-08-07 01:23:46,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [25.77041, 19.648266, 13.478469, 268.42624, 83.906555, 53.56495, 22.52619, 78.32184, 96.34015, 15.128546]
2025-08-07 01:23:46,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 28.0, 18.0, 171.0, 72.0, 56.0, 26.0, 71.0, 67.0, 21.0]
2025-08-07 01:23:46,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 13 minutes, 48 seconds)
2025-08-07 01:25:27,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:28,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 91.62659 ± 80.734
2025-08-07 01:25:28,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [244.09343, 189.18811, 25.751696, 26.504726, 33.023373, 29.154741, 60.36029, 195.42725, 22.750751, 90.01152]
2025-08-07 01:25:28,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [154.0, 115.0, 31.0, 31.0, 32.0, 31.0, 49.0, 117.0, 30.0, 85.0]
2025-08-07 01:25:28,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 12 minutes, 3 seconds)
2025-08-07 01:27:08,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:09,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 57.73197 ± 48.029
2025-08-07 01:27:09,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [16.231213, 172.3267, 110.10845, 23.861504, 39.923626, 80.078896, 27.096891, 23.023745, 63.422256, 21.246424]
2025-08-07 01:27:09,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 102.0, 76.0, 32.0, 45.0, 73.0, 28.0, 25.0, 58.0, 27.0]
2025-08-07 01:27:09,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 18 seconds)
2025-08-07 01:28:50,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:28:50,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 65.13808 ± 54.884
2025-08-07 01:28:50,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [134.6414, 71.71632, 69.823975, 21.8407, 21.639538, 18.653645, 19.956463, 31.536104, 192.06004, 69.51258]
2025-08-07 01:28:50,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 65.0, 62.0, 28.0, 27.0, 31.0, 24.0, 32.0, 111.0, 58.0]
2025-08-07 01:28:50,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 8 minutes, 32 seconds)
2025-08-07 01:30:30,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:31,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 70.05306 ± 44.626
2025-08-07 01:30:31,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [29.197067, 65.152794, 23.850159, 31.843864, 148.32224, 90.60509, 65.95361, 20.535862, 80.909386, 144.16057]
2025-08-07 01:30:31,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 65.0, 27.0, 30.0, 91.0, 75.0, 68.0, 26.0, 69.0, 98.0]
2025-08-07 01:30:31,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 46 seconds)
2025-08-07 01:32:12,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 106.41885 ± 80.871
2025-08-07 01:32:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [206.73735, 211.78168, 123.28624, 27.702099, 23.575493, 24.107405, 25.234207, 92.8219, 237.27531, 91.66677]
2025-08-07 01:32:13,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 123.0, 87.0, 31.0, 28.0, 30.0, 32.0, 91.0, 129.0, 67.0]
2025-08-07 01:32:13,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 57 seconds)
2025-08-07 01:33:55,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:55,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 60.88746 ± 75.061
2025-08-07 01:33:55,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.172356, 17.260092, 263.50555, 126.115555, 23.605282, 13.730502, 68.11751, 23.785196, 28.912498, 25.670103]
2025-08-07 01:33:55,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 143.0, 92.0, 26.0, 23.0, 62.0, 30.0, 30.0, 28.0]
2025-08-07 01:33:55,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 26 seconds)
2025-08-07 01:35:34,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:35:35,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 88.03200 ± 77.093
2025-08-07 01:35:35,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [258.78506, 18.661293, 20.909882, 23.765406, 153.98965, 139.58519, 140.75642, 52.007465, 23.396132, 48.463474]
2025-08-07 01:35:35,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 26.0, 25.0, 28.0, 92.0, 91.0, 87.0, 55.0, 29.0, 61.0]
2025-08-07 01:35:35,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 1 minute, 35 seconds)
2025-08-07 01:37:16,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:37:17,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 85.18404 ± 91.158
2025-08-07 01:37:17,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [28.76214, 13.048104, 52.286793, 22.380957, 211.88605, 52.308983, 303.9436, 88.49155, 54.922485, 23.809824]
2025-08-07 01:37:17,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 18.0, 65.0, 23.0, 129.0, 60.0, 145.0, 71.0, 58.0, 32.0]
2025-08-07 01:37:17,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 59 minutes, 58 seconds)
2025-08-07 01:38:58,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:59,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 89.38991 ± 58.008
2025-08-07 01:38:59,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [16.878975, 143.13387, 65.568146, 109.45294, 17.790731, 171.43785, 62.474754, 148.78415, 15.101058, 143.27655]
2025-08-07 01:38:59,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 110.0, 64.0, 76.0, 25.0, 118.0, 56.0, 98.0, 20.0, 109.0]
2025-08-07 01:38:59,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 22 seconds)
2025-08-07 01:40:39,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:40,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 55.30830 ± 44.159
2025-08-07 01:40:40,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [83.23763, 28.373236, 57.464573, 145.17555, 22.57616, 13.532057, 122.79411, 26.779367, 32.81415, 20.336151]
2025-08-07 01:40:40,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 31.0, 58.0, 114.0, 29.0, 19.0, 123.0, 26.0, 31.0, 29.0]
2025-08-07 01:40:40,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 56 minutes, 39 seconds)
2025-08-07 01:42:21,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:22,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 72.13380 ± 63.012
2025-08-07 01:42:22,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [20.119844, 144.35455, 197.00372, 15.779436, 32.132275, 29.055643, 72.632286, 149.82007, 27.923225, 32.516937]
2025-08-07 01:42:22,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 94.0, 123.0, 24.0, 31.0, 29.0, 63.0, 85.0, 31.0, 31.0]
2025-08-07 01:42:22,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 54 minutes, 49 seconds)
2025-08-07 01:44:03,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:44:04,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 85.28438 ± 66.858
2025-08-07 01:44:04,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.093445, 18.074913, 18.245813, 195.00095, 92.8194, 58.888805, 172.66354, 89.788086, 20.974646, 167.29417]
2025-08-07 01:44:04,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 28.0, 121.0, 71.0, 55.0, 113.0, 88.0, 24.0, 102.0]
2025-08-07 01:44:04,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 53 minutes, 33 seconds)
2025-08-07 01:45:44,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:45,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 131.79613 ± 89.372
2025-08-07 01:45:45,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [71.40357, 218.02985, 24.67449, 234.62817, 93.985504, 160.23285, 300.14963, 119.08969, 16.942745, 78.82487]
2025-08-07 01:45:45,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [66.0, 127.0, 26.0, 133.0, 64.0, 99.0, 176.0, 86.0, 21.0, 66.0]
2025-08-07 01:45:45,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 51 minutes, 45 seconds)
2025-08-07 01:47:27,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:28,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 170.64307 ± 107.393
2025-08-07 01:47:28,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [317.7832, 184.6453, 290.2194, 301.7771, 129.55316, 21.203344, 210.36165, 176.60223, 56.029427, 18.255991]
2025-08-07 01:47:28,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 112.0, 164.0, 161.0, 97.0, 24.0, 132.0, 99.0, 60.0, 21.0]
2025-08-07 01:47:28,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (170.64) for latency ExtremeSparseL4U32
2025-08-07 01:47:28,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 50 minutes, 25 seconds)
2025-08-07 01:49:09,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:10,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 45.43796 ± 40.797
2025-08-07 01:49:10,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [72.18134, 26.865574, 29.139755, 78.35391, 149.64372, 15.527, 19.763798, 25.077322, 18.868988, 18.958166]
2025-08-07 01:49:10,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 31.0, 26.0, 72.0, 86.0, 32.0, 27.0, 31.0, 22.0, 22.0]
2025-08-07 01:49:10,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 48 minutes, 46 seconds)
2025-08-07 01:50:49,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:50:50,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 86.25511 ± 64.806
2025-08-07 01:50:50,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [137.96814, 19.23616, 28.418386, 159.28496, 111.73364, 188.87778, 28.678875, 144.26785, 17.249496, 26.83577]
2025-08-07 01:50:50,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 24.0, 32.0, 96.0, 86.0, 111.0, 33.0, 86.0, 21.0, 28.0]
2025-08-07 01:50:50,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 46 minutes, 39 seconds)
2025-08-07 01:52:31,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:32,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 73.73266 ± 76.813
2025-08-07 01:52:32,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [166.93108, 106.34132, 17.026049, 23.61172, 26.617313, 19.065971, 22.043184, 256.231, 74.09626, 25.362726]
2025-08-07 01:52:32,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 78.0, 24.0, 26.0, 32.0, 27.0, 24.0, 153.0, 77.0, 27.0]
2025-08-07 01:52:32,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 44 minutes, 56 seconds)
2025-08-07 01:54:12,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:13,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 97.23808 ± 109.854
2025-08-07 01:54:13,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [12.131192, 27.676079, 131.2617, 15.67858, 201.94055, 362.19705, 31.313915, 152.23535, 18.24763, 19.698818]
2025-08-07 01:54:13,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 33.0, 93.0, 21.0, 109.0, 190.0, 31.0, 94.0, 23.0, 29.0]
2025-08-07 01:54:13,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 43 minutes, 16 seconds)
2025-08-07 01:55:54,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:55,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 79.43992 ± 91.512
2025-08-07 01:55:55,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [248.50328, 10.34245, 25.866957, 53.939754, 17.64941, 17.21461, 130.86903, 20.550442, 250.81334, 18.649887]
2025-08-07 01:55:55,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [133.0, 13.0, 29.0, 60.0, 24.0, 20.0, 94.0, 24.0, 151.0, 22.0]
2025-08-07 01:55:55,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 41 minutes, 23 seconds)
2025-08-07 01:57:37,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:38,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 82.65431 ± 70.383
2025-08-07 01:57:38,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.1582, 173.29979, 77.89995, 21.430311, 95.797035, 205.88535, 19.96598, 24.370766, 22.732016, 167.00371]
2025-08-07 01:57:38,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 124.0, 57.0, 24.0, 77.0, 134.0, 23.0, 32.0, 31.0, 95.0]
2025-08-07 01:57:38,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 39 minutes, 52 seconds)
2025-08-07 01:59:18,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:19,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 79.63641 ± 73.511
2025-08-07 01:59:19,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.409029, 22.42279, 196.5047, 153.34229, 20.85891, 23.434538, 162.09023, 162.84702, 12.701548, 20.75304]
2025-08-07 01:59:19,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 112.0, 94.0, 23.0, 23.0, 103.0, 102.0, 18.0, 26.0]
2025-08-07 01:59:19,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 27 seconds)
2025-08-07 02:00:59,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:01,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 95.86829 ± 66.815
2025-08-07 02:01:01,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [43.626747, 27.106829, 142.42671, 122.751976, 113.29438, 244.86427, 116.91365, 13.35755, 29.140627, 105.20006]
2025-08-07 02:01:01,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [63.0, 32.0, 86.0, 92.0, 87.0, 125.0, 103.0, 20.0, 32.0, 73.0]
2025-08-07 02:01:01,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 36 minutes, 41 seconds)
2025-08-07 02:02:41,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:43,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 115.08867 ± 100.256
2025-08-07 02:02:43,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [261.84518, 132.04828, 27.196594, 31.515322, 91.2542, 17.096518, 304.53534, 194.15384, 77.35771, 13.883844]
2025-08-07 02:02:43,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [137.0, 99.0, 32.0, 33.0, 78.0, 23.0, 146.0, 114.0, 77.0, 20.0]
2025-08-07 02:02:43,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 35 minutes, 5 seconds)
2025-08-07 02:04:24,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:25,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 95.47205 ± 108.684
2025-08-07 02:04:25,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [180.24765, 369.23904, 151.99756, 21.642681, 23.02376, 23.910366, 29.807655, 13.321506, 22.34117, 119.189064]
2025-08-07 02:04:25,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 159.0, 100.0, 28.0, 28.0, 23.0, 32.0, 17.0, 28.0, 85.0]
2025-08-07 02:04:25,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 33 minutes, 24 seconds)
2025-08-07 02:06:06,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:07,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 62.68310 ± 72.482
2025-08-07 02:06:07,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.698656, 128.90709, 19.126776, 18.024628, 31.049963, 23.180706, 252.47554, 87.62203, 15.740054, 31.005562]
2025-08-07 02:06:07,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 87.0, 22.0, 25.0, 32.0, 28.0, 167.0, 61.0, 27.0, 31.0]
2025-08-07 02:06:07,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 37 seconds)
2025-08-07 02:07:47,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:49,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 139.97739 ± 122.626
2025-08-07 02:07:49,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [22.76985, 21.282015, 332.7031, 112.85105, 312.35217, 101.92693, 157.0153, 17.573595, 297.35657, 23.943115]
2025-08-07 02:07:49,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 25.0, 162.0, 75.0, 145.0, 79.0, 97.0, 20.0, 153.0, 25.0]
2025-08-07 02:07:49,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 30 minutes, 1 second)
2025-08-07 02:09:30,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:31,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 119.28717 ± 80.844
2025-08-07 02:09:31,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [82.16366, 318.78766, 18.475073, 162.98225, 152.49597, 109.74061, 126.73272, 106.88624, 18.443947, 96.163574]
2025-08-07 02:09:31,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 170.0, 21.0, 89.0, 106.0, 70.0, 87.0, 69.0, 31.0, 65.0]
2025-08-07 02:09:31,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 33 seconds)
2025-08-07 02:11:12,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:13,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 83.20598 ± 85.414
2025-08-07 02:11:13,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [15.727998, 259.34625, 25.396288, 132.60962, 19.809889, 215.69766, 26.906713, 87.07755, 25.645735, 23.842062]
2025-08-07 02:11:13,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [18.0, 162.0, 33.0, 82.0, 27.0, 135.0, 28.0, 59.0, 29.0, 28.0]
2025-08-07 02:11:13,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 44 seconds)
2025-08-07 02:12:54,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:12:55,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 93.70723 ± 85.904
2025-08-07 02:12:55,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [30.280748, 186.02052, 258.50153, 91.26473, 103.29083, 20.5953, 197.52248, 14.865771, 17.0059, 17.724506]
2025-08-07 02:12:55,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 118.0, 131.0, 78.0, 80.0, 25.0, 113.0, 19.0, 19.0, 32.0]
2025-08-07 02:12:55,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 25 minutes, 1 second)
2025-08-07 02:14:36,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:14:37,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 82.22538 ± 48.261
2025-08-07 02:14:37,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [111.56096, 20.949078, 23.8217, 63.74188, 93.84641, 113.66028, 103.69041, 18.995293, 178.18134, 93.80649]
2025-08-07 02:14:37,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 23.0, 25.0, 42.0, 66.0, 82.0, 66.0, 19.0, 114.0, 69.0]
2025-08-07 02:14:37,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 15 seconds)
2025-08-07 02:16:19,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:20,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 88.70470 ± 53.452
2025-08-07 02:16:20,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [106.50057, 146.38792, 50.51968, 22.442701, 114.47473, 111.120186, 26.761639, 13.530824, 120.50915, 174.79968]
2025-08-07 02:16:20,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 105.0, 69.0, 30.0, 87.0, 69.0, 29.0, 15.0, 74.0, 109.0]
2025-08-07 02:16:20,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 21 minutes, 52 seconds)
2025-08-07 02:18:00,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:01,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 91.08897 ± 97.814
2025-08-07 02:18:01,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.416367, 20.009409, 18.097212, 17.552475, 12.061364, 249.51166, 112.00112, 17.355246, 265.2139, 177.67099]
2025-08-07 02:18:01,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 22.0, 24.0, 20.0, 16.0, 146.0, 90.0, 22.0, 148.0, 109.0]
2025-08-07 02:18:01,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 19 minutes, 49 seconds)
2025-08-07 02:19:43,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 86.43893 ± 81.830
2025-08-07 02:19:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [182.22987, 270.45682, 20.056335, 17.998882, 100.350464, 95.33947, 22.571222, 15.5172615, 119.459465, 20.409525]
2025-08-07 02:19:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 172.0, 31.0, 22.0, 65.0, 70.0, 26.0, 19.0, 83.0, 25.0]
2025-08-07 02:19:44,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 18 seconds)
2025-08-07 02:21:24,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:25,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 101.66286 ± 85.115
2025-08-07 02:21:25,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [167.30284, 23.578194, 25.234179, 18.584782, 197.18578, 110.18901, 18.663738, 191.07155, 240.40344, 24.414995]
2025-08-07 02:21:25,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 25.0, 27.0, 24.0, 119.0, 74.0, 20.0, 111.0, 134.0, 29.0]
2025-08-07 02:21:25,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 32 seconds)
2025-08-07 02:23:06,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:08,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 94.93617 ± 72.785
2025-08-07 02:23:08,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [28.372137, 15.531267, 82.68209, 100.31114, 25.616997, 205.83838, 96.916534, 220.10875, 154.66399, 19.320395]
2025-08-07 02:23:08,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 17.0, 57.0, 73.0, 29.0, 133.0, 71.0, 108.0, 119.0, 22.0]
2025-08-07 02:23:08,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 56 seconds)
2025-08-07 02:24:48,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:49,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 133.77699 ± 110.247
2025-08-07 02:24:49,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.005781, 326.9531, 125.24018, 130.25381, 23.324444, 344.29184, 132.241, 90.674675, 123.7846, 22.000515]
2025-08-07 02:24:49,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 147.0, 84.0, 84.0, 33.0, 157.0, 89.0, 58.0, 93.0, 27.0]
2025-08-07 02:24:49,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 12 minutes, 55 seconds)
2025-08-07 02:26:31,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:32,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 76.49382 ± 55.570
2025-08-07 02:26:32,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [94.74855, 199.28406, 88.02453, 112.3939, 105.43305, 87.406876, 13.85059, 24.462238, 20.5226, 18.811815]
2025-08-07 02:26:32,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 115.0, 69.0, 76.0, 74.0, 64.0, 23.0, 31.0, 23.0, 19.0]
2025-08-07 02:26:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 30 seconds)
2025-08-07 02:28:13,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:28:14,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 139.89102 ± 109.389
2025-08-07 02:28:14,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [114.39462, 25.95267, 23.011513, 90.51859, 185.00996, 213.88672, 26.21328, 128.43826, 397.68524, 193.79944]
2025-08-07 02:28:14,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [74.0, 28.0, 29.0, 69.0, 102.0, 136.0, 27.0, 100.0, 181.0, 105.0]
2025-08-07 02:28:14,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 45 seconds)
2025-08-07 02:29:55,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:56,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 109.35248 ± 76.564
2025-08-07 02:29:56,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [27.519314, 194.53554, 29.947348, 119.984245, 153.71301, 214.44887, 194.36792, 12.2529545, 18.644836, 128.11081]
2025-08-07 02:29:56,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 129.0, 32.0, 91.0, 114.0, 192.0, 166.0, 19.0, 25.0, 124.0]
2025-08-07 02:29:56,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 8 minutes, 9 seconds)
2025-08-07 02:31:39,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:40,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 87.75282 ± 75.357
2025-08-07 02:31:40,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [24.00433, 27.399124, 22.669456, 16.645424, 126.21858, 132.09665, 16.197233, 229.24213, 89.4717, 193.5835]
2025-08-07 02:31:40,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 32.0, 26.0, 22.0, 80.0, 92.0, 22.0, 120.0, 61.0, 122.0]
2025-08-07 02:31:40,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 6 minutes, 34 seconds)
2025-08-07 02:33:20,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:33:21,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 106.29295 ± 123.383
2025-08-07 02:33:21,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [25.981503, 19.720486, 118.55537, 19.190762, 29.17532, 26.345064, 395.81482, 268.69177, 137.19897, 22.255497]
2025-08-07 02:33:21,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 79.0, 25.0, 27.0, 31.0, 189.0, 164.0, 97.0, 24.0]
2025-08-07 02:33:21,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 47 seconds)
2025-08-07 02:35:03,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:04,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 104.85105 ± 106.352
2025-08-07 02:35:04,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.683449, 126.32631, 181.7853, 18.817957, 18.414034, 184.19847, 19.956083, 358.06177, 22.480175, 99.786934]
2025-08-07 02:35:04,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 118.0, 146.0, 28.0, 23.0, 122.0, 23.0, 189.0, 28.0, 86.0]
2025-08-07 02:35:04,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 11 seconds)
2025-08-07 02:36:45,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 124.25913 ± 77.608
2025-08-07 02:36:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [160.86777, 23.967737, 175.02676, 18.510084, 230.4003, 14.8570595, 220.32352, 123.4291, 104.05651, 171.1525]
2025-08-07 02:36:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 27.0, 106.0, 20.0, 152.0, 19.0, 141.0, 102.0, 76.0, 107.0]
2025-08-07 02:36:46,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 28 seconds)
2025-08-07 02:38:27,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:28,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 70.98000 ± 49.889
2025-08-07 02:38:28,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [29.288326, 13.632557, 115.33923, 20.3892, 103.1136, 132.21468, 30.885916, 17.466812, 107.67963, 139.78995]
2025-08-07 02:38:28,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 17.0, 73.0, 23.0, 73.0, 82.0, 30.0, 22.0, 83.0, 92.0]
2025-08-07 02:38:28,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 37 seconds)
2025-08-07 02:40:09,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:10,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 81.26164 ± 74.388
2025-08-07 02:40:10,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [31.650284, 14.687769, 169.34483, 17.424736, 23.440548, 163.10478, 26.715506, 215.6732, 127.510735, 23.064108]
2025-08-07 02:40:10,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 17.0, 99.0, 22.0, 32.0, 98.0, 28.0, 128.0, 84.0, 31.0]
2025-08-07 02:40:10,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 51 seconds)
2025-08-07 02:41:51,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:52,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 106.25853 ± 102.890
2025-08-07 02:41:52,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [324.86887, 26.394955, 161.37685, 26.673697, 19.683577, 243.2276, 127.43907, 17.849129, 18.450392, 96.6213]
2025-08-07 02:41:52,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [201.0, 29.0, 97.0, 26.0, 21.0, 134.0, 90.0, 24.0, 24.0, 68.0]
2025-08-07 02:41:52,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 14 seconds)
2025-08-07 02:43:34,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:43:35,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 102.15337 ± 99.388
2025-08-07 02:43:35,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [27.24404, 27.787825, 208.16415, 233.07199, 27.286709, 180.79782, 17.648281, 263.73138, 21.395332, 14.406268]
2025-08-07 02:43:35,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 32.0, 130.0, 146.0, 31.0, 134.0, 20.0, 130.0, 24.0, 19.0]
2025-08-07 02:43:35,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 30 seconds)
2025-08-07 02:45:16,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:45:17,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 190.78824 ± 135.892
2025-08-07 02:45:17,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [383.44955, 25.372076, 23.643885, 326.65387, 158.95537, 281.80447, 102.04893, 16.73772, 343.42358, 245.79297]
2025-08-07 02:45:17,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 27.0, 28.0, 166.0, 99.0, 142.0, 70.0, 20.0, 164.0, 143.0]
2025-08-07 02:45:17,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (190.79) for latency ExtremeSparseL4U32
2025-08-07 02:45:17,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 47 seconds)
2025-08-07 02:46:59,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:00,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 87.70764 ± 98.429
2025-08-07 02:47:00,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [26.528587, 186.37114, 22.857946, 19.995264, 17.938478, 308.68896, 30.661865, 35.43209, 32.10419, 196.49785]
2025-08-07 02:47:00,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 125.0, 27.0, 22.0, 25.0, 190.0, 32.0, 33.0, 30.0, 117.0]
2025-08-07 02:47:00,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 13 seconds)
2025-08-07 02:48:41,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:42,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 152.68919 ± 104.724
2025-08-07 02:48:42,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [138.91742, 214.50185, 22.393333, 180.26031, 377.18335, 180.08786, 21.93241, 178.86604, 20.637293, 192.11208]
2025-08-07 02:48:42,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 121.0, 25.0, 121.0, 170.0, 119.0, 26.0, 101.0, 23.0, 118.0]
2025-08-07 02:48:42,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 29 seconds)
2025-08-07 02:50:23,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:25,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 152.75276 ± 105.714
2025-08-07 02:50:25,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [232.93979, 123.814186, 230.38553, 20.967457, 275.35754, 25.121132, 313.70392, 20.752674, 86.46741, 198.0179]
2025-08-07 02:50:25,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 76.0, 143.0, 22.0, 159.0, 25.0, 149.0, 26.0, 63.0, 111.0]
2025-08-07 02:50:25,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 50 seconds)
2025-08-07 02:52:05,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:07,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 105.82312 ± 93.835
2025-08-07 02:52:07,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [87.59592, 18.021084, 27.164925, 19.535908, 15.788857, 209.92992, 197.76755, 197.52829, 24.369808, 260.52893]
2025-08-07 02:52:07,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 21.0, 32.0, 26.0, 21.0, 117.0, 116.0, 117.0, 28.0, 163.0]
2025-08-07 02:52:07,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 46 minutes, 2 seconds)
2025-08-07 02:53:48,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:49,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 99.01883 ± 100.031
2025-08-07 02:53:49,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [253.44745, 22.837736, 28.066692, 18.366266, 263.10712, 28.438269, 115.90552, 24.489408, 16.715002, 218.8148]
2025-08-07 02:53:49,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [156.0, 24.0, 31.0, 27.0, 134.0, 32.0, 84.0, 31.0, 21.0, 133.0]
2025-08-07 02:53:49,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 21 seconds)
2025-08-07 02:55:32,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:34,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 149.13565 ± 113.783
2025-08-07 02:55:34,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [175.15251, 222.13478, 253.83113, 13.7780485, 220.6299, 236.15427, 13.844555, 15.584654, 21.002598, 319.24405]
2025-08-07 02:55:34,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 121.0, 155.0, 15.0, 126.0, 149.0, 17.0, 26.0, 28.0, 165.0]
2025-08-07 02:55:34,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 48 seconds)
2025-08-07 02:57:13,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:14,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 66.04338 ± 56.347
2025-08-07 02:57:14,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [159.16516, 119.200264, 23.959305, 18.089132, 25.683172, 113.88069, 141.6779, 25.31867, 20.01549, 13.444037]
2025-08-07 02:57:14,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 78.0, 32.0, 23.0, 30.0, 67.0, 88.0, 29.0, 27.0, 18.0]
2025-08-07 02:57:14,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 57 seconds)
2025-08-07 02:58:55,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:58:56,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 59.88931 ± 107.191
2025-08-07 02:58:56,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [22.500454, 380.83762, 35.601475, 17.44014, 25.888971, 24.006813, 19.76986, 17.153246, 18.63633, 37.058105]
2025-08-07 02:58:56,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 176.0, 41.0, 22.0, 33.0, 30.0, 23.0, 23.0, 22.0, 33.0]
2025-08-07 02:58:56,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 11 seconds)
2025-08-07 03:00:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:00:40,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 153.63156 ± 163.725
2025-08-07 03:00:40,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [23.750992, 113.5445, 364.14825, 28.26877, 15.32805, 32.818707, 225.48358, 14.313213, 206.63712, 512.0225]
2025-08-07 03:00:40,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 88.0, 153.0, 29.0, 18.0, 32.0, 128.0, 25.0, 122.0, 237.0]
2025-08-07 03:00:40,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 38 seconds)
2025-08-07 03:02:20,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:02:21,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 149.91576 ± 105.400
2025-08-07 03:02:21,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [20.821358, 151.69728, 193.12097, 88.66976, 18.568401, 288.49683, 293.09158, 276.70987, 143.49281, 24.488762]
2025-08-07 03:02:21,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 93.0, 109.0, 66.0, 25.0, 163.0, 148.0, 160.0, 89.0, 29.0]
2025-08-07 03:02:21,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 50 seconds)
2025-08-07 03:04:03,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:04:05,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 128.58182 ± 97.317
2025-08-07 03:04:05,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [15.189713, 25.369698, 20.871466, 230.91225, 188.79474, 130.43484, 296.47754, 182.59627, 176.53171, 18.639849]
2025-08-07 03:04:05,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 33.0, 24.0, 153.0, 113.0, 90.0, 152.0, 149.0, 102.0, 27.0]
2025-08-07 03:04:05,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 3 seconds)
2025-08-07 03:05:45,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:46,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 91.82201 ± 93.294
2025-08-07 03:05:46,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [202.8923, 133.65605, 18.649994, 17.633114, 28.476044, 280.02792, 17.26171, 176.67607, 20.9419, 22.005083]
2025-08-07 03:05:46,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [124.0, 84.0, 28.0, 20.0, 31.0, 167.0, 23.0, 111.0, 23.0, 26.0]
2025-08-07 03:05:46,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 24 seconds)
2025-08-07 03:07:28,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:29,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 58.26116 ± 78.476
2025-08-07 03:07:29,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.075743, 16.04404, 22.284592, 16.253073, 27.191887, 190.38092, 17.466007, 17.228542, 17.690916, 236.99593]
2025-08-07 03:07:29,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 21.0, 30.0, 20.0, 28.0, 130.0, 20.0, 20.0, 28.0, 135.0]
2025-08-07 03:07:29,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 46 seconds)
2025-08-07 03:09:09,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:10,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 120.06138 ± 92.122
2025-08-07 03:09:10,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [240.75967, 11.591233, 24.050825, 194.6202, 17.95823, 13.07746, 230.1845, 207.41292, 99.59236, 161.36629]
2025-08-07 03:09:10,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 16.0, 29.0, 111.0, 22.0, 16.0, 126.0, 127.0, 62.0, 95.0]
2025-08-07 03:09:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 54 seconds)
2025-08-07 03:10:52,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:53,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 128.91527 ± 110.820
2025-08-07 03:10:53,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [16.343157, 306.4877, 22.648478, 305.26398, 130.00859, 21.58161, 88.9624, 243.08546, 126.124405, 28.647032]
2025-08-07 03:10:53,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 155.0, 33.0, 166.0, 91.0, 22.0, 59.0, 138.0, 92.0, 32.0]
2025-08-07 03:10:53,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 18 seconds)
2025-08-07 03:12:34,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:12:35,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 99.92072 ± 121.133
2025-08-07 03:12:35,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [20.476297, 216.44156, 27.006718, 27.25663, 102.043976, 122.08472, 413.28503, 22.613194, 29.015652, 18.983368]
2025-08-07 03:12:35,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 117.0, 27.0, 29.0, 76.0, 81.0, 176.0, 30.0, 33.0, 24.0]
2025-08-07 03:12:35,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 31 seconds)
2025-08-07 03:14:16,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:14:17,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 97.23354 ± 80.207
2025-08-07 03:14:17,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [195.58284, 31.651693, 28.200968, 85.01319, 20.73969, 20.516535, 192.83453, 154.04712, 21.562319, 222.18655]
2025-08-07 03:14:17,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 33.0, 29.0, 61.0, 28.0, 24.0, 120.0, 102.0, 30.0, 135.0]
2025-08-07 03:14:18,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 52 seconds)
2025-08-07 03:15:59,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:16:00,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 137.30965 ± 125.799
2025-08-07 03:16:00,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [344.94614, 20.394833, 123.834465, 100.22733, 118.76274, 19.71944, 20.337006, 324.96646, 14.8130865, 285.09482]
2025-08-07 03:16:00,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 23.0, 86.0, 64.0, 89.0, 22.0, 23.0, 171.0, 16.0, 156.0]
2025-08-07 03:16:00,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 10 seconds)
2025-08-07 03:17:41,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:43,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 180.80014 ± 100.745
2025-08-07 03:17:43,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [24.640696, 22.88599, 277.5329, 354.90485, 212.93071, 208.18047, 136.48004, 243.73473, 117.603065, 209.10818]
2025-08-07 03:17:43,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 27.0, 149.0, 177.0, 125.0, 128.0, 91.0, 140.0, 92.0, 120.0]
2025-08-07 03:17:43,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 30 seconds)
2025-08-07 03:19:25,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:26,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 88.76787 ± 95.704
2025-08-07 03:19:26,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [15.478795, 182.24702, 204.33794, 292.38043, 85.22755, 18.537174, 16.469675, 29.61276, 21.59757, 21.789661]
2025-08-07 03:19:26,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 92.0, 104.0, 155.0, 57.0, 21.0, 20.0, 31.0, 26.0, 28.0]
2025-08-07 03:19:26,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 46 seconds)
2025-08-07 03:21:06,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:07,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 102.14744 ± 72.658
2025-08-07 03:21:07,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [31.837624, 25.012726, 18.856268, 208.55415, 141.10535, 22.069918, 84.43332, 209.38928, 162.59752, 117.61813]
2025-08-07 03:21:07,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 27.0, 20.0, 112.0, 89.0, 30.0, 62.0, 125.0, 113.0, 81.0]
2025-08-07 03:21:07,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 4 seconds)
2025-08-07 03:22:48,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:50,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 172.12141 ± 122.901
2025-08-07 03:22:50,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.364904, 319.99896, 198.14073, 335.0305, 140.03088, 212.51508, 333.03848, 16.144594, 17.428898, 129.5211]
2025-08-07 03:22:50,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 148.0, 123.0, 159.0, 92.0, 108.0, 148.0, 28.0, 18.0, 86.0]
2025-08-07 03:22:50,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 22 seconds)
2025-08-07 03:24:31,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:24:32,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 125.31779 ± 111.063
2025-08-07 03:24:32,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [192.148, 341.6404, 26.26925, 215.04103, 29.981007, 231.60567, 16.35019, 24.53455, 154.87141, 20.736479]
2025-08-07 03:24:32,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 166.0, 26.0, 148.0, 26.0, 123.0, 26.0, 32.0, 103.0, 26.0]
2025-08-07 03:24:32,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 39 seconds)
2025-08-07 03:26:14,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:15,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 158.59084 ± 158.199
2025-08-07 03:26:15,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [23.957745, 251.92525, 13.057704, 407.33884, 421.95935, 21.422697, 24.17086, 14.856091, 134.9893, 272.23056]
2025-08-07 03:26:15,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 155.0, 18.0, 194.0, 172.0, 27.0, 32.0, 17.0, 96.0, 140.0]
2025-08-07 03:26:15,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 57 seconds)
2025-08-07 03:27:56,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:57,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 69.73637 ± 76.717
2025-08-07 03:27:57,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [156.28598, 18.31413, 162.58472, 21.256481, 20.44505, 25.017218, 18.457127, 231.27615, 19.51946, 24.207487]
2025-08-07 03:27:57,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 21.0, 99.0, 23.0, 23.0, 25.0, 28.0, 125.0, 29.0, 30.0]
2025-08-07 03:27:57,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 13 seconds)
2025-08-07 03:29:38,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 123.75051 ± 116.521
2025-08-07 03:29:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [167.33507, 21.032602, 161.08617, 330.0069, 86.872635, 21.930916, 74.9679, 12.795727, 333.1735, 28.303768]
2025-08-07 03:29:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [110.0, 22.0, 98.0, 194.0, 65.0, 24.0, 52.0, 17.0, 157.0, 31.0]
2025-08-07 03:29:39,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 31 seconds)
2025-08-07 03:31:21,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 145.69107 ± 179.629
2025-08-07 03:31:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [20.223965, 194.85265, 25.08135, 19.744904, 346.0474, 33.09747, 25.231735, 23.87162, 186.9189, 581.84064]
2025-08-07 03:31:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 126.0, 30.0, 21.0, 177.0, 32.0, 25.0, 25.0, 107.0, 298.0]
2025-08-07 03:31:22,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 49 seconds)
2025-08-07 03:33:00,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:33:02,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 117.67793 ± 92.748
2025-08-07 03:33:02,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [162.02638, 168.21553, 27.340504, 132.29062, 12.997188, 117.8516, 20.902845, 27.19578, 192.908, 315.0509]
2025-08-07 03:33:02,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [118.0, 113.0, 31.0, 100.0, 15.0, 85.0, 26.0, 28.0, 111.0, 180.0]
2025-08-07 03:33:02,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 5 seconds)
2025-08-07 03:34:41,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:43,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 155.49863 ± 136.909
2025-08-07 03:34:43,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [84.81638, 235.26865, 26.27273, 360.1559, 25.63321, 389.50388, 257.65518, 127.651436, 30.597218, 17.43175]
2025-08-07 03:34:43,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 140.0, 31.0, 176.0, 27.0, 189.0, 144.0, 80.0, 28.0, 26.0]
2025-08-07 03:34:43,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 22 seconds)
2025-08-07 03:36:22,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:23,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 107.12054 ± 108.767
2025-08-07 03:36:23,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [23.72125, 185.9623, 21.117918, 159.86038, 122.01836, 122.78041, 14.609657, 18.369312, 374.99158, 27.77426]
2025-08-07 03:36:23,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [29.0, 114.0, 21.0, 104.0, 85.0, 90.0, 19.0, 22.0, 193.0, 27.0]
2025-08-07 03:36:23,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2025-08-07 03:38:03,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:05,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 187.75026 ± 131.323
2025-08-07 03:38:05,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [211.97519, 16.574171, 381.64832, 237.02094, 136.59947, 253.36826, 217.25754, 383.49664, 21.743896, 17.818201]
2025-08-07 03:38:05,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 23.0, 165.0, 137.0, 79.0, 141.0, 138.0, 194.0, 25.0, 32.0]
2025-08-07 03:38:05,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1251 [DEBUG]: Training session finished
