2025-08-07 06:41:33,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc10-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:41:33,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc10-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:41:33,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1515b724b750>}
2025-08-07 06:41:33,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 06:41:33,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1133 [INFO]: Creating new trainer
2025-08-07 06:41:33,702 baseline-bpql-noiseperc10-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=83, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 06:41:33,702 baseline-bpql-noiseperc10-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:41:35,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 06:41:35,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 06:43:05,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:43:06,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 63.06649 ± 36.048
2025-08-07 06:43:06,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [111.44869, 23.45249, 59.02506, 43.93286, 64.63033, 133.5846, 26.315493, 64.561325, 84.434494, 19.279552]
2025-08-07 06:43:06,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 22.0, 53.0, 39.0, 57.0, 79.0, 38.0, 54.0, 60.0, 22.0]
2025-08-07 06:43:06,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (63.07) for latency ExtremeClogL1U23
2025-08-07 06:43:06,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 29 minutes, 41 seconds)
2025-08-07 06:44:43,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:44,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 88.18355 ± 54.968
2025-08-07 06:44:44,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [120.40752, 119.8982, 19.954327, 13.6354065, 87.18557, 87.89548, 152.85954, 182.99878, 77.08811, 19.912474]
2025-08-07 06:44:44,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 73.0, 26.0, 17.0, 63.0, 70.0, 99.0, 133.0, 63.0, 27.0]
2025-08-07 06:44:44,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (88.18) for latency ExtremeClogL1U23
2025-08-07 06:44:44,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 33 minutes, 46 seconds)
2025-08-07 06:46:20,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:46:21,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 117.61379 ± 81.161
2025-08-07 06:46:21,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [139.00485, 208.16985, 178.55928, 183.66673, 23.697454, 44.6052, 15.484719, 236.43298, 16.239998, 130.27687]
2025-08-07 06:46:21,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 118.0, 98.0, 132.0, 25.0, 39.0, 17.0, 140.0, 20.0, 97.0]
2025-08-07 06:46:21,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (117.61) for latency ExtremeClogL1U23
2025-08-07 06:46:21,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 34 minutes, 5 seconds)
2025-08-07 06:47:59,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:48:00,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 99.34880 ± 66.831
2025-08-07 06:48:00,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [261.39725, 108.98476, 60.040863, 111.20321, 58.45053, 49.840725, 144.63663, 133.77466, 19.353752, 45.805695]
2025-08-07 06:48:00,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 79.0, 62.0, 95.0, 51.0, 43.0, 84.0, 85.0, 25.0, 42.0]
2025-08-07 06:48:00,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 33 minutes, 48 seconds)
2025-08-07 06:49:37,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:37,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 61.43476 ± 37.937
2025-08-07 06:49:37,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [23.219902, 78.154594, 115.247826, 107.546455, 26.591982, 14.995525, 38.053463, 96.9423, 22.091158, 91.50443]
2025-08-07 06:49:37,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 61.0, 85.0, 70.0, 35.0, 23.0, 35.0, 65.0, 22.0, 72.0]
2025-08-07 06:49:37,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 32 minutes, 37 seconds)
2025-08-07 06:51:14,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:16,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 138.60510 ± 80.522
2025-08-07 06:51:16,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [162.30165, 15.464036, 125.75732, 19.437777, 115.674446, 274.07913, 185.069, 237.44102, 83.02883, 167.79782]
2025-08-07 06:51:16,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 23.0, 98.0, 23.0, 80.0, 206.0, 126.0, 142.0, 67.0, 124.0]
2025-08-07 06:51:16,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (138.61) for latency ExtremeClogL1U23
2025-08-07 06:51:16,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 33 minutes, 22 seconds)
2025-08-07 06:52:54,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:52:55,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 77.77318 ± 64.585
2025-08-07 06:52:55,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [14.088547, 27.575119, 16.717459, 102.55378, 182.68102, 29.924965, 91.72539, 106.880424, 15.597752, 189.98727]
2025-08-07 06:52:55,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [17.0, 24.0, 21.0, 96.0, 126.0, 36.0, 86.0, 68.0, 22.0, 127.0]
2025-08-07 06:52:55,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 32 minutes, 13 seconds)
2025-08-07 06:54:33,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:34,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 121.18851 ± 57.902
2025-08-07 06:54:34,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [196.53519, 177.98065, 140.55162, 46.556206, 198.37537, 105.86155, 90.630585, 133.4624, 15.420314, 106.51119]
2025-08-07 06:54:34,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 142.0, 110.0, 48.0, 140.0, 77.0, 71.0, 103.0, 19.0, 78.0]
2025-08-07 06:54:34,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 31 minutes, 2 seconds)
2025-08-07 06:56:11,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:12,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 95.70705 ± 58.373
2025-08-07 06:56:12,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [84.16643, 68.94139, 168.84409, 20.318544, 122.783035, 15.461339, 40.209522, 149.40628, 190.86668, 96.07324]
2025-08-07 06:56:12,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 61.0, 112.0, 24.0, 93.0, 19.0, 42.0, 104.0, 139.0, 89.0]
2025-08-07 06:56:12,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 29 minutes, 25 seconds)
2025-08-07 06:57:50,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:52,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 144.69167 ± 30.407
2025-08-07 06:57:52,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [119.4329, 126.348366, 113.884315, 105.92293, 191.00862, 195.65125, 123.11942, 164.40656, 147.85413, 159.28816]
2025-08-07 06:57:52,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 98.0, 84.0, 87.0, 122.0, 136.0, 93.0, 109.0, 102.0, 129.0]
2025-08-07 06:57:52,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (144.69) for latency ExtremeClogL1U23
2025-08-07 06:57:52,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 28 minutes, 21 seconds)
2025-08-07 06:59:29,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:30,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 112.84560 ± 73.997
2025-08-07 06:59:30,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [89.07706, 178.66185, 19.988815, 226.80045, 183.73276, 116.1302, 78.50238, 30.245934, 16.0411, 189.27559]
2025-08-07 06:59:30,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 125.0, 21.0, 134.0, 128.0, 86.0, 57.0, 33.0, 23.0, 110.0]
2025-08-07 06:59:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 26 minutes, 49 seconds)
2025-08-07 07:01:09,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:10,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 119.15292 ± 59.754
2025-08-07 07:01:10,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [140.60747, 153.11882, 87.264565, 38.29849, 91.75545, 208.33719, 10.484457, 120.06133, 152.25412, 189.34729]
2025-08-07 07:01:10,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 123.0, 81.0, 36.0, 80.0, 115.0, 13.0, 81.0, 103.0, 129.0]
2025-08-07 07:01:10,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 25 minutes, 21 seconds)
2025-08-07 07:02:48,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:49,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 138.12833 ± 109.599
2025-08-07 07:02:49,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [99.91265, 159.41539, 66.52927, 25.141584, 241.63104, 148.56697, 361.35992, 21.830654, 242.34908, 14.546597]
2025-08-07 07:02:49,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [65.0, 103.0, 49.0, 26.0, 140.0, 87.0, 162.0, 23.0, 153.0, 18.0]
2025-08-07 07:02:49,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 23 minutes, 30 seconds)
2025-08-07 07:04:27,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:28,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 116.50977 ± 85.336
2025-08-07 07:04:28,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [71.83886, 249.60542, 23.000221, 242.48148, 16.415684, 16.668995, 174.96138, 89.03123, 180.45973, 100.63463]
2025-08-07 07:04:28,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [58.0, 175.0, 29.0, 146.0, 21.0, 20.0, 107.0, 54.0, 105.0, 71.0]
2025-08-07 07:04:28,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 22 minutes, 6 seconds)
2025-08-07 07:06:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:08,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 159.09837 ± 82.817
2025-08-07 07:06:08,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [81.08288, 173.88559, 229.37688, 268.87717, 287.19345, 18.65882, 165.38344, 176.77866, 78.172646, 111.574234]
2025-08-07 07:06:08,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [60.0, 115.0, 129.0, 126.0, 141.0, 23.0, 109.0, 122.0, 51.0, 70.0]
2025-08-07 07:06:08,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (159.10) for latency ExtremeClogL1U23
2025-08-07 07:06:08,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 20 minutes, 30 seconds)
2025-08-07 07:07:46,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:48,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 151.13628 ± 90.784
2025-08-07 07:07:48,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [154.16286, 182.63515, 226.1301, 239.85251, 191.15927, 265.95755, 191.17482, 17.207151, 19.728241, 23.355083]
2025-08-07 07:07:48,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 101.0, 145.0, 136.0, 114.0, 155.0, 109.0, 19.0, 22.0, 23.0]
2025-08-07 07:07:48,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 19 minutes, 12 seconds)
2025-08-07 07:09:26,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:27,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 170.09346 ± 140.786
2025-08-07 07:09:27,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [21.003008, 179.6609, 20.435696, 135.96576, 408.50153, 310.5031, 15.811219, 26.858116, 324.03198, 258.1633]
2025-08-07 07:09:27,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 106.0, 25.0, 89.0, 190.0, 161.0, 24.0, 25.0, 173.0, 145.0]
2025-08-07 07:09:27,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (170.09) for latency ExtremeClogL1U23
2025-08-07 07:09:27,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 17 minutes, 26 seconds)
2025-08-07 07:11:05,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:06,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 164.09041 ± 69.817
2025-08-07 07:11:06,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [190.08282, 310.2355, 67.04735, 156.62651, 204.72258, 165.4222, 209.12785, 136.30423, 51.991608, 149.34341]
2025-08-07 07:11:06,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 154.0, 59.0, 95.0, 126.0, 106.0, 113.0, 83.0, 52.0, 99.0]
2025-08-07 07:11:06,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 16 minutes, 1 second)
2025-08-07 07:12:44,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:46,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 173.48003 ± 100.422
2025-08-07 07:12:46,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [47.73251, 73.94425, 341.65094, 279.86758, 189.05164, 64.120575, 159.37128, 72.67277, 264.4606, 241.92809]
2025-08-07 07:12:46,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [42.0, 59.0, 152.0, 147.0, 119.0, 65.0, 97.0, 63.0, 149.0, 133.0]
2025-08-07 07:12:46,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (173.48) for latency ExtremeClogL1U23
2025-08-07 07:12:46,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 14 minutes, 20 seconds)
2025-08-07 07:14:23,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:24,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 102.22794 ± 66.089
2025-08-07 07:14:24,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [161.83994, 175.65471, 17.602797, 54.70591, 167.09903, 164.7392, 16.946367, 25.505188, 160.73677, 77.449486]
2025-08-07 07:14:24,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 104.0, 25.0, 50.0, 96.0, 103.0, 23.0, 25.0, 91.0, 76.0]
2025-08-07 07:14:24,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 12 minutes, 26 seconds)
2025-08-07 07:16:02,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:03,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 87.33281 ± 99.000
2025-08-07 07:16:03,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [162.11288, 19.720133, 17.435667, 40.162712, 97.75322, 15.27657, 18.279596, 336.8716, 18.213026, 147.50261]
2025-08-07 07:16:03,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 25.0, 23.0, 44.0, 68.0, 19.0, 23.0, 149.0, 25.0, 104.0]
2025-08-07 07:16:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 10 minutes, 30 seconds)
2025-08-07 07:17:41,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:42,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 130.34497 ± 78.706
2025-08-07 07:17:42,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [120.33076, 17.05178, 16.464472, 210.48158, 185.94458, 19.74353, 164.32597, 214.0115, 148.52295, 206.5725]
2025-08-07 07:17:42,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 22.0, 20.0, 121.0, 129.0, 22.0, 102.0, 129.0, 90.0, 123.0]
2025-08-07 07:17:42,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 8 minutes, 48 seconds)
2025-08-07 07:19:21,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:22,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 224.22737 ± 117.608
2025-08-07 07:19:22,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.900257, 278.25775, 311.84418, 222.86263, 102.3939, 316.28415, 172.65231, 330.82855, 398.34354, 88.9064]
2025-08-07 07:19:22,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 131.0, 140.0, 115.0, 65.0, 164.0, 102.0, 151.0, 171.0, 63.0]
2025-08-07 07:19:22,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (224.23) for latency ExtremeClogL1U23
2025-08-07 07:19:22,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 7 minutes, 17 seconds)
2025-08-07 07:21:01,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:02,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 114.66042 ± 103.346
2025-08-07 07:21:02,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [151.53575, 168.63873, 188.69005, 24.730848, 67.941734, 61.787632, 367.96545, 78.37399, 18.494152, 18.445686]
2025-08-07 07:21:02,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 99.0, 105.0, 25.0, 47.0, 72.0, 143.0, 74.0, 22.0, 24.0]
2025-08-07 07:21:02,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 5 minutes, 46 seconds)
2025-08-07 07:22:40,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:42,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 216.42818 ± 93.353
2025-08-07 07:22:42,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [205.5775, 111.966736, 286.9826, 182.67867, 279.06027, 191.55692, 360.8913, 296.69272, 21.423069, 227.452]
2025-08-07 07:22:42,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 71.0, 163.0, 105.0, 139.0, 113.0, 189.0, 178.0, 24.0, 128.0]
2025-08-07 07:22:42,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 4 minutes, 19 seconds)
2025-08-07 07:24:21,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:23,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 256.15292 ± 104.646
2025-08-07 07:24:23,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [335.88702, 473.1959, 277.66202, 272.81058, 201.69469, 92.461, 294.79218, 307.59125, 151.02202, 154.41277]
2025-08-07 07:24:23,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 229.0, 151.0, 141.0, 115.0, 62.0, 149.0, 160.0, 102.0, 92.0]
2025-08-07 07:24:23,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (256.15) for latency ExtremeClogL1U23
2025-08-07 07:24:23,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 3 minutes, 18 seconds)
2025-08-07 07:26:02,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:03,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 186.90472 ± 144.793
2025-08-07 07:26:03,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [265.71008, 177.80243, 15.651477, 23.042845, 158.53352, 18.718878, 289.53513, 109.824905, 348.6257, 461.60217]
2025-08-07 07:26:03,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 115.0, 20.0, 24.0, 98.0, 20.0, 140.0, 71.0, 166.0, 188.0]
2025-08-07 07:26:03,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 1 minute, 47 seconds)
2025-08-07 07:27:40,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:41,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 169.60657 ± 127.279
2025-08-07 07:27:41,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.49846, 214.1811, 272.92947, 406.84875, 19.246162, 73.77026, 20.178919, 206.23987, 170.86496, 292.30768]
2025-08-07 07:27:41,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 151.0, 147.0, 212.0, 24.0, 76.0, 24.0, 112.0, 99.0, 144.0]
2025-08-07 07:27:41,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 59 minutes, 42 seconds)
2025-08-07 07:29:20,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:21,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 149.39452 ± 106.324
2025-08-07 07:29:21,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [84.07056, 263.99304, 16.960884, 57.17409, 260.78296, 43.28825, 245.60985, 257.2641, 21.795862, 243.00545]
2025-08-07 07:29:21,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [56.0, 127.0, 23.0, 61.0, 134.0, 51.0, 137.0, 147.0, 30.0, 126.0]
2025-08-07 07:29:21,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 58 minutes, 9 seconds)
2025-08-07 07:30:59,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:00,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 130.02528 ± 54.412
2025-08-07 07:31:00,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [128.70105, 157.4568, 129.54959, 17.471992, 107.19161, 193.17169, 168.83241, 148.77438, 54.704975, 194.39835]
2025-08-07 07:31:00,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 97.0, 87.0, 23.0, 70.0, 112.0, 99.0, 94.0, 64.0, 134.0]
2025-08-07 07:31:00,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 56 minutes, 17 seconds)
2025-08-07 07:32:38,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:39,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 109.41699 ± 92.647
2025-08-07 07:32:39,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.49653, 213.65611, 178.23555, 166.70695, 147.89668, 17.931572, 274.82495, 19.905807, 15.367435, 40.148293]
2025-08-07 07:32:39,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 125.0, 115.0, 110.0, 83.0, 19.0, 134.0, 22.0, 20.0, 35.0]
2025-08-07 07:32:39,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 54 minutes, 5 seconds)
2025-08-07 07:34:19,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 239.93515 ± 182.836
2025-08-07 07:34:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [461.68668, 13.231499, 220.93285, 116.45494, 168.43092, 280.5593, 210.17455, 18.725367, 640.8423, 268.31332]
2025-08-07 07:34:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [190.0, 16.0, 118.0, 71.0, 101.0, 135.0, 115.0, 22.0, 245.0, 144.0]
2025-08-07 07:34:21,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 52 minutes, 48 seconds)
2025-08-07 07:35:59,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 136.10475 ± 53.825
2025-08-07 07:36:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [151.31847, 193.5333, 112.973305, 120.12505, 201.79189, 72.7032, 158.62827, 20.395895, 145.1021, 184.47618]
2025-08-07 07:36:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 120.0, 83.0, 79.0, 111.0, 63.0, 88.0, 25.0, 81.0, 99.0]
2025-08-07 07:36:00,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 51 minutes, 30 seconds)
2025-08-07 07:37:37,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:39,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 304.18799 ± 281.652
2025-08-07 07:37:39,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [104.26833, 266.39157, 212.95224, 48.773735, 447.546, 49.15129, 635.1688, 174.66707, 136.69565, 966.265]
2025-08-07 07:37:39,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [94.0, 146.0, 122.0, 47.0, 191.0, 49.0, 293.0, 108.0, 78.0, 405.0]
2025-08-07 07:37:39,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (304.19) for latency ExtremeClogL1U23
2025-08-07 07:37:39,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 49 minutes, 31 seconds)
2025-08-07 07:39:18,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 161.00421 ± 144.934
2025-08-07 07:39:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.377188, 278.32297, 77.68841, 319.3381, 14.139634, 12.858889, 413.96854, 168.88115, 287.86743, 18.599686]
2025-08-07 07:39:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 135.0, 85.0, 142.0, 17.0, 17.0, 194.0, 102.0, 131.0, 19.0]
2025-08-07 07:39:19,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 48 minutes, 7 seconds)
2025-08-07 07:40:58,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:00,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 223.65186 ± 110.685
2025-08-07 07:41:00,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [308.64923, 394.16776, 230.38298, 271.50284, 197.63655, 205.01413, 331.5803, 47.782265, 22.994385, 226.80807]
2025-08-07 07:41:00,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 181.0, 124.0, 137.0, 113.0, 122.0, 206.0, 46.0, 25.0, 128.0]
2025-08-07 07:41:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 46 minutes, 46 seconds)
2025-08-07 07:42:38,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:39,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 173.19699 ± 110.195
2025-08-07 07:42:39,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [54.872074, 316.31833, 133.21213, 258.65247, 312.6281, 133.3516, 244.41278, 16.29603, 240.1635, 22.063051]
2025-08-07 07:42:39,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 180.0, 96.0, 145.0, 173.0, 99.0, 119.0, 22.0, 136.0, 24.0]
2025-08-07 07:42:39,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 44 minutes, 43 seconds)
2025-08-07 07:44:19,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:21,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 189.07599 ± 191.364
2025-08-07 07:44:21,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [218.82188, 118.62698, 181.16246, 17.735304, 208.71675, 329.50034, 20.661625, 21.886078, 683.72833, 89.91999]
2025-08-07 07:44:21,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 78.0, 103.0, 22.0, 119.0, 164.0, 25.0, 23.0, 303.0, 76.0]
2025-08-07 07:44:21,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 43 minutes, 23 seconds)
2025-08-07 07:45:58,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:59,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 177.97006 ± 125.434
2025-08-07 07:45:59,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [259.33237, 393.63153, 363.6251, 109.314995, 148.31091, 18.22373, 65.27782, 20.397636, 195.51772, 206.06888]
2025-08-07 07:45:59,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 189.0, 172.0, 68.0, 91.0, 22.0, 58.0, 24.0, 109.0, 123.0]
2025-08-07 07:45:59,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 41 minutes, 35 seconds)
2025-08-07 07:47:37,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:39,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 156.48552 ± 130.700
2025-08-07 07:47:39,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [228.28706, 15.753199, 19.907373, 18.181047, 221.63187, 16.054707, 293.1208, 233.77005, 120.46395, 397.68512]
2025-08-07 07:47:39,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 17.0, 21.0, 24.0, 117.0, 19.0, 158.0, 126.0, 68.0, 183.0]
2025-08-07 07:47:39,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 39 minutes, 53 seconds)
2025-08-07 07:49:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 207.91716 ± 70.171
2025-08-07 07:49:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [200.26682, 124.54949, 136.47133, 228.20676, 293.90332, 346.47006, 261.58664, 138.57141, 184.86932, 164.27646]
2025-08-07 07:49:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [121.0, 82.0, 80.0, 110.0, 142.0, 163.0, 140.0, 85.0, 113.0, 97.0]
2025-08-07 07:49:19,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 38 minutes, 10 seconds)
2025-08-07 07:50:58,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:59,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 231.79626 ± 192.207
2025-08-07 07:50:59,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [108.29652, 443.07236, 11.989039, 185.70892, 608.7796, 22.297873, 311.33792, 414.1678, 137.3112, 75.00114]
2025-08-07 07:50:59,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 230.0, 17.0, 119.0, 254.0, 22.0, 146.0, 193.0, 90.0, 73.0]
2025-08-07 07:50:59,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 36 minutes, 39 seconds)
2025-08-07 07:52:38,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 224.01157 ± 115.880
2025-08-07 07:52:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [214.45045, 95.31765, 284.8146, 282.69717, 140.14156, 215.42131, 200.24377, 17.639505, 363.40402, 425.9856]
2025-08-07 07:52:39,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 62.0, 139.0, 145.0, 87.0, 125.0, 116.0, 22.0, 171.0, 182.0]
2025-08-07 07:52:39,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 34 minutes, 46 seconds)
2025-08-07 07:54:18,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:19,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 178.95374 ± 173.967
2025-08-07 07:54:19,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [122.74489, 157.92392, 19.373186, 125.106735, 255.2811, 630.7987, 258.40506, 12.692015, 191.29434, 15.917333]
2025-08-07 07:54:19,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 94.0, 24.0, 83.0, 132.0, 272.0, 136.0, 17.0, 110.0, 19.0]
2025-08-07 07:54:19,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 33 minutes, 18 seconds)
2025-08-07 07:55:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:01,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 301.56381 ± 355.327
2025-08-07 07:56:01,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [28.115934, 12.964441, 609.7714, 289.81693, 79.06835, 384.89847, 252.48148, 127.60474, 1216.7908, 14.12558]
2025-08-07 07:56:01,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 16.0, 266.0, 149.0, 54.0, 186.0, 133.0, 90.0, 420.0, 20.0]
2025-08-07 07:56:01,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 4 seconds)
2025-08-07 07:57:39,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:40,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 150.29170 ± 104.575
2025-08-07 07:57:40,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [17.561468, 364.86972, 187.5164, 169.69724, 221.75659, 199.6322, 16.07554, 174.39655, 133.471, 17.940426]
2025-08-07 07:57:40,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 166.0, 117.0, 94.0, 115.0, 121.0, 18.0, 114.0, 93.0, 20.0]
2025-08-07 07:57:40,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 30 minutes, 10 seconds)
2025-08-07 07:59:18,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:20,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 336.60477 ± 289.641
2025-08-07 07:59:20,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [129.18442, 33.480976, 733.3869, 121.82823, 21.574656, 600.50275, 137.00316, 466.15646, 266.66684, 856.26324]
2025-08-07 07:59:20,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 36.0, 309.0, 78.0, 22.0, 290.0, 94.0, 199.0, 151.0, 334.0]
2025-08-07 07:59:20,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (336.60) for latency ExtremeClogL1U23
2025-08-07 07:59:20,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 28 minutes, 32 seconds)
2025-08-07 08:00:59,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:00,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 246.79404 ± 295.372
2025-08-07 08:01:00,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [152.48088, 974.5584, 19.754547, 146.50793, 15.734916, 22.006607, 234.19873, 23.931267, 290.41943, 588.34753]
2025-08-07 08:01:00,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 403.0, 22.0, 91.0, 20.0, 25.0, 126.0, 23.0, 121.0, 247.0]
2025-08-07 08:01:00,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 26 minutes, 49 seconds)
2025-08-07 08:02:39,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:41,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 161.23979 ± 149.411
2025-08-07 08:02:41,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [55.443405, 163.67833, 537.1215, 15.693207, 98.78825, 243.84007, 58.222717, 241.75801, 182.53665, 15.315943]
2025-08-07 08:02:41,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 93.0, 226.0, 22.0, 58.0, 141.0, 52.0, 144.0, 101.0, 18.0]
2025-08-07 08:02:41,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 25 minutes, 17 seconds)
2025-08-07 08:04:18,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:19,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 170.24139 ± 71.625
2025-08-07 08:04:19,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [141.91446, 17.778442, 140.06378, 162.74588, 161.17432, 301.35672, 212.44305, 190.22513, 131.46509, 243.24704]
2025-08-07 08:04:19,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 24.0, 94.0, 100.0, 109.0, 129.0, 114.0, 113.0, 77.0, 133.0]
2025-08-07 08:04:19,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 23 minutes, 5 seconds)
2025-08-07 08:05:59,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:00,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 183.36710 ± 69.557
2025-08-07 08:06:00,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [200.07475, 238.86024, 17.436441, 265.59573, 145.98476, 239.92482, 147.82344, 145.89095, 190.25824, 241.8215]
2025-08-07 08:06:00,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 136.0, 23.0, 140.0, 87.0, 122.0, 86.0, 93.0, 119.0, 137.0]
2025-08-07 08:06:00,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 21 minutes, 42 seconds)
2025-08-07 08:07:38,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:40,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 263.09222 ± 204.433
2025-08-07 08:07:40,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [348.0112, 22.676601, 290.7504, 221.64792, 194.78857, 139.52017, 811.854, 173.19637, 305.08786, 123.38905]
2025-08-07 08:07:40,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 23.0, 139.0, 117.0, 112.0, 87.0, 365.0, 106.0, 173.0, 85.0]
2025-08-07 08:07:40,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 19 minutes, 51 seconds)
2025-08-07 08:09:18,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:20,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 215.07755 ± 191.615
2025-08-07 08:09:20,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [466.3585, 594.4472, 39.41331, 16.551008, 234.15656, 123.29539, 54.01647, 17.16869, 319.7594, 285.60876]
2025-08-07 08:09:20,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 254.0, 40.0, 20.0, 123.0, 68.0, 52.0, 24.0, 177.0, 131.0]
2025-08-07 08:09:20,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 16 seconds)
2025-08-07 08:10:59,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:00,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 252.31277 ± 170.629
2025-08-07 08:11:00,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [119.69348, 207.77142, 10.335336, 16.862738, 139.48772, 309.92813, 467.4415, 414.58118, 335.82474, 501.20145]
2025-08-07 08:11:00,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 104.0, 14.0, 20.0, 86.0, 168.0, 199.0, 185.0, 189.0, 213.0]
2025-08-07 08:11:00,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 36 seconds)
2025-08-07 08:12:37,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:39,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 214.17871 ± 223.303
2025-08-07 08:12:39,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [162.50426, 842.6935, 17.288061, 230.3773, 233.64795, 14.365454, 118.676865, 230.22452, 184.30931, 107.69986]
2025-08-07 08:12:39,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 370.0, 20.0, 132.0, 128.0, 19.0, 87.0, 118.0, 115.0, 69.0]
2025-08-07 08:12:39,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 53 seconds)
2025-08-07 08:14:15,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:17,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 237.70940 ± 136.096
2025-08-07 08:14:17,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [259.8487, 444.3781, 19.006521, 392.53683, 227.8415, 19.119276, 196.91344, 304.32355, 168.46178, 344.6642]
2025-08-07 08:14:17,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [146.0, 201.0, 20.0, 184.0, 120.0, 22.0, 115.0, 151.0, 99.0, 157.0]
2025-08-07 08:14:17,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 52 seconds)
2025-08-07 08:15:54,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:55,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 121.08201 ± 77.042
2025-08-07 08:15:55,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [17.503511, 127.62879, 194.12932, 21.253973, 254.90901, 119.48787, 191.21709, 20.29166, 136.94421, 127.45464]
2025-08-07 08:15:55,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [19.0, 77.0, 117.0, 22.0, 131.0, 74.0, 112.0, 22.0, 84.0, 92.0]
2025-08-07 08:15:55,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 2 seconds)
2025-08-07 08:17:33,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:35,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 249.79068 ± 324.889
2025-08-07 08:17:35,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [19.39154, 98.266785, 118.47813, 182.70721, 254.08885, 1204.0388, 185.4759, 221.96252, 132.22787, 81.26897]
2025-08-07 08:17:35,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [23.0, 72.0, 73.0, 110.0, 130.0, 421.0, 113.0, 134.0, 89.0, 60.0]
2025-08-07 08:17:35,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 9 minutes, 17 seconds)
2025-08-07 08:19:12,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:13,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 140.85483 ± 68.816
2025-08-07 08:19:13,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [187.33379, 160.29942, 140.0536, 21.038223, 155.08041, 73.96117, 294.95032, 103.1627, 119.35166, 153.31699]
2025-08-07 08:19:13,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 95.0, 92.0, 21.0, 96.0, 46.0, 156.0, 69.0, 68.0, 89.0]
2025-08-07 08:19:13,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 23 seconds)
2025-08-07 08:20:50,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:52,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 274.90662 ± 194.577
2025-08-07 08:20:52,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [22.217909, 127.8866, 261.30145, 261.127, 208.81392, 160.22575, 202.44199, 227.6498, 677.8287, 599.57306]
2025-08-07 08:20:52,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 86.0, 148.0, 129.0, 138.0, 92.0, 118.0, 118.0, 271.0, 245.0]
2025-08-07 08:20:52,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 47 seconds)
2025-08-07 08:22:28,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:30,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 294.65375 ± 219.338
2025-08-07 08:22:30,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [153.10115, 87.449524, 361.39417, 280.6939, 308.37405, 125.78499, 875.3063, 157.64796, 178.36734, 418.41827]
2025-08-07 08:22:30,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 59.0, 172.0, 143.0, 140.0, 79.0, 339.0, 110.0, 122.0, 205.0]
2025-08-07 08:22:30,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 4 seconds)
2025-08-07 08:24:07,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:09,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 273.99274 ± 220.827
2025-08-07 08:24:09,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [800.7374, 448.9694, 59.554127, 181.85213, 448.7949, 170.34663, 254.545, 192.80016, 20.548319, 161.77934]
2025-08-07 08:24:09,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [371.0, 217.0, 55.0, 101.0, 187.0, 99.0, 119.0, 113.0, 23.0, 92.0]
2025-08-07 08:24:09,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 29 seconds)
2025-08-07 08:25:46,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:47,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 173.34660 ± 154.332
2025-08-07 08:25:47,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [24.574362, 246.7647, 15.580165, 121.28414, 497.20367, 13.051512, 224.18123, 286.3045, 17.636232, 286.88556]
2025-08-07 08:25:47,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 128.0, 21.0, 86.0, 216.0, 17.0, 124.0, 162.0, 22.0, 125.0]
2025-08-07 08:25:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 45 seconds)
2025-08-07 08:27:23,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:25,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 146.00510 ± 111.263
2025-08-07 08:27:25,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [216.02751, 210.59741, 110.8314, 397.43503, 17.746763, 176.95235, 154.45854, 139.89613, 19.63164, 16.474241]
2025-08-07 08:27:25,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [130.0, 130.0, 70.0, 201.0, 22.0, 111.0, 108.0, 93.0, 22.0, 17.0]
2025-08-07 08:27:25,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 58 minutes, 57 seconds)
2025-08-07 08:29:01,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:02,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 179.97227 ± 140.292
2025-08-07 08:29:02,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [383.02713, 58.86082, 140.66081, 380.48825, 192.13608, 148.7367, 371.15012, 14.52653, 93.27887, 16.857533]
2025-08-07 08:29:02,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [182.0, 53.0, 100.0, 158.0, 109.0, 86.0, 142.0, 22.0, 71.0, 17.0]
2025-08-07 08:29:02,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 57 minutes, 8 seconds)
2025-08-07 08:30:40,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:30:42,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 214.76157 ± 210.424
2025-08-07 08:30:42,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [205.65414, 24.417116, 32.032288, 112.040695, 292.095, 96.13459, 594.2902, 17.21014, 163.08362, 610.65784]
2025-08-07 08:30:42,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [115.0, 25.0, 38.0, 83.0, 164.0, 66.0, 258.0, 19.0, 94.0, 241.0]
2025-08-07 08:30:42,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 44 seconds)
2025-08-07 08:32:18,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:32:20,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 267.86200 ± 282.187
2025-08-07 08:32:20,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [147.70901, 20.737886, 143.43327, 241.33975, 979.61664, 118.69022, 115.45114, 617.9187, 128.13045, 165.59314]
2025-08-07 08:32:20,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 22.0, 90.0, 120.0, 391.0, 74.0, 70.0, 248.0, 82.0, 94.0]
2025-08-07 08:32:20,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 54 minutes)
2025-08-07 08:33:55,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:57,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 334.27188 ± 364.156
2025-08-07 08:33:57,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [205.56644, 312.05017, 1091.0529, 103.305336, 189.32668, 15.250182, 160.29323, 1003.1916, 138.56754, 124.114426]
2025-08-07 08:33:57,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 176.0, 394.0, 69.0, 118.0, 23.0, 95.0, 409.0, 92.0, 82.0]
2025-08-07 08:33:57,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 15 seconds)
2025-08-07 08:35:35,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:37,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 220.30258 ± 189.656
2025-08-07 08:35:37,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [156.15627, 516.9943, 25.643152, 172.95593, 150.9853, 100.527824, 445.83524, 532.7172, 83.81025, 17.40014]
2025-08-07 08:35:37,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 239.0, 25.0, 102.0, 93.0, 59.0, 211.0, 233.0, 62.0, 24.0]
2025-08-07 08:35:37,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 50 seconds)
2025-08-07 08:37:13,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:15,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 340.20746 ± 268.600
2025-08-07 08:37:15,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [475.10684, 441.80698, 279.6449, 952.7775, 18.031086, 199.50685, 143.74066, 601.7999, 85.4101, 204.24995]
2025-08-07 08:37:15,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [203.0, 195.0, 135.0, 339.0, 25.0, 112.0, 87.0, 234.0, 68.0, 111.0]
2025-08-07 08:37:15,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (340.21) for latency ExtremeClogL1U23
2025-08-07 08:37:15,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 49 minutes, 21 seconds)
2025-08-07 08:38:51,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 222.17334 ± 195.858
2025-08-07 08:38:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [248.42473, 19.337404, 18.59524, 195.00322, 232.68628, 759.0951, 233.87279, 207.57466, 180.3438, 126.800385]
2025-08-07 08:38:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [153.0, 25.0, 23.0, 119.0, 124.0, 328.0, 125.0, 108.0, 104.0, 86.0]
2025-08-07 08:38:53,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes, 27 seconds)
2025-08-07 08:40:29,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:31,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 280.43692 ± 229.489
2025-08-07 08:40:31,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [18.78641, 465.45694, 74.690575, 441.82324, 738.949, 17.204224, 120.48458, 481.9281, 182.74821, 262.29785]
2025-08-07 08:40:31,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 192.0, 53.0, 193.0, 327.0, 25.0, 103.0, 225.0, 101.0, 125.0]
2025-08-07 08:40:31,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 45 minutes, 52 seconds)
2025-08-07 08:42:08,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:42:09,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 262.53903 ± 249.308
2025-08-07 08:42:09,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [15.787795, 16.462116, 377.7282, 722.6123, 179.92802, 699.58844, 147.5669, 17.050665, 202.76923, 245.89641]
2025-08-07 08:42:09,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [21.0, 23.0, 169.0, 282.0, 104.0, 276.0, 86.0, 22.0, 109.0, 127.0]
2025-08-07 08:42:09,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 44 minutes, 15 seconds)
2025-08-07 08:43:46,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:48,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 241.31311 ± 232.003
2025-08-07 08:43:48,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [102.677055, 204.06557, 135.79488, 400.91687, 870.18024, 243.36919, 10.911287, 92.14624, 206.79845, 146.27151]
2025-08-07 08:43:48,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [61.0, 126.0, 83.0, 203.0, 353.0, 123.0, 14.0, 68.0, 120.0, 101.0]
2025-08-07 08:43:48,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 34 seconds)
2025-08-07 08:45:25,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:26,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 226.11880 ± 155.174
2025-08-07 08:45:26,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [23.255688, 337.41492, 327.51434, 169.56401, 315.6028, 263.3086, 19.232904, 498.04276, 17.384468, 289.8675]
2025-08-07 08:45:26,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [22.0, 151.0, 181.0, 98.0, 145.0, 147.0, 22.0, 222.0, 20.0, 152.0]
2025-08-07 08:45:26,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 53 seconds)
2025-08-07 08:47:03,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 158.20651 ± 84.490
2025-08-07 08:47:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [185.08311, 50.549126, 91.798836, 13.672461, 212.9115, 156.16087, 231.29434, 237.39656, 289.34134, 113.85702]
2025-08-07 08:47:04,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 44.0, 62.0, 18.0, 116.0, 88.0, 123.0, 138.0, 133.0, 71.0]
2025-08-07 08:47:04,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 19 seconds)
2025-08-07 08:48:40,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:41,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 183.31302 ± 106.400
2025-08-07 08:48:41,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [223.73119, 225.51968, 144.73978, 256.57703, 108.492165, 254.74078, 18.0258, 20.625517, 202.18483, 378.49335]
2025-08-07 08:48:41,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 132.0, 84.0, 118.0, 78.0, 124.0, 21.0, 23.0, 124.0, 181.0]
2025-08-07 08:48:42,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 36 seconds)
2025-08-07 08:50:21,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:23,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 456.01434 ± 204.903
2025-08-07 08:50:23,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [512.92847, 619.61914, 121.41875, 158.4238, 273.87836, 502.30167, 470.2571, 439.72488, 766.43005, 695.1611]
2025-08-07 08:50:23,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [216.0, 267.0, 79.0, 103.0, 148.0, 233.0, 219.0, 201.0, 341.0, 296.0]
2025-08-07 08:50:23,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (456.01) for latency ExtremeClogL1U23
2025-08-07 08:50:23,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 14 seconds)
2025-08-07 08:51:57,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:59,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 273.39301 ± 201.465
2025-08-07 08:51:59,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [12.677344, 335.71252, 409.12595, 20.59251, 352.51703, 346.7582, 157.82033, 18.749063, 440.73166, 639.2454]
2025-08-07 08:51:59,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 168.0, 170.0, 22.0, 163.0, 159.0, 92.0, 19.0, 212.0, 247.0]
2025-08-07 08:51:59,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 23 seconds)
2025-08-07 08:53:38,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:40,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 313.51083 ± 197.497
2025-08-07 08:53:40,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [243.31737, 192.45802, 206.65695, 146.55101, 681.0781, 224.90071, 182.63167, 495.54486, 127.16652, 634.80334]
2025-08-07 08:53:40,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 109.0, 121.0, 92.0, 291.0, 118.0, 96.0, 214.0, 74.0, 270.0]
2025-08-07 08:53:40,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 53 seconds)
2025-08-07 08:55:16,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:18,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 198.51537 ± 170.606
2025-08-07 08:55:18,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [14.704687, 173.47856, 301.09152, 216.48212, 144.33844, 609.4089, 182.34023, 301.63196, 19.229332, 22.447958]
2025-08-07 08:55:18,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [20.0, 108.0, 152.0, 127.0, 85.0, 250.0, 116.0, 156.0, 22.0, 24.0]
2025-08-07 08:55:18,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 14 seconds)
2025-08-07 08:56:52,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:54,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 283.70331 ± 188.240
2025-08-07 08:56:54,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [289.98712, 36.395584, 119.61273, 238.12967, 364.05533, 31.41021, 431.98807, 646.68805, 469.92627, 208.84015]
2025-08-07 08:56:54,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 39.0, 82.0, 121.0, 179.0, 36.0, 187.0, 283.0, 191.0, 122.0]
2025-08-07 08:56:54,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 32 seconds)
2025-08-07 08:58:30,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:33,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 496.48138 ± 349.624
2025-08-07 08:58:33,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [804.5697, 196.39497, 850.5181, 638.18286, 1170.2272, 269.01227, 461.9875, 16.994448, 451.87292, 105.0542]
2025-08-07 08:58:33,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [327.0, 120.0, 368.0, 257.0, 479.0, 146.0, 203.0, 19.0, 212.0, 72.0]
2025-08-07 08:58:33,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (496.48) for latency ExtremeClogL1U23
2025-08-07 08:58:33,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 45 seconds)
2025-08-07 09:00:11,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:12,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 244.30411 ± 206.333
2025-08-07 09:00:12,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [20.349358, 25.665224, 98.38095, 666.19495, 226.70459, 132.64165, 132.50092, 468.67596, 470.24878, 201.67882]
2025-08-07 09:00:12,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 60.0, 253.0, 126.0, 83.0, 89.0, 199.0, 199.0, 118.0]
2025-08-07 09:00:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 18 seconds)
2025-08-07 09:01:50,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:52,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 324.19537 ± 224.191
2025-08-07 09:01:52,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [271.2387, 840.49243, 637.2526, 170.85631, 131.51657, 226.72993, 149.81735, 253.28174, 166.73845, 394.02985]
2025-08-07 09:01:52,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 323.0, 253.0, 109.0, 83.0, 134.0, 97.0, 134.0, 106.0, 179.0]
2025-08-07 09:01:52,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 37 seconds)
2025-08-07 09:03:27,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:29,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 236.05083 ± 228.266
2025-08-07 09:03:29,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [358.065, 141.57457, 200.53986, 858.79803, 201.45494, 222.30276, 18.314465, 23.92758, 117.52427, 218.00671]
2025-08-07 09:03:29,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 84.0, 116.0, 356.0, 121.0, 118.0, 21.0, 24.0, 78.0, 113.0]
2025-08-07 09:03:29,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 54 seconds)
2025-08-07 09:05:06,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:07,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 161.20209 ± 167.663
2025-08-07 09:05:07,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [607.14996, 100.250015, 13.288298, 15.681052, 231.90134, 170.61583, 91.15896, 125.77746, 233.0515, 23.146437]
2025-08-07 09:05:07,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [229.0, 67.0, 16.0, 20.0, 127.0, 105.0, 67.0, 88.0, 118.0, 24.0]
2025-08-07 09:05:07,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 21 seconds)
2025-08-07 09:06:43,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:44,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 327.89374 ± 257.962
2025-08-07 09:06:44,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [24.657331, 516.4303, 251.74738, 878.77625, 95.52853, 356.99948, 202.43143, 636.1168, 210.32074, 105.92893]
2025-08-07 09:06:44,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [25.0, 228.0, 126.0, 369.0, 62.0, 189.0, 110.0, 234.0, 114.0, 75.0]
2025-08-07 09:06:45,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 39 seconds)
2025-08-07 09:08:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:25,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 292.66071 ± 188.596
2025-08-07 09:08:25,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [132.92273, 191.0706, 208.07892, 231.9206, 223.37556, 294.43262, 707.20874, 419.7505, 497.58804, 20.258623]
2025-08-07 09:08:25,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 113.0, 127.0, 132.0, 130.0, 146.0, 288.0, 189.0, 222.0, 22.0]
2025-08-07 09:08:25,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 3 seconds)
2025-08-07 09:10:01,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:03,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 431.11182 ± 585.786
2025-08-07 09:10:03,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [2127.4395, 421.60846, 181.215, 239.67311, 107.97025, 537.49854, 150.54413, 393.1552, 18.405521, 133.60799]
2025-08-07 09:10:03,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [756.0, 187.0, 102.0, 147.0, 80.0, 222.0, 87.0, 178.0, 24.0, 84.0]
2025-08-07 09:10:03,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 22 seconds)
2025-08-07 09:11:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:40,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 244.97635 ± 336.601
2025-08-07 09:11:40,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [299.60233, 17.353216, 523.42456, 182.55215, 102.08989, 25.01305, 1144.932, 21.322409, 114.14603, 19.327856]
2025-08-07 09:11:40,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [148.0, 22.0, 224.0, 109.0, 73.0, 24.0, 440.0, 25.0, 80.0, 24.0]
2025-08-07 09:11:40,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 45 seconds)
2025-08-07 09:13:17,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:18,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 268.58868 ± 264.485
2025-08-07 09:13:18,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [867.05054, 341.28656, 20.539654, 602.33374, 13.750199, 14.8173485, 132.87817, 192.75124, 183.21773, 317.2615]
2025-08-07 09:13:18,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [339.0, 157.0, 20.0, 234.0, 17.0, 23.0, 82.0, 121.0, 94.0, 157.0]
2025-08-07 09:13:18,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 6 seconds)
2025-08-07 09:14:55,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:57,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 303.24176 ± 220.489
2025-08-07 09:14:57,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [202.75534, 261.5216, 441.71054, 17.394287, 614.3565, 95.91757, 436.94208, 20.327097, 670.27924, 271.21347]
2025-08-07 09:14:57,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [116.0, 145.0, 198.0, 20.0, 245.0, 73.0, 214.0, 25.0, 256.0, 131.0]
2025-08-07 09:14:57,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 29 seconds)
2025-08-07 09:16:35,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:37,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 438.82452 ± 477.323
2025-08-07 09:16:37,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [126.33346, 144.36105, 22.01007, 438.25055, 681.51385, 1208.5668, 12.374805, 95.481, 1405.6992, 253.65425]
2025-08-07 09:16:37,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 95.0, 25.0, 189.0, 262.0, 448.0, 18.0, 71.0, 558.0, 130.0]
2025-08-07 09:16:37,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 50 seconds)
2025-08-07 09:18:15,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:18,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 549.59741 ± 541.580
2025-08-07 09:18:18,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [472.91635, 260.5433, 1782.183, 1352.116, 224.46109, 19.51636, 170.71118, 310.31924, 643.49414, 259.71344]
2025-08-07 09:18:18,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [233.0, 126.0, 703.0, 497.0, 117.0, 25.0, 99.0, 163.0, 247.0, 138.0]
2025-08-07 09:18:18,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1226 [INFO]: New best (549.60) for latency ExtremeClogL1U23
2025-08-07 09:18:18,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 14 seconds)
2025-08-07 09:19:56,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:59,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 412.96527 ± 319.178
2025-08-07 09:19:59,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [233.92566, 1071.2416, 366.8635, 364.10123, 136.41861, 291.0308, 17.52336, 859.83606, 618.7924, 169.9196]
2025-08-07 09:19:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [119.0, 425.0, 172.0, 172.0, 98.0, 148.0, 18.0, 343.0, 275.0, 97.0]
2025-08-07 09:19:59,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 38 seconds)
2025-08-07 09:21:34,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:36,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 234.68466 ± 128.334
2025-08-07 09:21:36,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [154.47401, 18.329062, 390.07968, 398.776, 425.75793, 127.335335, 193.00879, 166.49269, 181.73322, 290.85974]
2025-08-07 09:21:36,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 21.0, 182.0, 175.0, 176.0, 80.0, 122.0, 107.0, 100.0, 143.0]
2025-08-07 09:21:36,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 58 seconds)
2025-08-07 09:23:13,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:15,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 276.36304 ± 230.218
2025-08-07 09:23:15,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [140.21452, 167.8881, 718.5731, 17.020105, 110.746284, 448.7285, 131.61348, 118.343544, 268.39694, 642.1058]
2025-08-07 09:23:15,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 97.0, 266.0, 23.0, 69.0, 195.0, 82.0, 77.0, 140.0, 255.0]
2025-08-07 09:23:15,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 19 seconds)
2025-08-07 09:24:52,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:24:53,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 233.36295 ± 189.473
2025-08-07 09:24:53,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [311.89502, 634.54205, 199.3056, 14.257745, 255.50096, 22.225388, 227.47467, 200.71393, 453.57428, 14.139733]
2025-08-07 09:24:53,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [145.0, 268.0, 120.0, 21.0, 130.0, 23.0, 119.0, 119.0, 206.0, 25.0]
2025-08-07 09:24:53,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 39 seconds)
2025-08-07 09:26:30,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:26:33,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1221 [DEBUG]: Total Reward: 389.90118 ± 345.580
2025-08-07 09:26:33,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1222 [DEBUG]: All rewards: [533.79297, 233.80963, 13.990405, 291.31934, 890.84015, 193.45068, 257.26733, 207.40164, 121.82277, 1155.317]
2025-08-07 09:26:33,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1223 [DEBUG]: All trajectory lengths: [234.0, 130.0, 18.0, 151.0, 358.0, 123.0, 140.0, 128.0, 87.0, 458.0]
2025-08-07 09:26:33,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-hopper):1251 [DEBUG]: Training session finished
