2025-08-07 00:48:36,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:36,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-hopper/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:36,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x148f5a807e90>}
2025-08-07 00:48:36,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:36,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1133 [INFO]: Creating new trainer
2025-08-07 00:48:36,375 baseline-bpql-noiseperc5-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 00:48:36,375 baseline-bpql-noiseperc5-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:37,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:37,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 00:50:10,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:50:11,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 75.30912 ± 28.638
2025-08-07 00:50:11,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [109.370834, 68.93545, 67.26584, 99.6431, 74.73155, 76.40669, 101.84412, 102.95065, 27.682508, 24.260492]
2025-08-07 00:50:11,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [68.0, 51.0, 51.0, 73.0, 69.0, 68.0, 69.0, 67.0, 31.0, 27.0]
2025-08-07 00:50:11,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (75.31) for latency ExtremeSparseL4U32
2025-08-07 00:50:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 34 minutes, 22 seconds)
2025-08-07 00:51:49,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:51,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 154.75851 ± 61.228
2025-08-07 00:51:51,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [188.61983, 229.92744, 107.00309, 116.66987, 234.49622, 26.43543, 161.4777, 153.82762, 208.70996, 120.41815]
2025-08-07 00:51:51,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 128.0, 73.0, 80.0, 149.0, 31.0, 100.0, 97.0, 128.0, 87.0]
2025-08-07 00:51:51,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (154.76) for latency ExtremeSparseL4U32
2025-08-07 00:51:51,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 38 minutes, 5 seconds)
2025-08-07 00:53:30,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:31,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 106.83181 ± 33.063
2025-08-07 00:53:31,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [67.77583, 91.38373, 147.9435, 84.725945, 135.84833, 171.21852, 81.85952, 81.50324, 84.224236, 121.835365]
2025-08-07 00:53:31,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 68.0, 99.0, 69.0, 87.0, 100.0, 62.0, 63.0, 59.0, 93.0]
2025-08-07 00:53:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 38 minutes, 30 seconds)
2025-08-07 00:55:09,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:11,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 129.01662 ± 73.874
2025-08-07 00:55:11,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [103.78985, 147.13065, 241.52124, 159.73245, 67.97634, 95.50064, 111.29125, 27.846035, 63.720215, 271.65747]
2025-08-07 00:55:11,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 97.0, 121.0, 104.0, 58.0, 74.0, 81.0, 32.0, 58.0, 135.0]
2025-08-07 00:55:11,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 37 minutes, 29 seconds)
2025-08-07 00:56:50,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:51,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 115.44808 ± 33.533
2025-08-07 00:56:51,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [95.189735, 133.31824, 157.67802, 139.42212, 133.84273, 168.21275, 72.88872, 76.015495, 77.45049, 100.462524]
2025-08-07 00:56:51,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 92.0, 103.0, 95.0, 86.0, 110.0, 55.0, 62.0, 62.0, 69.0]
2025-08-07 00:56:51,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 36 minutes, 31 seconds)
2025-08-07 00:58:30,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:31,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 103.29227 ± 43.302
2025-08-07 00:58:31,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [103.76722, 105.58457, 118.18268, 117.57948, 142.56178, 93.005035, 70.82964, 27.827597, 60.391766, 193.19296]
2025-08-07 00:58:31,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 78.0, 81.0, 76.0, 97.0, 74.0, 63.0, 32.0, 57.0, 127.0]
2025-08-07 00:58:31,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 36 minutes, 54 seconds)
2025-08-07 01:00:10,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:11,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 113.47675 ± 64.075
2025-08-07 01:00:11,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [177.55891, 67.0017, 156.85625, 26.76096, 75.538574, 147.31717, 27.309282, 117.03661, 236.9439, 102.44409]
2025-08-07 01:00:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 62.0, 112.0, 27.0, 65.0, 94.0, 30.0, 85.0, 144.0, 74.0]
2025-08-07 01:00:11,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 35 minutes, 12 seconds)
2025-08-07 01:01:51,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 140.07916 ± 39.916
2025-08-07 01:01:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [136.34229, 176.15659, 81.49061, 101.941666, 151.75389, 119.80728, 181.88892, 84.801025, 165.38045, 201.22888]
2025-08-07 01:01:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 113.0, 60.0, 75.0, 104.0, 89.0, 110.0, 70.0, 119.0, 129.0]
2025-08-07 01:01:53,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 33 minutes, 47 seconds)
2025-08-07 01:03:31,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:33,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 150.62553 ± 67.840
2025-08-07 01:03:33,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [136.46233, 132.206, 132.10892, 333.9978, 96.82897, 87.98156, 140.01678, 193.60992, 98.46382, 154.57918]
2025-08-07 01:03:33,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 97.0, 113.0, 187.0, 80.0, 69.0, 98.0, 134.0, 73.0, 107.0]
2025-08-07 01:03:33,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 32 minutes, 23 seconds)
2025-08-07 01:05:13,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:05:14,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 88.59586 ± 55.639
2025-08-07 01:05:14,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [28.542173, 91.25461, 22.179317, 77.071106, 104.04935, 76.06332, 235.31105, 62.80723, 102.92573, 85.754616]
2025-08-07 01:05:14,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 73.0, 27.0, 63.0, 79.0, 59.0, 165.0, 56.0, 86.0, 70.0]
2025-08-07 01:05:14,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 30 minutes, 56 seconds)
2025-08-07 01:06:53,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:54,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 82.13745 ± 27.327
2025-08-07 01:06:54,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [74.03908, 69.92497, 67.24612, 89.61985, 138.7799, 96.08859, 23.741793, 92.94443, 88.63157, 80.35815]
2025-08-07 01:06:54,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 49.0, 49.0, 63.0, 99.0, 69.0, 28.0, 66.0, 63.0, 59.0]
2025-08-07 01:06:54,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 29 minutes, 1 second)
2025-08-07 01:08:33,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:33,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 68.82002 ± 41.897
2025-08-07 01:08:33,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [163.32564, 33.988934, 135.82756, 62.93599, 60.94408, 47.82056, 34.726196, 55.51947, 51.611744, 41.500034]
2025-08-07 01:08:33,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 37.0, 96.0, 58.0, 52.0, 45.0, 38.0, 51.0, 49.0, 42.0]
2025-08-07 01:08:33,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 27 minutes, 17 seconds)
2025-08-07 01:10:12,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:13,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 115.14345 ± 54.051
2025-08-07 01:10:13,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [116.01947, 70.3969, 99.76875, 90.44283, 52.871357, 185.53752, 81.78841, 116.1419, 241.92203, 96.545296]
2025-08-07 01:10:13,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [73.0, 59.0, 76.0, 68.0, 46.0, 119.0, 59.0, 89.0, 139.0, 66.0]
2025-08-07 01:10:13,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 25 minutes, 9 seconds)
2025-08-07 01:11:52,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:11:54,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 112.96017 ± 55.286
2025-08-07 01:11:54,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [138.80452, 207.10933, 54.99089, 71.45608, 91.18133, 122.199104, 116.93084, 195.12308, 111.70379, 20.102797]
2025-08-07 01:11:54,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 113.0, 49.0, 57.0, 62.0, 88.0, 78.0, 119.0, 83.0, 23.0]
2025-08-07 01:11:54,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 23 minutes, 29 seconds)
2025-08-07 01:13:32,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:13:33,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 135.77261 ± 101.203
2025-08-07 01:13:33,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [129.75421, 80.90521, 90.934875, 64.87119, 273.86288, 383.5167, 59.8181, 84.8956, 86.24901, 102.91836]
2025-08-07 01:13:33,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 65.0, 64.0, 53.0, 140.0, 178.0, 52.0, 66.0, 66.0, 77.0]
2025-08-07 01:13:33,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 21 minutes, 22 seconds)
2025-08-07 01:15:12,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:14,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 88.00616 ± 42.785
2025-08-07 01:15:14,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [109.860466, 153.88057, 105.67237, 28.541355, 54.853256, 60.68885, 73.78488, 68.73319, 167.10243, 56.944263]
2025-08-07 01:15:14,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 105.0, 75.0, 30.0, 48.0, 52.0, 60.0, 59.0, 106.0, 53.0]
2025-08-07 01:15:14,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 19 minutes, 57 seconds)
2025-08-07 01:16:53,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:16:54,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 100.26336 ± 40.841
2025-08-07 01:16:54,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [105.324814, 68.25297, 104.5915, 125.65691, 50.423508, 68.60983, 148.44019, 63.23025, 186.51227, 81.5913]
2025-08-07 01:16:54,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 59.0, 78.0, 94.0, 46.0, 60.0, 97.0, 54.0, 112.0, 67.0]
2025-08-07 01:16:54,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 18 minutes, 31 seconds)
2025-08-07 01:18:33,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:35,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 158.13618 ± 69.519
2025-08-07 01:18:35,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [112.92541, 221.48412, 206.5079, 105.21423, 158.78816, 90.89803, 109.41843, 301.66278, 203.42072, 71.04198]
2025-08-07 01:18:35,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 113.0, 106.0, 85.0, 93.0, 65.0, 86.0, 155.0, 126.0, 55.0]
2025-08-07 01:18:35,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (158.14) for latency ExtremeSparseL4U32
2025-08-07 01:18:35,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 17 minutes, 9 seconds)
2025-08-07 01:20:14,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:20:15,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 105.31580 ± 63.796
2025-08-07 01:20:15,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [29.223047, 136.50273, 93.93955, 45.03525, 72.485214, 119.53448, 101.597725, 272.34622, 74.14398, 108.349754]
2025-08-07 01:20:15,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [32.0, 88.0, 75.0, 41.0, 60.0, 87.0, 70.0, 135.0, 66.0, 83.0]
2025-08-07 01:20:15,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 15 minutes, 20 seconds)
2025-08-07 01:21:53,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:55,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 155.50581 ± 110.605
2025-08-07 01:21:55,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [315.44037, 23.078121, 162.50665, 181.31346, 91.37545, 36.28028, 355.82394, 73.645035, 239.45609, 76.13867]
2025-08-07 01:21:55,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [142.0, 24.0, 102.0, 110.0, 71.0, 39.0, 166.0, 61.0, 135.0, 65.0]
2025-08-07 01:21:55,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 13 minutes, 43 seconds)
2025-08-07 01:23:34,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:35,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 74.90781 ± 26.282
2025-08-07 01:23:35,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [30.176208, 84.17428, 79.53276, 74.67238, 55.437344, 95.1683, 37.700027, 71.06239, 104.84796, 116.30648]
2025-08-07 01:23:35,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [33.0, 62.0, 62.0, 60.0, 51.0, 72.0, 39.0, 60.0, 86.0, 92.0]
2025-08-07 01:23:35,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 11 minutes, 58 seconds)
2025-08-07 01:25:13,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:14,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 66.16858 ± 30.203
2025-08-07 01:25:14,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [71.47881, 31.267393, 47.267876, 60.714096, 72.87826, 52.320045, 115.65449, 23.503178, 65.819534, 120.78211]
2025-08-07 01:25:14,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [59.0, 32.0, 50.0, 56.0, 67.0, 48.0, 91.0, 26.0, 56.0, 84.0]
2025-08-07 01:25:14,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 10 minutes, 1 second)
2025-08-07 01:26:54,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:55,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 134.86343 ± 50.658
2025-08-07 01:26:55,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [182.20323, 142.68657, 27.65591, 105.56914, 118.20348, 192.96712, 178.40375, 75.04696, 150.62476, 175.27336]
2025-08-07 01:26:55,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 105.0, 32.0, 74.0, 85.0, 116.0, 113.0, 61.0, 110.0, 105.0]
2025-08-07 01:26:55,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 8 minutes, 20 seconds)
2025-08-07 01:28:33,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:28:35,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 127.32832 ± 52.499
2025-08-07 01:28:35,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [142.79955, 106.01481, 112.81237, 98.709465, 218.23732, 183.27852, 22.43436, 155.31154, 151.75154, 81.93371]
2025-08-07 01:28:35,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 88.0, 73.0, 77.0, 127.0, 138.0, 26.0, 100.0, 104.0, 68.0]
2025-08-07 01:28:35,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 6 minutes, 35 seconds)
2025-08-07 01:30:14,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:15,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 125.88092 ± 68.200
2025-08-07 01:30:15,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [50.65196, 182.16379, 109.67983, 103.861465, 240.32938, 60.346325, 226.91034, 137.36781, 117.16623, 30.331976]
2025-08-07 01:30:15,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [53.0, 137.0, 76.0, 78.0, 154.0, 54.0, 126.0, 100.0, 83.0, 32.0]
2025-08-07 01:30:15,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 5 minutes, 4 seconds)
2025-08-07 01:31:54,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:55,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 86.33886 ± 40.733
2025-08-07 01:31:55,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [62.72386, 82.19946, 68.761795, 144.40889, 137.06404, 24.433828, 151.70102, 65.946495, 54.116215, 72.03301]
2025-08-07 01:31:55,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 61.0, 57.0, 92.0, 98.0, 25.0, 92.0, 58.0, 52.0, 61.0]
2025-08-07 01:31:56,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 3 minutes, 32 seconds)
2025-08-07 01:33:34,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:35,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 82.37425 ± 47.374
2025-08-07 01:33:35,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [45.84059, 65.80832, 135.29352, 28.196455, 123.66454, 98.79325, 26.007128, 142.7506, 22.687048, 134.7011]
2025-08-07 01:33:35,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [50.0, 58.0, 96.0, 30.0, 92.0, 72.0, 30.0, 96.0, 25.0, 96.0]
2025-08-07 01:33:35,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 1 minute, 49 seconds)
2025-08-07 01:35:14,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:35:15,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 69.83296 ± 41.518
2025-08-07 01:35:15,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [81.51585, 50.3603, 171.53714, 95.120476, 48.059334, 31.586386, 50.729637, 99.27132, 43.84412, 26.30505]
2025-08-07 01:35:15,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [64.0, 52.0, 118.0, 72.0, 44.0, 29.0, 46.0, 74.0, 45.0, 32.0]
2025-08-07 01:35:15,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 4 seconds)
2025-08-07 01:36:54,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:55,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 88.53059 ± 37.586
2025-08-07 01:36:55,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [88.007195, 25.567255, 62.499485, 116.08708, 119.06022, 103.04752, 155.52058, 109.94168, 59.17231, 46.402546]
2025-08-07 01:36:55,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 33.0, 58.0, 90.0, 90.0, 79.0, 106.0, 87.0, 52.0, 48.0]
2025-08-07 01:36:55,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 58 minutes, 23 seconds)
2025-08-07 01:38:34,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:35,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 83.56035 ± 34.968
2025-08-07 01:38:35,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [30.75332, 133.39072, 69.87706, 93.49881, 87.86978, 64.84725, 85.70117, 27.80681, 134.9912, 106.86734]
2025-08-07 01:38:35,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [31.0, 108.0, 56.0, 67.0, 63.0, 52.0, 68.0, 29.0, 84.0, 84.0]
2025-08-07 01:38:35,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 56 minutes, 33 seconds)
2025-08-07 01:40:14,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:15,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 106.51156 ± 49.875
2025-08-07 01:40:15,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [71.292496, 116.232216, 124.28535, 157.35097, 140.20372, 44.969845, 27.475039, 107.294266, 76.826454, 199.18524]
2025-08-07 01:40:15,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 88.0, 94.0, 104.0, 100.0, 44.0, 32.0, 74.0, 64.0, 123.0]
2025-08-07 01:40:15,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 54 minutes, 58 seconds)
2025-08-07 01:41:54,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:41:55,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 100.91318 ± 38.249
2025-08-07 01:41:55,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [67.28937, 90.98301, 49.024303, 145.99568, 112.46882, 171.93077, 105.02967, 85.85506, 129.19266, 51.36251]
2025-08-07 01:41:55,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 77.0, 48.0, 95.0, 81.0, 109.0, 77.0, 72.0, 94.0, 50.0]
2025-08-07 01:41:55,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 53 minutes, 18 seconds)
2025-08-07 01:43:34,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:35,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 144.38766 ± 112.066
2025-08-07 01:43:35,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [159.20172, 110.43398, 345.39877, 28.111969, 131.94595, 65.22524, 25.515493, 305.61218, 27.559855, 244.87157]
2025-08-07 01:43:35,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [117.0, 87.0, 179.0, 33.0, 85.0, 55.0, 29.0, 174.0, 31.0, 147.0]
2025-08-07 01:43:35,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 51 minutes, 39 seconds)
2025-08-07 01:45:14,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:15,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 75.88478 ± 46.611
2025-08-07 01:45:15,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [169.81519, 119.63826, 86.15233, 74.1256, 28.750261, 28.348656, 22.171373, 28.59938, 95.42359, 105.823265]
2025-08-07 01:45:15,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [123.0, 94.0, 76.0, 61.0, 31.0, 32.0, 26.0, 32.0, 68.0, 78.0]
2025-08-07 01:45:15,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 50 minutes, 6 seconds)
2025-08-07 01:46:55,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:56,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 160.31839 ± 76.997
2025-08-07 01:46:56,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [165.38303, 24.751669, 217.96646, 319.50824, 196.18314, 115.74031, 105.67551, 130.48363, 113.5261, 213.96587]
2025-08-07 01:46:56,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 31.0, 114.0, 173.0, 122.0, 76.0, 68.0, 89.0, 70.0, 122.0]
2025-08-07 01:46:56,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (160.32) for latency ExtremeSparseL4U32
2025-08-07 01:46:56,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 48 minutes, 41 seconds)
2025-08-07 01:48:36,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:48:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 136.03323 ± 48.947
2025-08-07 01:48:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [198.69461, 148.06494, 125.88988, 200.28506, 24.104078, 124.715385, 117.92305, 113.39478, 181.37616, 125.88436]
2025-08-07 01:48:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [112.0, 94.0, 85.0, 109.0, 30.0, 85.0, 83.0, 76.0, 98.0, 76.0]
2025-08-07 01:48:38,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 47 minutes, 9 seconds)
2025-08-07 01:50:16,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:50:18,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 176.85983 ± 108.535
2025-08-07 01:50:18,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [274.84985, 409.40768, 56.098373, 271.26376, 139.16946, 67.655426, 171.81897, 151.40753, 48.021877, 178.90544]
2025-08-07 01:50:18,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 182.0, 49.0, 146.0, 85.0, 55.0, 97.0, 120.0, 45.0, 100.0]
2025-08-07 01:50:18,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (176.86) for latency ExtremeSparseL4U32
2025-08-07 01:50:18,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 45 minutes, 37 seconds)
2025-08-07 01:51:59,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:00,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 174.71178 ± 106.381
2025-08-07 01:52:00,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [138.79013, 136.5819, 165.34013, 414.5625, 210.60428, 216.17278, 116.182526, 28.135914, 49.737732, 271.00986]
2025-08-07 01:52:00,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [91.0, 82.0, 98.0, 171.0, 126.0, 121.0, 77.0, 32.0, 48.0, 127.0]
2025-08-07 01:52:00,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 44 minutes, 22 seconds)
2025-08-07 01:53:38,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:53:40,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 217.49008 ± 110.486
2025-08-07 01:53:40,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [349.4003, 99.95151, 344.93542, 92.257965, 365.05292, 136.11186, 192.87532, 161.68138, 100.539246, 332.0949]
2025-08-07 01:53:40,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [164.0, 76.0, 154.0, 65.0, 161.0, 90.0, 105.0, 108.0, 80.0, 156.0]
2025-08-07 01:53:40,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (217.49) for latency ExtremeSparseL4U32
2025-08-07 01:53:40,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 42 minutes, 38 seconds)
2025-08-07 01:55:20,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:22,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 156.56548 ± 90.778
2025-08-07 01:55:22,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [161.09406, 117.574585, 210.19098, 96.60469, 101.36641, 247.11302, 27.876863, 107.79065, 130.80446, 365.23917]
2025-08-07 01:55:22,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 80.0, 116.0, 69.0, 70.0, 133.0, 32.0, 76.0, 95.0, 171.0]
2025-08-07 01:55:22,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 41 minutes, 4 seconds)
2025-08-07 01:57:01,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:02,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 93.89539 ± 70.842
2025-08-07 01:57:02,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [91.63899, 25.923342, 173.5585, 21.542164, 248.68205, 28.294872, 31.417223, 82.46798, 101.15898, 134.2697]
2025-08-07 01:57:02,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 31.0, 105.0, 24.0, 143.0, 31.0, 33.0, 59.0, 70.0, 83.0]
2025-08-07 01:57:02,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 39 minutes, 5 seconds)
2025-08-07 01:58:41,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:43,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 245.13562 ± 144.203
2025-08-07 01:58:43,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [116.508514, 118.926, 373.71765, 274.54773, 259.9069, 104.83928, 152.36696, 169.862, 594.7585, 285.92258]
2025-08-07 01:58:43,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [76.0, 85.0, 163.0, 138.0, 146.0, 79.0, 103.0, 107.0, 213.0, 142.0]
2025-08-07 01:58:43,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (245.14) for latency ExtremeSparseL4U32
2025-08-07 01:58:43,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 37 minutes, 45 seconds)
2025-08-07 02:00:23,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:00:24,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 190.28960 ± 93.832
2025-08-07 02:00:24,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [354.14624, 133.84024, 343.36465, 242.70157, 195.30663, 109.54074, 222.21605, 108.193214, 92.15666, 101.43001]
2025-08-07 02:00:24,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [174.0, 83.0, 164.0, 131.0, 119.0, 73.0, 135.0, 74.0, 64.0, 75.0]
2025-08-07 02:00:24,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 46 seconds)
2025-08-07 02:02:04,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:06,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 247.20097 ± 122.490
2025-08-07 02:02:06,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [150.48271, 443.66745, 217.38853, 453.20102, 215.88853, 170.70786, 151.37453, 136.23247, 388.36063, 144.70601]
2025-08-07 02:02:06,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 196.0, 123.0, 203.0, 109.0, 104.0, 96.0, 82.0, 174.0, 94.0]
2025-08-07 02:02:06,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (247.20) for latency ExtremeSparseL4U32
2025-08-07 02:02:06,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 34 minutes, 25 seconds)
2025-08-07 02:03:46,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 217.87558 ± 155.580
2025-08-07 02:03:48,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [100.65125, 113.57737, 141.45813, 512.00586, 102.26781, 118.26051, 100.05999, 264.0472, 215.6785, 510.74927]
2025-08-07 02:03:48,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [75.0, 84.0, 89.0, 191.0, 79.0, 80.0, 76.0, 142.0, 121.0, 214.0]
2025-08-07 02:03:48,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 47 seconds)
2025-08-07 02:05:28,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:05:30,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 272.65524 ± 167.380
2025-08-07 02:05:30,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [130.20232, 359.9304, 123.08167, 676.2345, 111.394066, 107.32539, 300.19788, 260.59885, 271.97946, 385.60782]
2025-08-07 02:05:30,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [88.0, 182.0, 86.0, 242.0, 79.0, 73.0, 148.0, 129.0, 131.0, 201.0]
2025-08-07 02:05:30,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (272.66) for latency ExtremeSparseL4U32
2025-08-07 02:05:30,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 31 minutes, 29 seconds)
2025-08-07 02:07:09,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:11,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 218.13863 ± 61.205
2025-08-07 02:07:11,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [134.83415, 194.9145, 145.9216, 213.66115, 341.36508, 286.1816, 260.67593, 165.0102, 207.24556, 231.57634]
2025-08-07 02:07:11,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 109.0, 91.0, 114.0, 157.0, 144.0, 149.0, 101.0, 111.0, 130.0]
2025-08-07 02:07:11,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 29 minutes, 42 seconds)
2025-08-07 02:08:51,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:53,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 280.76495 ± 114.293
2025-08-07 02:08:53,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [169.88837, 228.79884, 370.91727, 228.26065, 474.2646, 198.02853, 175.13373, 386.0405, 426.2026, 150.11415]
2025-08-07 02:08:53,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 125.0, 182.0, 131.0, 231.0, 103.0, 102.0, 178.0, 197.0, 92.0]
2025-08-07 02:08:53,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (280.76) for latency ExtremeSparseL4U32
2025-08-07 02:08:53,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 14 seconds)
2025-08-07 02:10:33,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:36,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 305.07440 ± 229.454
2025-08-07 02:10:36,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [136.03523, 114.70301, 328.18097, 113.884926, 864.00586, 105.15761, 428.71466, 355.45654, 467.1916, 137.41325]
2025-08-07 02:10:36,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [93.0, 80.0, 161.0, 75.0, 354.0, 75.0, 197.0, 165.0, 205.0, 88.0]
2025-08-07 02:10:36,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (305.07) for latency ExtremeSparseL4U32
2025-08-07 02:10:36,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 38 seconds)
2025-08-07 02:12:15,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:12:17,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 257.19733 ± 151.947
2025-08-07 02:12:17,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [23.274359, 271.29263, 158.1337, 166.31128, 342.89795, 218.18044, 345.60272, 621.595, 165.46155, 259.22357]
2025-08-07 02:12:17,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [26.0, 137.0, 90.0, 98.0, 179.0, 122.0, 181.0, 300.0, 94.0, 129.0]
2025-08-07 02:12:17,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 24 minutes, 55 seconds)
2025-08-07 02:13:57,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:59,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 255.29800 ± 74.389
2025-08-07 02:13:59,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [183.4619, 367.51514, 232.33267, 179.9638, 266.3156, 119.807724, 352.00046, 274.5636, 310.58386, 266.43506]
2025-08-07 02:13:59,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [125.0, 212.0, 153.0, 96.0, 157.0, 88.0, 174.0, 138.0, 168.0, 128.0]
2025-08-07 02:13:59,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2025-08-07 02:15:39,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:41,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 245.14230 ± 91.452
2025-08-07 02:15:41,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [130.3285, 296.7861, 345.80057, 124.404076, 182.94179, 210.27385, 399.60764, 290.1883, 158.35242, 312.74]
2025-08-07 02:15:41,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 175.0, 191.0, 85.0, 118.0, 118.0, 204.0, 150.0, 97.0, 176.0]
2025-08-07 02:15:41,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 21 minutes, 36 seconds)
2025-08-07 02:17:22,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:24,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 310.07584 ± 211.492
2025-08-07 02:17:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [106.110146, 23.509796, 112.40792, 137.61487, 317.68176, 771.65753, 384.68076, 454.87213, 408.3696, 383.85367]
2025-08-07 02:17:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [72.0, 25.0, 80.0, 94.0, 172.0, 316.0, 181.0, 236.0, 206.0, 185.0]
2025-08-07 02:17:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (310.08) for latency ExtremeSparseL4U32
2025-08-07 02:17:24,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 20 minutes)
2025-08-07 02:19:04,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:06,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 334.96002 ± 110.482
2025-08-07 02:19:06,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [471.85727, 414.89648, 193.66562, 287.85022, 429.52353, 437.54477, 336.9214, 407.4528, 235.01288, 134.87558]
2025-08-07 02:19:06,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [208.0, 201.0, 120.0, 161.0, 214.0, 211.0, 159.0, 202.0, 136.0, 85.0]
2025-08-07 02:19:06,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (334.96) for latency ExtremeSparseL4U32
2025-08-07 02:19:06,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 18 minutes, 17 seconds)
2025-08-07 02:20:46,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:20:48,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 300.52362 ± 145.629
2025-08-07 02:20:48,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [331.1037, 165.76927, 366.43472, 117.67471, 188.22098, 405.6353, 359.5073, 421.73764, 87.70521, 561.447]
2025-08-07 02:20:48,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 102.0, 174.0, 73.0, 112.0, 178.0, 180.0, 192.0, 66.0, 233.0]
2025-08-07 02:20:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 16 minutes, 39 seconds)
2025-08-07 02:22:29,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:22:31,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 223.78906 ± 101.089
2025-08-07 02:22:31,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [194.09715, 134.2208, 296.51987, 27.379398, 116.00912, 358.43362, 324.15314, 298.14792, 204.36185, 284.56793]
2025-08-07 02:22:31,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 90.0, 157.0, 32.0, 74.0, 195.0, 185.0, 165.0, 113.0, 152.0]
2025-08-07 02:22:31,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 14 minutes, 57 seconds)
2025-08-07 02:24:10,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:12,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 160.10654 ± 102.334
2025-08-07 02:24:12,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [143.5927, 127.83483, 176.21323, 432.5127, 216.7168, 108.05627, 21.510601, 130.42886, 117.050026, 127.14939]
2025-08-07 02:24:12,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [101.0, 77.0, 103.0, 200.0, 121.0, 76.0, 27.0, 89.0, 84.0, 85.0]
2025-08-07 02:24:12,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 13 minutes, 11 seconds)
2025-08-07 02:25:52,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:54,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 305.25067 ± 212.061
2025-08-07 02:25:54,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [277.5981, 234.6684, 130.36423, 124.98945, 606.21313, 119.032, 118.34786, 771.2532, 305.4178, 364.6226]
2025-08-07 02:25:54,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [135.0, 116.0, 87.0, 83.0, 294.0, 83.0, 80.0, 379.0, 147.0, 169.0]
2025-08-07 02:25:54,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 11 minutes, 23 seconds)
2025-08-07 02:27:34,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:37,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 285.93762 ± 145.739
2025-08-07 02:27:37,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [275.22772, 252.15315, 26.406702, 125.90682, 140.36168, 489.57587, 453.62167, 413.26486, 391.5555, 291.30222]
2025-08-07 02:27:37,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [150.0, 132.0, 32.0, 85.0, 89.0, 241.0, 211.0, 170.0, 183.0, 154.0]
2025-08-07 02:27:37,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 46 seconds)
2025-08-07 02:29:16,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:18,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 199.34651 ± 121.298
2025-08-07 02:29:18,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [331.43597, 150.84099, 112.74377, 420.85092, 158.79031, 215.45465, 29.86781, 273.59457, 27.986317, 271.8998]
2025-08-07 02:29:18,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 95.0, 78.0, 197.0, 97.0, 115.0, 31.0, 142.0, 30.0, 154.0]
2025-08-07 02:29:18,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 52 seconds)
2025-08-07 02:30:58,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:00,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 218.33655 ± 114.805
2025-08-07 02:31:00,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [134.11023, 210.97496, 185.7377, 193.89986, 218.4857, 163.89641, 424.1825, 202.06818, 421.60657, 28.403238]
2025-08-07 02:31:00,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [89.0, 119.0, 112.0, 110.0, 122.0, 107.0, 215.0, 119.0, 198.0, 31.0]
2025-08-07 02:31:00,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 6 minutes, 11 seconds)
2025-08-07 02:32:41,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:43,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 250.12444 ± 165.577
2025-08-07 02:32:43,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [136.73811, 230.56554, 28.718124, 171.64528, 186.45314, 634.65674, 447.5994, 147.09482, 303.90378, 213.86934]
2025-08-07 02:32:43,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 125.0, 29.0, 103.0, 113.0, 276.0, 203.0, 95.0, 160.0, 122.0]
2025-08-07 02:32:43,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes, 44 seconds)
2025-08-07 02:34:23,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:26,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 395.94669 ± 175.297
2025-08-07 02:34:26,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [190.94986, 329.84137, 326.2484, 310.1243, 681.834, 349.11212, 324.4107, 627.17285, 639.8602, 179.91301]
2025-08-07 02:34:26,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 192.0, 173.0, 148.0, 262.0, 163.0, 158.0, 282.0, 277.0, 103.0]
2025-08-07 02:34:26,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (395.95) for latency ExtremeSparseL4U32
2025-08-07 02:34:26,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 3 minutes, 10 seconds)
2025-08-07 02:36:05,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:07,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 278.58261 ± 158.338
2025-08-07 02:36:07,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [131.5331, 385.64645, 96.19743, 292.1654, 448.20605, 569.66907, 129.25156, 135.92325, 415.27426, 181.95982]
2025-08-07 02:36:07,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [87.0, 178.0, 73.0, 149.0, 209.0, 230.0, 90.0, 87.0, 178.0, 112.0]
2025-08-07 02:36:07,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 1 minute, 12 seconds)
2025-08-07 02:37:46,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:48,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 285.38834 ± 127.971
2025-08-07 02:37:48,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [122.749664, 182.58925, 496.7725, 167.07542, 400.48132, 340.01724, 300.79108, 343.79135, 400.89487, 98.72058]
2025-08-07 02:37:48,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 105.0, 197.0, 104.0, 184.0, 173.0, 156.0, 151.0, 188.0, 68.0]
2025-08-07 02:37:48,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 33 seconds)
2025-08-07 02:39:29,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:39:31,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 278.82965 ± 159.523
2025-08-07 02:39:31,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [554.9888, 115.988594, 28.210934, 379.24008, 194.233, 321.29834, 95.169815, 409.11517, 273.0088, 417.04315]
2025-08-07 02:39:31,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [221.0, 76.0, 30.0, 192.0, 102.0, 180.0, 86.0, 180.0, 152.0, 192.0]
2025-08-07 02:39:31,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 56 seconds)
2025-08-07 02:41:10,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:12,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 272.23123 ± 199.287
2025-08-07 02:41:12,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [124.60893, 515.58527, 325.79892, 218.92761, 663.1139, 24.336294, 267.5074, 28.206493, 141.8955, 412.33182]
2025-08-07 02:41:12,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [77.0, 240.0, 170.0, 115.0, 264.0, 26.0, 138.0, 33.0, 93.0, 194.0]
2025-08-07 02:41:12,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 58 seconds)
2025-08-07 02:42:53,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:55,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 383.49512 ± 278.433
2025-08-07 02:42:55,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [137.93292, 410.9298, 542.654, 100.17534, 27.741737, 604.85376, 214.81096, 907.50183, 196.55338, 691.7974]
2025-08-07 02:42:55,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [96.0, 176.0, 246.0, 76.0, 28.0, 218.0, 124.0, 331.0, 107.0, 226.0]
2025-08-07 02:42:55,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 54 minutes, 18 seconds)
2025-08-07 02:44:34,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:37,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 339.86053 ± 145.372
2025-08-07 02:44:37,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [489.32935, 432.33624, 433.81686, 138.60408, 458.14487, 387.5726, 136.4177, 415.5122, 90.97268, 415.8989]
2025-08-07 02:44:37,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [206.0, 201.0, 189.0, 86.0, 206.0, 174.0, 91.0, 194.0, 68.0, 186.0]
2025-08-07 02:44:37,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 43 seconds)
2025-08-07 02:46:17,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:19,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 228.45796 ± 201.905
2025-08-07 02:46:19,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [217.77057, 139.24095, 166.24991, 822.2319, 91.702126, 165.35149, 175.5078, 228.2151, 109.95819, 168.35164]
2025-08-07 02:46:19,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [122.0, 92.0, 104.0, 346.0, 64.0, 104.0, 104.0, 125.0, 78.0, 106.0]
2025-08-07 02:46:19,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 51 minutes, 8 seconds)
2025-08-07 02:47:59,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:00,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 191.69011 ± 117.198
2025-08-07 02:48:00,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [401.50717, 171.28008, 315.8004, 309.84814, 23.708424, 118.02625, 27.490759, 175.25514, 223.24911, 150.73563]
2025-08-07 02:48:00,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [186.0, 106.0, 157.0, 152.0, 26.0, 74.0, 33.0, 103.0, 118.0, 87.0]
2025-08-07 02:48:01,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 15 seconds)
2025-08-07 02:49:40,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:42,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 269.07446 ± 128.946
2025-08-07 02:49:42,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [567.98596, 164.6385, 332.23285, 334.34103, 336.8405, 118.32356, 316.22736, 152.6153, 191.19151, 176.34814]
2025-08-07 02:49:42,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [239.0, 100.0, 181.0, 172.0, 187.0, 85.0, 170.0, 90.0, 108.0, 106.0]
2025-08-07 02:49:42,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 35 seconds)
2025-08-07 02:51:22,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:25,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 393.10590 ± 205.750
2025-08-07 02:51:25,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [545.84247, 234.24117, 262.69717, 796.40686, 441.18622, 638.2868, 423.00952, 196.54362, 107.590164, 285.25507]
2025-08-07 02:51:25,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [236.0, 131.0, 142.0, 331.0, 199.0, 255.0, 191.0, 119.0, 75.0, 136.0]
2025-08-07 02:51:25,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 52 seconds)
2025-08-07 02:53:05,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:07,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 258.80927 ± 159.761
2025-08-07 02:53:07,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [189.20699, 143.65573, 155.93034, 616.8136, 216.49237, 153.75603, 218.37787, 519.82666, 239.36023, 134.67265]
2025-08-07 02:53:07,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 98.0, 99.0, 244.0, 115.0, 99.0, 121.0, 229.0, 125.0, 93.0]
2025-08-07 02:53:07,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 13 seconds)
2025-08-07 02:54:47,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:54:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 425.90820 ± 316.502
2025-08-07 02:54:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1116.1289, 289.0959, 114.17152, 887.93256, 587.11127, 180.67432, 295.19507, 313.68494, 305.3651, 169.72263]
2025-08-07 02:54:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [411.0, 177.0, 80.0, 343.0, 252.0, 108.0, 183.0, 176.0, 163.0, 101.0]
2025-08-07 02:54:50,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (425.91) for latency ExtremeSparseL4U32
2025-08-07 02:54:50,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 32 seconds)
2025-08-07 02:56:31,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:56:33,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 213.67520 ± 93.522
2025-08-07 02:56:33,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [199.66791, 321.63336, 115.09826, 106.595695, 385.73105, 133.72899, 169.36356, 196.83607, 173.6487, 334.4483]
2025-08-07 02:56:33,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 152.0, 78.0, 82.0, 198.0, 92.0, 105.0, 114.0, 99.0, 196.0]
2025-08-07 02:56:33,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 59 seconds)
2025-08-07 02:58:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:58:15,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 281.31943 ± 246.860
2025-08-07 02:58:15,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [121.78524, 512.70355, 452.645, 26.602869, 139.10498, 93.58435, 204.38156, 23.248642, 420.37064, 818.7676]
2025-08-07 02:58:15,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 207.0, 201.0, 32.0, 90.0, 63.0, 114.0, 31.0, 176.0, 276.0]
2025-08-07 02:58:15,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 18 seconds)
2025-08-07 02:59:54,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:56,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 260.11719 ± 176.471
2025-08-07 02:59:56,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [164.02412, 27.485237, 30.481415, 480.9022, 218.30579, 190.54247, 300.5052, 216.525, 355.35245, 617.04803]
2025-08-07 02:59:56,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [99.0, 31.0, 32.0, 212.0, 114.0, 105.0, 165.0, 119.0, 171.0, 236.0]
2025-08-07 02:59:56,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 30 seconds)
2025-08-07 03:01:36,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:38,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 278.53693 ± 206.669
2025-08-07 03:01:38,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [21.463856, 110.83732, 528.7323, 21.666641, 300.1217, 712.2688, 349.19287, 201.60799, 221.39383, 318.0839]
2025-08-07 03:01:38,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [27.0, 83.0, 227.0, 25.0, 152.0, 285.0, 180.0, 109.0, 121.0, 151.0]
2025-08-07 03:01:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 44 seconds)
2025-08-07 03:03:17,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:19,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 242.01067 ± 233.513
2025-08-07 03:03:19,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [183.89627, 133.21071, 602.4183, 743.44183, 23.2374, 162.11385, 278.6349, 27.765848, 24.628866, 240.75877]
2025-08-07 03:03:19,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 92.0, 245.0, 287.0, 26.0, 94.0, 148.0, 31.0, 29.0, 134.0]
2025-08-07 03:03:19,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 57 seconds)
2025-08-07 03:04:59,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:02,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 380.30386 ± 218.886
2025-08-07 03:05:02,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [149.82054, 337.63803, 320.78812, 711.13916, 220.79088, 507.5344, 151.74936, 211.90994, 814.9699, 376.69824]
2025-08-07 03:05:02,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [95.0, 178.0, 168.0, 276.0, 130.0, 224.0, 95.0, 121.0, 379.0, 175.0]
2025-08-07 03:05:02,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 14 seconds)
2025-08-07 03:06:43,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:06:45,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 276.27335 ± 122.165
2025-08-07 03:06:45,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [138.82587, 140.32237, 173.09303, 244.8239, 246.99736, 276.4619, 533.2137, 348.15335, 436.07956, 224.76227]
2025-08-07 03:06:45,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 95.0, 109.0, 141.0, 151.0, 154.0, 240.0, 176.0, 187.0, 122.0]
2025-08-07 03:06:45,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 39 seconds)
2025-08-07 03:08:25,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:27,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 319.00723 ± 140.397
2025-08-07 03:08:27,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [420.95718, 153.97318, 119.46735, 263.5459, 411.421, 625.40326, 325.34464, 359.39304, 208.49205, 302.07504]
2025-08-07 03:08:27,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 106.0, 80.0, 144.0, 200.0, 301.0, 170.0, 178.0, 122.0, 161.0]
2025-08-07 03:08:27,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 57 seconds)
2025-08-07 03:10:07,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:10,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 349.85919 ± 222.570
2025-08-07 03:10:10,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [187.24434, 517.4781, 317.96, 171.99721, 273.83655, 940.8745, 234.41385, 205.73972, 418.65274, 230.39487]
2025-08-07 03:10:10,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 252.0, 185.0, 95.0, 144.0, 354.0, 135.0, 116.0, 214.0, 117.0]
2025-08-07 03:10:10,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 17 seconds)
2025-08-07 03:11:50,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:52,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 290.05237 ± 164.415
2025-08-07 03:11:52,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [284.69598, 422.9255, 199.1866, 140.51338, 440.50256, 145.66037, 149.72005, 669.1485, 167.3973, 280.77338]
2025-08-07 03:11:52,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 204.0, 116.0, 96.0, 203.0, 86.0, 93.0, 259.0, 96.0, 146.0]
2025-08-07 03:11:52,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 39 seconds)
2025-08-07 03:13:33,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:36,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 352.79962 ± 234.151
2025-08-07 03:13:36,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [274.96225, 331.36542, 316.38208, 799.81085, 217.17265, 134.42099, 201.11072, 816.6714, 215.11127, 220.98878]
2025-08-07 03:13:36,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [136.0, 165.0, 169.0, 353.0, 127.0, 103.0, 105.0, 314.0, 122.0, 120.0]
2025-08-07 03:13:36,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 59 seconds)
2025-08-07 03:15:15,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:17,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 400.83551 ± 326.141
2025-08-07 03:15:17,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [717.49506, 203.12286, 467.87814, 83.55214, 838.1099, 978.1392, 22.937431, 434.48126, 183.50415, 79.13455]
2025-08-07 03:15:17,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [266.0, 115.0, 204.0, 60.0, 278.0, 310.0, 27.0, 186.0, 103.0, 55.0]
2025-08-07 03:15:17,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 10 seconds)
2025-08-07 03:16:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:00,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 297.51776 ± 146.261
2025-08-07 03:17:00,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [184.38284, 326.89505, 441.32632, 312.44315, 618.763, 397.3869, 187.15799, 222.79387, 147.37569, 136.65283]
2025-08-07 03:17:00,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 174.0, 198.0, 169.0, 277.0, 202.0, 117.0, 120.0, 92.0, 85.0]
2025-08-07 03:17:00,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 29 seconds)
2025-08-07 03:18:41,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:18:43,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 221.64163 ± 186.754
2025-08-07 03:18:43,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [196.9276, 28.209925, 32.821133, 199.68738, 319.1842, 129.47533, 711.2021, 166.17303, 133.30916, 299.4263]
2025-08-07 03:18:43,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [109.0, 31.0, 32.0, 117.0, 172.0, 89.0, 265.0, 101.0, 92.0, 172.0]
2025-08-07 03:18:43,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 49 seconds)
2025-08-07 03:20:22,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:25,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 368.64984 ± 134.946
2025-08-07 03:20:25,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [142.44786, 588.7302, 341.3212, 426.71567, 411.4786, 328.56985, 380.81973, 396.63507, 524.27435, 145.50558]
2025-08-07 03:20:25,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [90.0, 240.0, 177.0, 184.0, 186.0, 173.0, 187.0, 195.0, 212.0, 87.0]
2025-08-07 03:20:25,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 4 seconds)
2025-08-07 03:22:06,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:09,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 299.22263 ± 198.749
2025-08-07 03:22:09,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [273.1452, 447.98956, 466.38116, 517.9734, 27.39122, 224.68515, 238.11646, 633.2842, 140.51083, 22.749285]
2025-08-07 03:22:09,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 203.0, 209.0, 212.0, 32.0, 122.0, 130.0, 235.0, 95.0, 26.0]
2025-08-07 03:22:09,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 22 seconds)
2025-08-07 03:23:47,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:48,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 217.25366 ± 107.917
2025-08-07 03:23:48,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [124.70614, 196.58719, 86.01278, 129.43306, 217.14609, 121.30842, 361.8701, 269.87576, 225.69638, 439.90067]
2025-08-07 03:23:48,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [83.0, 112.0, 64.0, 95.0, 121.0, 84.0, 161.0, 134.0, 119.0, 193.0]
2025-08-07 03:23:48,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 37 seconds)
2025-08-07 03:25:31,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:33,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 303.88266 ± 197.857
2025-08-07 03:25:33,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [176.44684, 21.905113, 677.6283, 128.40727, 505.78708, 193.09627, 475.66675, 211.27937, 453.31274, 195.29678]
2025-08-07 03:25:33,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [100.0, 24.0, 256.0, 83.0, 222.0, 116.0, 205.0, 111.0, 202.0, 109.0]
2025-08-07 03:25:33,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 58 seconds)
2025-08-07 03:27:10,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 291.40909 ± 125.468
2025-08-07 03:27:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [312.00198, 394.87592, 139.9841, 224.0283, 220.24539, 257.80722, 80.00157, 348.6252, 435.78696, 500.73422]
2025-08-07 03:27:12,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [149.0, 171.0, 89.0, 109.0, 131.0, 138.0, 57.0, 168.0, 185.0, 218.0]
2025-08-07 03:27:12,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 11 seconds)
2025-08-07 03:28:53,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 227.90005 ± 135.155
2025-08-07 03:28:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [27.870281, 405.52518, 152.00528, 192.77554, 229.08742, 226.01631, 359.09927, 19.494812, 230.76906, 436.3574]
2025-08-07 03:28:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [30.0, 176.0, 96.0, 108.0, 117.0, 124.0, 187.0, 25.0, 115.0, 186.0]
2025-08-07 03:28:55,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 30 seconds)
2025-08-07 03:30:36,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:38,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 251.39624 ± 142.738
2025-08-07 03:30:38,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [25.05153, 130.53815, 481.44016, 281.59573, 235.51852, 295.25244, 185.31006, 145.03607, 226.68716, 507.53247]
2025-08-07 03:30:38,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [28.0, 88.0, 208.0, 144.0, 123.0, 148.0, 109.0, 94.0, 124.0, 220.0]
2025-08-07 03:30:38,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 47 seconds)
2025-08-07 03:32:17,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:20,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 427.45987 ± 264.108
2025-08-07 03:32:20,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [663.5461, 416.61627, 131.4663, 435.26495, 345.47906, 154.25302, 171.36255, 773.16644, 933.4652, 249.97832]
2025-08-07 03:32:20,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [253.0, 197.0, 87.0, 190.0, 158.0, 101.0, 104.0, 308.0, 338.0, 131.0]
2025-08-07 03:32:20,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (427.46) for latency ExtremeSparseL4U32
2025-08-07 03:32:20,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 6 seconds)
2025-08-07 03:34:02,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:04,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 296.79532 ± 182.783
2025-08-07 03:34:04,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [144.62279, 122.21013, 229.09537, 390.3652, 158.02505, 748.90515, 412.1988, 242.58318, 149.31355, 370.63394]
2025-08-07 03:34:04,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [98.0, 79.0, 119.0, 174.0, 99.0, 267.0, 183.0, 137.0, 96.0, 166.0]
2025-08-07 03:34:04,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 24 seconds)
2025-08-07 03:35:44,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:46,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 226.01579 ± 190.647
2025-08-07 03:35:46,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [128.5477, 23.574724, 571.7406, 460.0115, 126.54156, 184.88867, 479.9942, 79.29385, 29.61016, 175.95517]
2025-08-07 03:35:46,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [85.0, 25.0, 230.0, 196.0, 90.0, 105.0, 200.0, 56.0, 33.0, 105.0]
2025-08-07 03:35:46,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2025-08-07 03:37:27,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:29,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 260.26886 ± 189.892
2025-08-07 03:37:29,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [76.0117, 667.3544, 160.33408, 409.65744, 206.07845, 114.44596, 357.04892, 430.83157, 26.085749, 154.8404]
2025-08-07 03:37:29,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [54.0, 255.0, 95.0, 169.0, 117.0, 79.0, 177.0, 196.0, 31.0, 94.0]
2025-08-07 03:37:29,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1251 [DEBUG]: Training session finished
