2025-08-07 06:37:35,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc5-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:37:35,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc5-hopper/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:37:35,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x149bd983bc10>}
2025-08-07 06:37:35,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1111 [DEBUG]: using device: cuda
2025-08-07 06:37:35,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1133 [INFO]: Creating new trainer
2025-08-07 06:37:35,477 baseline-bpql-noiseperc5-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=83, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2025-08-07 06:37:35,478 baseline-bpql-noiseperc5-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:37:36,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1194 [DEBUG]: Starting training session...
2025-08-07 06:37:36,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 1/100
2025-08-07 06:39:08,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:39:09,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 61.77993 ± 0.919
2025-08-07 06:39:09,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [60.51637, 61.66029, 62.160675, 62.103035, 62.477535, 62.059723, 62.014706, 60.48641, 60.745407, 63.575066]
2025-08-07 06:39:09,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [44.0, 46.0, 49.0, 46.0, 45.0, 46.0, 45.0, 45.0, 43.0, 47.0]
2025-08-07 06:39:09,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (61.78) for latency ExtremeClogL1U23
2025-08-07 06:39:09,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 33 minutes, 6 seconds)
2025-08-07 06:40:48,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:49,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 140.58853 ± 61.777
2025-08-07 06:40:49,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [189.91716, 143.21916, 99.485245, 276.23322, 54.338478, 132.86043, 134.22604, 110.58926, 73.07206, 191.9444]
2025-08-07 06:40:49,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 101.0, 75.0, 160.0, 44.0, 93.0, 89.0, 86.0, 61.0, 122.0]
2025-08-07 06:40:49,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (140.59) for latency ExtremeClogL1U23
2025-08-07 06:40:49,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 37 minutes, 28 seconds)
2025-08-07 06:42:28,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:42:29,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 98.32903 ± 65.533
2025-08-07 06:42:29,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [65.44343, 164.67885, 66.78071, 54.253494, 34.183834, 58.832787, 95.297935, 239.34781, 169.03358, 35.43789]
2025-08-07 06:42:29,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 101.0, 51.0, 45.0, 32.0, 50.0, 69.0, 130.0, 105.0, 33.0]
2025-08-07 06:42:29,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 37 minutes, 44 seconds)
2025-08-07 06:44:07,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:08,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 92.80360 ± 47.149
2025-08-07 06:44:08,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [35.19815, 148.58032, 25.378265, 63.863358, 139.02051, 151.79137, 56.003944, 122.832275, 57.008713, 128.35904]
2025-08-07 06:44:08,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [35.0, 97.0, 28.0, 53.0, 87.0, 99.0, 46.0, 88.0, 47.0, 81.0]
2025-08-07 06:44:08,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 36 minutes, 49 seconds)
2025-08-07 06:45:48,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:49,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 155.79579 ± 64.928
2025-08-07 06:45:49,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [278.374, 118.94656, 168.55496, 47.193375, 177.60378, 120.74183, 202.19394, 87.47164, 224.62097, 132.25688]
2025-08-07 06:45:49,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [173.0, 78.0, 113.0, 42.0, 120.0, 80.0, 120.0, 67.0, 126.0, 84.0]
2025-08-07 06:45:49,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (155.80) for latency ExtremeClogL1U23
2025-08-07 06:45:49,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 36 minutes, 5 seconds)
2025-08-07 06:47:27,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:29,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 239.63416 ± 118.607
2025-08-07 06:47:29,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [357.04254, 371.58347, 376.01926, 171.31137, 88.41541, 392.42105, 116.728745, 257.91705, 160.60199, 104.300476]
2025-08-07 06:47:29,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [179.0, 187.0, 213.0, 117.0, 72.0, 216.0, 88.0, 154.0, 115.0, 81.0]
2025-08-07 06:47:29,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (239.63) for latency ExtremeClogL1U23
2025-08-07 06:47:29,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 36 minutes, 36 seconds)
2025-08-07 06:49:07,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:09,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 148.84404 ± 66.086
2025-08-07 06:49:09,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [166.4157, 147.57141, 121.66863, 146.75287, 125.33344, 95.13087, 142.41467, 149.39056, 328.2683, 65.49388]
2025-08-07 06:49:09,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [106.0, 88.0, 86.0, 93.0, 88.0, 71.0, 101.0, 107.0, 173.0, 53.0]
2025-08-07 06:49:09,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 34 minutes, 53 seconds)
2025-08-07 06:50:49,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:50,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 143.31606 ± 81.603
2025-08-07 06:50:50,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [78.86992, 144.43408, 88.45424, 73.54272, 253.10452, 223.68118, 93.37031, 79.21836, 91.57041, 306.91476]
2025-08-07 06:50:50,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 109.0, 64.0, 54.0, 221.0, 204.0, 69.0, 60.0, 71.0, 152.0]
2025-08-07 06:50:50,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 33 minutes, 44 seconds)
2025-08-07 06:52:29,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:52:31,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 190.93765 ± 38.803
2025-08-07 06:52:31,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [145.5702, 199.18156, 139.33891, 269.9977, 215.0095, 192.05003, 228.6929, 147.84152, 178.68195, 193.01233]
2025-08-07 06:52:31,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [108.0, 150.0, 100.0, 192.0, 149.0, 145.0, 131.0, 120.0, 119.0, 169.0]
2025-08-07 06:52:31,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 32 minutes, 34 seconds)
2025-08-07 06:54:11,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:54:13,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 160.03690 ± 73.428
2025-08-07 06:54:13,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [150.13658, 106.98629, 219.99684, 347.61215, 135.81776, 93.386955, 148.32603, 189.12042, 109.088295, 99.89775]
2025-08-07 06:54:13,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [113.0, 76.0, 153.0, 170.0, 103.0, 70.0, 92.0, 152.0, 70.0, 72.0]
2025-08-07 06:54:13,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 31 minutes, 8 seconds)
2025-08-07 06:55:52,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:54,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 224.27153 ± 92.824
2025-08-07 06:55:54,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [126.24518, 322.60504, 192.49872, 225.6449, 385.1863, 190.67809, 236.71356, 123.60584, 100.34815, 339.18948]
2025-08-07 06:55:54,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [80.0, 160.0, 115.0, 123.0, 197.0, 103.0, 125.0, 84.0, 75.0, 158.0]
2025-08-07 06:55:54,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 29 minutes, 54 seconds)
2025-08-07 06:57:34,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:35,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 168.03287 ± 94.109
2025-08-07 06:57:35,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [94.49108, 211.50493, 131.29143, 23.454718, 118.638824, 209.24615, 334.6586, 104.54843, 316.53595, 135.95862]
2025-08-07 06:57:35,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 132.0, 91.0, 24.0, 83.0, 126.0, 155.0, 68.0, 155.0, 95.0]
2025-08-07 06:57:35,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 28 minutes, 31 seconds)
2025-08-07 06:59:15,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:16,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 258.81384 ± 83.641
2025-08-07 06:59:16,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [215.71745, 275.049, 135.66167, 332.554, 345.8897, 182.95773, 122.605934, 298.16727, 358.3143, 321.22153]
2025-08-07 06:59:16,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 136.0, 82.0, 166.0, 155.0, 104.0, 77.0, 139.0, 172.0, 152.0]
2025-08-07 06:59:16,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (258.81) for latency ExtremeClogL1U23
2025-08-07 06:59:16,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 26 minutes, 48 seconds)
2025-08-07 07:00:56,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:58,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 313.38120 ± 131.242
2025-08-07 07:00:58,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [319.84964, 172.0496, 267.66687, 319.96564, 604.5318, 460.29834, 372.45303, 152.69344, 207.29413, 257.0096]
2025-08-07 07:00:58,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [159.0, 102.0, 141.0, 152.0, 236.0, 206.0, 168.0, 88.0, 119.0, 129.0]
2025-08-07 07:00:58,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (313.38) for latency ExtremeClogL1U23
2025-08-07 07:00:58,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 25 minutes, 20 seconds)
2025-08-07 07:02:39,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:40,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 277.10159 ± 125.438
2025-08-07 07:02:40,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [356.92297, 318.8432, 219.19846, 391.26297, 149.0386, 89.77003, 348.6913, 186.35823, 522.3694, 188.56085]
2025-08-07 07:02:40,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 143.0, 115.0, 165.0, 90.0, 57.0, 154.0, 102.0, 194.0, 104.0]
2025-08-07 07:02:40,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 23 minutes, 50 seconds)
2025-08-07 07:04:20,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:22,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 312.74371 ± 151.061
2025-08-07 07:04:22,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [247.99432, 586.84686, 488.3817, 145.07858, 190.68517, 167.87407, 243.31026, 498.4775, 200.40105, 358.38776]
2025-08-07 07:04:22,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [138.0, 220.0, 226.0, 93.0, 113.0, 106.0, 123.0, 229.0, 109.0, 179.0]
2025-08-07 07:04:22,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 22 minutes, 8 seconds)
2025-08-07 07:06:03,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 297.59186 ± 172.536
2025-08-07 07:06:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [374.42816, 125.973495, 360.7529, 655.5421, 122.630196, 125.96711, 459.97028, 331.08472, 93.08637, 326.48322]
2025-08-07 07:06:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [168.0, 82.0, 161.0, 219.0, 78.0, 79.0, 185.0, 148.0, 65.0, 151.0]
2025-08-07 07:06:05,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 21 minutes, 4 seconds)
2025-08-07 07:07:43,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:45,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 323.86481 ± 160.202
2025-08-07 07:07:45,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [467.4242, 169.97296, 134.7257, 540.5352, 592.2624, 255.57353, 293.16022, 119.860695, 404.05774, 261.0752]
2025-08-07 07:07:45,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [218.0, 100.0, 82.0, 208.0, 208.0, 128.0, 141.0, 75.0, 181.0, 142.0]
2025-08-07 07:07:45,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (323.86) for latency ExtremeClogL1U23
2025-08-07 07:07:45,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 19 minutes, 4 seconds)
2025-08-07 07:09:25,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:27,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 211.26477 ± 113.187
2025-08-07 07:09:27,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [169.6787, 416.68906, 132.6298, 156.60481, 106.73303, 422.03473, 272.2476, 163.2435, 101.568115, 171.21841]
2025-08-07 07:09:27,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 173.0, 85.0, 99.0, 71.0, 187.0, 143.0, 98.0, 65.0, 100.0]
2025-08-07 07:09:27,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 17 minutes, 22 seconds)
2025-08-07 07:11:08,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:10,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 295.47287 ± 143.503
2025-08-07 07:11:10,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [119.84844, 425.7007, 231.0668, 151.64583, 318.6541, 250.25282, 446.33246, 451.98453, 75.96082, 483.2822]
2025-08-07 07:11:10,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [79.0, 181.0, 135.0, 98.0, 156.0, 134.0, 176.0, 186.0, 57.0, 231.0]
2025-08-07 07:11:10,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 15 minutes, 50 seconds)
2025-08-07 07:12:49,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:51,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 353.50348 ± 181.996
2025-08-07 07:12:51,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [171.55011, 371.62894, 510.35083, 587.8418, 396.5022, 173.74475, 539.9385, 548.15045, 128.29037, 107.03692]
2025-08-07 07:12:51,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [107.0, 170.0, 215.0, 228.0, 180.0, 102.0, 207.0, 210.0, 79.0, 65.0]
2025-08-07 07:12:51,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (353.50) for latency ExtremeClogL1U23
2025-08-07 07:12:51,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 14 minutes, 14 seconds)
2025-08-07 07:14:31,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:32,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 228.91373 ± 82.043
2025-08-07 07:14:32,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [303.02133, 220.55136, 128.14742, 93.84743, 278.3768, 316.486, 181.50787, 143.85565, 306.35425, 316.98907]
2025-08-07 07:14:32,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [158.0, 127.0, 78.0, 67.0, 156.0, 156.0, 102.0, 94.0, 149.0, 162.0]
2025-08-07 07:14:32,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 11 minutes, 58 seconds)
2025-08-07 07:16:12,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:14,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 335.93155 ± 142.241
2025-08-07 07:16:14,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [361.33078, 315.95496, 702.15405, 451.1374, 224.068, 281.37042, 230.39136, 175.92238, 317.16913, 299.81683]
2025-08-07 07:16:14,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 168.0, 306.0, 223.0, 123.0, 140.0, 130.0, 117.0, 156.0, 154.0]
2025-08-07 07:16:14,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 43 seconds)
2025-08-07 07:17:53,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 225.72507 ± 102.087
2025-08-07 07:17:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [427.8486, 165.46265, 196.75136, 174.4238, 215.5418, 202.11818, 392.68686, 260.98737, 107.07638, 114.35386]
2025-08-07 07:17:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [181.0, 95.0, 112.0, 102.0, 120.0, 114.0, 206.0, 149.0, 78.0, 77.0]
2025-08-07 07:17:55,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 8 minutes, 40 seconds)
2025-08-07 07:19:35,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:36,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 237.79126 ± 105.517
2025-08-07 07:19:36,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [261.8752, 141.38907, 256.32462, 162.58241, 160.3369, 220.10666, 521.2035, 245.98886, 261.00302, 147.10231]
2025-08-07 07:19:36,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [140.0, 89.0, 123.0, 100.0, 97.0, 122.0, 238.0, 125.0, 131.0, 88.0]
2025-08-07 07:19:36,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 6 minutes, 38 seconds)
2025-08-07 07:21:15,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:17,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 343.27954 ± 173.562
2025-08-07 07:21:17,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [254.63849, 407.33417, 235.74031, 513.929, 623.99097, 176.66832, 374.67477, 574.7655, 159.39804, 111.65567]
2025-08-07 07:21:17,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [131.0, 185.0, 118.0, 270.0, 283.0, 102.0, 176.0, 211.0, 97.0, 76.0]
2025-08-07 07:21:17,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 45 seconds)
2025-08-07 07:22:55,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:57,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 394.30438 ± 270.098
2025-08-07 07:22:57,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [386.63965, 1076.0287, 230.52544, 348.02597, 106.54646, 415.95337, 119.72053, 326.22842, 642.72003, 290.65533]
2025-08-07 07:22:57,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 446.0, 124.0, 166.0, 68.0, 180.0, 80.0, 158.0, 253.0, 172.0]
2025-08-07 07:22:57,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (394.30) for latency ExtremeClogL1U23
2025-08-07 07:22:57,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 2 minutes, 50 seconds)
2025-08-07 07:24:35,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:36,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 115.42794 ± 70.535
2025-08-07 07:24:36,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [79.48816, 239.1099, 75.0132, 84.12352, 85.08683, 84.48533, 76.73566, 77.579544, 272.0481, 80.60928]
2025-08-07 07:24:36,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [55.0, 122.0, 55.0, 62.0, 63.0, 57.0, 54.0, 53.0, 142.0, 59.0]
2025-08-07 07:24:36,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 28 seconds)
2025-08-07 07:26:16,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:18,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 292.41327 ± 141.591
2025-08-07 07:26:18,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [128.48932, 325.56253, 318.01303, 147.84149, 601.6054, 352.8634, 308.87146, 409.54816, 112.84824, 218.48961]
2025-08-07 07:26:18,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [82.0, 167.0, 171.0, 91.0, 266.0, 189.0, 157.0, 189.0, 72.0, 130.0]
2025-08-07 07:26:18,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 58 minutes, 58 seconds)
2025-08-07 07:27:55,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:57,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 341.11893 ± 128.703
2025-08-07 07:27:57,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [280.97192, 217.01227, 414.92365, 486.65598, 250.66985, 593.58185, 268.65457, 382.01, 141.18898, 375.5202]
2025-08-07 07:27:57,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [147.0, 128.0, 184.0, 193.0, 129.0, 270.0, 140.0, 176.0, 93.0, 173.0]
2025-08-07 07:27:58,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 56 minutes, 55 seconds)
2025-08-07 07:29:35,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:37,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 339.14194 ± 253.397
2025-08-07 07:29:37,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [364.38205, 81.730644, 160.61664, 120.585, 217.74155, 219.68402, 206.18933, 553.0155, 947.8956, 519.579]
2025-08-07 07:29:37,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [177.0, 56.0, 90.0, 79.0, 114.0, 116.0, 112.0, 252.0, 382.0, 275.0]
2025-08-07 07:29:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 54 minutes, 59 seconds)
2025-08-07 07:31:15,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:17,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 342.54584 ± 171.391
2025-08-07 07:31:17,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [435.6375, 404.71988, 169.41829, 192.63182, 96.22564, 251.81636, 363.66943, 382.70456, 736.711, 391.9238]
2025-08-07 07:31:17,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [215.0, 174.0, 99.0, 107.0, 70.0, 130.0, 177.0, 171.0, 259.0, 183.0]
2025-08-07 07:31:17,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 53 minutes, 22 seconds)
2025-08-07 07:32:55,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:57,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 324.31992 ± 179.785
2025-08-07 07:32:57,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [138.41452, 164.89029, 623.674, 182.99597, 316.0225, 423.856, 142.06573, 432.60565, 204.04585, 614.6287]
2025-08-07 07:32:57,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 98.0, 280.0, 99.0, 169.0, 174.0, 87.0, 201.0, 105.0, 299.0]
2025-08-07 07:32:57,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 51 minutes, 49 seconds)
2025-08-07 07:34:36,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:38,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 465.76849 ± 202.256
2025-08-07 07:34:38,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [667.5142, 645.03595, 424.20004, 137.54633, 219.16885, 672.0504, 647.0446, 632.3285, 383.90952, 228.88678]
2025-08-07 07:34:38,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [230.0, 300.0, 187.0, 80.0, 119.0, 312.0, 216.0, 301.0, 164.0, 128.0]
2025-08-07 07:34:38,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (465.77) for latency ExtremeClogL1U23
2025-08-07 07:34:38,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 50 minutes, 9 seconds)
2025-08-07 07:36:15,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:18,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 385.46609 ± 254.202
2025-08-07 07:36:18,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [72.90537, 454.14966, 405.86646, 557.7443, 227.95749, 252.37984, 361.33652, 1036.3116, 186.49922, 299.5103]
2025-08-07 07:36:18,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [52.0, 208.0, 200.0, 241.0, 116.0, 119.0, 181.0, 449.0, 113.0, 161.0]
2025-08-07 07:36:18,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 48 minutes, 23 seconds)
2025-08-07 07:37:55,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:57,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 341.80594 ± 196.409
2025-08-07 07:37:57,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [143.10384, 609.21533, 107.549164, 390.47537, 178.17035, 443.99103, 585.84955, 590.32086, 115.32684, 254.05695]
2025-08-07 07:37:57,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 250.0, 69.0, 160.0, 101.0, 209.0, 265.0, 247.0, 75.0, 124.0]
2025-08-07 07:37:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 46 minutes, 36 seconds)
2025-08-07 07:39:35,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:37,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 400.44125 ± 141.693
2025-08-07 07:39:37,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [487.46954, 140.99944, 396.25473, 405.23407, 620.80786, 230.5681, 358.06302, 315.32495, 589.7809, 459.9101]
2025-08-07 07:39:37,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [210.0, 87.0, 179.0, 169.0, 270.0, 123.0, 158.0, 170.0, 215.0, 206.0]
2025-08-07 07:39:37,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 44 minutes, 57 seconds)
2025-08-07 07:41:16,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:18,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 377.74188 ± 233.517
2025-08-07 07:41:18,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [344.99188, 416.87805, 606.30457, 190.65582, 233.57953, 863.2199, 81.3946, 436.32617, 82.64013, 521.4283]
2025-08-07 07:41:18,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [157.0, 182.0, 251.0, 97.0, 125.0, 307.0, 61.0, 173.0, 60.0, 212.0]
2025-08-07 07:41:18,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 43 minutes, 28 seconds)
2025-08-07 07:42:56,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 331.25879 ± 206.950
2025-08-07 07:42:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [115.71647, 373.8643, 377.54388, 328.84763, 341.43958, 900.0376, 196.97968, 237.26178, 258.03522, 182.86174]
2025-08-07 07:42:58,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 173.0, 177.0, 165.0, 171.0, 314.0, 105.0, 128.0, 138.0, 107.0]
2025-08-07 07:42:58,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 41 minutes, 30 seconds)
2025-08-07 07:44:35,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:37,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 408.12714 ± 205.705
2025-08-07 07:44:37,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [559.4173, 206.97672, 357.57047, 693.7005, 597.35583, 149.59619, 686.99896, 445.1471, 173.55363, 210.95425]
2025-08-07 07:44:37,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [235.0, 115.0, 174.0, 297.0, 249.0, 91.0, 268.0, 201.0, 105.0, 115.0]
2025-08-07 07:44:38,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 39 minutes, 57 seconds)
2025-08-07 07:46:17,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:19,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 499.62027 ± 237.936
2025-08-07 07:46:19,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [849.3498, 456.37048, 371.2073, 219.0698, 286.48648, 614.062, 422.42526, 807.7597, 179.05656, 790.4155]
2025-08-07 07:46:19,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [316.0, 192.0, 174.0, 116.0, 136.0, 255.0, 183.0, 315.0, 101.0, 284.0]
2025-08-07 07:46:19,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (499.62) for latency ExtremeClogL1U23
2025-08-07 07:46:19,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 38 minutes, 51 seconds)
2025-08-07 07:47:56,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:58,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 344.48483 ± 206.837
2025-08-07 07:47:58,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [414.37302, 103.81824, 545.77765, 620.88184, 131.83934, 163.01378, 513.7616, 237.86537, 611.3403, 102.1773]
2025-08-07 07:47:58,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [185.0, 75.0, 231.0, 247.0, 80.0, 95.0, 232.0, 121.0, 262.0, 75.0]
2025-08-07 07:47:58,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 36 minutes, 53 seconds)
2025-08-07 07:49:39,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:41,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 409.23370 ± 260.112
2025-08-07 07:49:41,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [294.94455, 113.547935, 127.82216, 671.3475, 704.7745, 436.1285, 159.52849, 313.09326, 347.41605, 923.7341]
2025-08-07 07:49:41,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [141.0, 79.0, 78.0, 280.0, 278.0, 178.0, 95.0, 144.0, 159.0, 373.0]
2025-08-07 07:49:41,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 35 minutes, 38 seconds)
2025-08-07 07:51:15,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:18,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 540.29364 ± 286.452
2025-08-07 07:51:18,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [662.1787, 423.47055, 204.42519, 271.00497, 1228.906, 415.89362, 299.03656, 674.7906, 487.9422, 735.288]
2025-08-07 07:51:18,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [280.0, 183.0, 105.0, 138.0, 471.0, 185.0, 144.0, 267.0, 205.0, 266.0]
2025-08-07 07:51:18,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (540.29) for latency ExtremeClogL1U23
2025-08-07 07:51:18,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 33 minutes, 26 seconds)
2025-08-07 07:52:56,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:59,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 399.29889 ± 243.310
2025-08-07 07:52:59,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [194.70255, 416.88214, 117.84745, 109.84709, 853.9244, 639.38855, 586.9673, 451.91174, 125.23252, 496.2851]
2025-08-07 07:52:59,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [111.0, 188.0, 78.0, 76.0, 309.0, 251.0, 255.0, 201.0, 80.0, 207.0]
2025-08-07 07:52:59,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 31 minutes, 51 seconds)
2025-08-07 07:54:37,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:39,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 293.45892 ± 180.498
2025-08-07 07:54:39,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [133.84218, 366.6429, 144.38948, 112.10288, 686.61346, 375.72256, 472.34558, 102.57965, 203.11464, 337.23572]
2025-08-07 07:54:39,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [84.0, 158.0, 90.0, 75.0, 259.0, 170.0, 205.0, 71.0, 105.0, 147.0]
2025-08-07 07:54:39,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 29 minutes, 57 seconds)
2025-08-07 07:56:16,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:18,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 322.46164 ± 216.805
2025-08-07 07:56:18,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [154.43257, 161.62766, 184.16367, 168.17787, 713.6862, 448.96167, 702.5495, 134.94304, 378.97134, 177.10275]
2025-08-07 07:56:18,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [92.0, 96.0, 104.0, 101.0, 271.0, 175.0, 260.0, 86.0, 176.0, 100.0]
2025-08-07 07:56:18,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 28 minutes, 11 seconds)
2025-08-07 07:57:55,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:57,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 385.26303 ± 262.403
2025-08-07 07:57:57,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [158.74513, 103.55527, 218.05612, 173.27751, 350.1145, 802.55365, 438.39523, 848.1567, 164.86816, 594.908]
2025-08-07 07:57:57,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [97.0, 74.0, 115.0, 101.0, 160.0, 296.0, 201.0, 326.0, 92.0, 247.0]
2025-08-07 07:57:57,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 25 minutes, 57 seconds)
2025-08-07 07:59:35,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:38,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 451.38843 ± 233.834
2025-08-07 07:59:38,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [359.51257, 912.1281, 227.9593, 164.1227, 334.47992, 550.114, 800.9153, 446.19348, 483.7686, 234.69055]
2025-08-07 07:59:38,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [171.0, 326.0, 121.0, 92.0, 151.0, 227.0, 319.0, 200.0, 222.0, 124.0]
2025-08-07 07:59:38,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 24 minutes, 58 seconds)
2025-08-07 08:01:15,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:17,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 381.19135 ± 227.993
2025-08-07 08:01:17,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [409.0325, 620.89386, 91.47161, 702.8254, 99.16306, 225.31718, 696.17725, 317.7728, 150.3935, 498.8662]
2025-08-07 08:01:17,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [184.0, 245.0, 63.0, 254.0, 74.0, 114.0, 263.0, 145.0, 98.0, 218.0]
2025-08-07 08:01:17,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 23 minutes, 8 seconds)
2025-08-07 08:02:55,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:57,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 410.20795 ± 248.316
2025-08-07 08:02:57,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [786.38293, 421.75012, 394.28503, 377.59384, 395.1695, 100.19198, 156.336, 146.4104, 420.6342, 903.32526]
2025-08-07 08:02:57,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [282.0, 185.0, 177.0, 171.0, 169.0, 66.0, 94.0, 86.0, 187.0, 359.0]
2025-08-07 08:02:57,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 21 minutes, 21 seconds)
2025-08-07 08:04:36,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:38,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 434.28296 ± 246.957
2025-08-07 08:04:38,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [779.52014, 126.2416, 156.46326, 515.4477, 862.7178, 434.84778, 91.63228, 399.35513, 443.5445, 533.0594]
2025-08-07 08:04:38,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [311.0, 84.0, 95.0, 224.0, 318.0, 187.0, 67.0, 183.0, 199.0, 220.0]
2025-08-07 08:04:38,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 20 minutes, 8 seconds)
2025-08-07 08:06:16,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:18,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 405.04779 ± 221.474
2025-08-07 08:06:18,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [426.10556, 765.2645, 281.57574, 441.30258, 170.30678, 434.44382, 754.106, 82.79525, 180.82248, 513.75525]
2025-08-07 08:06:18,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 283.0, 134.0, 188.0, 96.0, 188.0, 310.0, 57.0, 101.0, 224.0]
2025-08-07 08:06:18,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 32 seconds)
2025-08-07 08:07:57,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:59,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 373.80157 ± 287.321
2025-08-07 08:07:59,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [252.51877, 129.12979, 241.45724, 540.0954, 256.45374, 240.70378, 1139.2113, 460.06772, 89.5735, 388.80432]
2025-08-07 08:07:59,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [127.0, 87.0, 133.0, 232.0, 131.0, 127.0, 432.0, 202.0, 62.0, 182.0]
2025-08-07 08:07:59,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 50 seconds)
2025-08-07 08:09:37,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:40,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 570.85022 ± 345.358
2025-08-07 08:09:40,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [653.6091, 107.18024, 859.5922, 1087.6194, 353.48813, 195.70612, 570.4222, 650.8141, 1074.4803, 155.58978]
2025-08-07 08:09:40,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [239.0, 71.0, 339.0, 414.0, 161.0, 106.0, 236.0, 257.0, 408.0, 90.0]
2025-08-07 08:09:40,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (570.85) for latency ExtremeClogL1U23
2025-08-07 08:09:40,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 15 minutes, 27 seconds)
2025-08-07 08:11:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:20,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 582.02869 ± 272.415
2025-08-07 08:11:20,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [342.4693, 437.9848, 606.97156, 730.7771, 903.44165, 662.7834, 570.8887, 200.09543, 1109.7701, 255.10426]
2025-08-07 08:11:20,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [144.0, 192.0, 247.0, 253.0, 359.0, 282.0, 225.0, 106.0, 451.0, 116.0]
2025-08-07 08:11:20,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (582.03) for latency ExtremeClogL1U23
2025-08-07 08:11:20,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 47 seconds)
2025-08-07 08:12:58,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:01,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 512.73145 ± 257.158
2025-08-07 08:13:01,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [847.24054, 662.18475, 157.70485, 122.39419, 305.70282, 535.55115, 647.2873, 930.5304, 518.18915, 400.52988]
2025-08-07 08:13:01,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [346.0, 264.0, 98.0, 78.0, 136.0, 220.0, 218.0, 338.0, 228.0, 178.0]
2025-08-07 08:13:01,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 59 seconds)
2025-08-07 08:14:40,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:43,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 580.20959 ± 276.607
2025-08-07 08:14:43,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [611.80585, 727.1454, 637.59454, 771.15375, 197.62038, 1132.7506, 741.1258, 416.08255, 388.4894, 178.32793]
2025-08-07 08:14:43,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 244.0, 264.0, 318.0, 101.0, 457.0, 303.0, 182.0, 172.0, 99.0]
2025-08-07 08:14:43,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 10 minutes, 42 seconds)
2025-08-07 08:16:22,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:24,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 488.73795 ± 174.916
2025-08-07 08:16:24,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [439.06378, 472.4798, 457.5374, 632.65173, 713.5858, 634.64417, 529.5291, 307.74792, 88.520226, 611.61957]
2025-08-07 08:16:24,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [173.0, 206.0, 201.0, 252.0, 250.0, 250.0, 217.0, 133.0, 66.0, 255.0]
2025-08-07 08:16:24,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 1 second)
2025-08-07 08:18:00,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:03,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 578.33508 ± 293.211
2025-08-07 08:18:03,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [445.61948, 507.87457, 864.7533, 594.6889, 1141.958, 774.4075, 357.63788, 738.758, 128.45522, 229.1981]
2025-08-07 08:18:03,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [194.0, 215.0, 301.0, 238.0, 382.0, 298.0, 167.0, 278.0, 84.0, 121.0]
2025-08-07 08:18:03,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 1 second)
2025-08-07 08:19:41,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:45,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 854.92871 ± 744.347
2025-08-07 08:19:45,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [702.5949, 415.04062, 937.39667, 161.29561, 193.6164, 1464.0858, 2751.7075, 1072.7767, 344.02368, 506.749]
2025-08-07 08:19:45,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [291.0, 180.0, 336.0, 93.0, 110.0, 522.0, 1000.0, 429.0, 153.0, 215.0]
2025-08-07 08:19:45,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (854.93) for latency ExtremeClogL1U23
2025-08-07 08:19:45,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 37 seconds)
2025-08-07 08:21:23,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:26,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 465.39972 ± 270.105
2025-08-07 08:21:26,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [584.1477, 452.89584, 195.19887, 697.7522, 221.62434, 1049.4647, 195.78825, 169.69919, 631.34125, 456.0845]
2025-08-07 08:21:26,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [250.0, 185.0, 106.0, 280.0, 118.0, 410.0, 111.0, 101.0, 255.0, 188.0]
2025-08-07 08:21:26,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 4 minutes)
2025-08-07 08:23:05,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:08,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 703.12939 ± 344.289
2025-08-07 08:23:08,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [820.4164, 520.39075, 744.3172, 226.40805, 1308.524, 806.7136, 170.26521, 482.20944, 817.4134, 1134.6355]
2025-08-07 08:23:08,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [312.0, 215.0, 271.0, 118.0, 501.0, 339.0, 102.0, 207.0, 337.0, 426.0]
2025-08-07 08:23:08,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 2 minutes, 17 seconds)
2025-08-07 08:24:49,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:52,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 530.54840 ± 341.281
2025-08-07 08:24:52,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [147.03246, 1141.7021, 114.44939, 367.06238, 847.7678, 889.8347, 175.48312, 443.61035, 396.88275, 781.6588]
2025-08-07 08:24:52,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [86.0, 401.0, 74.0, 156.0, 289.0, 339.0, 99.0, 193.0, 179.0, 294.0]
2025-08-07 08:24:52,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 53 seconds)
2025-08-07 08:26:27,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:30,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 609.08899 ± 526.232
2025-08-07 08:26:30,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [389.83847, 1990.7786, 753.401, 199.61137, 962.1262, 206.71217, 695.64844, 434.27338, 157.10223, 301.3977]
2025-08-07 08:26:30,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [170.0, 740.0, 310.0, 111.0, 317.0, 108.0, 287.0, 182.0, 90.0, 138.0]
2025-08-07 08:26:30,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 66/100 (estimated time remaining: 59 minutes, 9 seconds)
2025-08-07 08:28:10,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:28:13,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 553.41748 ± 201.276
2025-08-07 08:28:13,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [605.0787, 887.37915, 668.36487, 596.19086, 335.30872, 590.71454, 258.36002, 507.50375, 280.42072, 804.8534]
2025-08-07 08:28:13,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [248.0, 354.0, 277.0, 251.0, 154.0, 262.0, 133.0, 220.0, 129.0, 327.0]
2025-08-07 08:28:13,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 67/100 (estimated time remaining: 57 minutes, 31 seconds)
2025-08-07 08:29:51,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:54,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 578.56702 ± 327.672
2025-08-07 08:29:54,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [986.61523, 181.05405, 206.70212, 614.76355, 154.05338, 625.5089, 603.91235, 403.28473, 1131.9543, 877.8219]
2025-08-07 08:29:54,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [375.0, 103.0, 110.0, 258.0, 94.0, 259.0, 248.0, 181.0, 442.0, 371.0]
2025-08-07 08:29:54,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 54 seconds)
2025-08-07 08:31:31,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 651.72015 ± 401.594
2025-08-07 08:31:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [415.98987, 385.0188, 1135.0125, 515.48474, 1533.8547, 184.42018, 399.85263, 605.6113, 378.47916, 963.47815]
2025-08-07 08:31:34,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 168.0, 435.0, 211.0, 600.0, 105.0, 187.0, 268.0, 174.0, 337.0]
2025-08-07 08:31:34,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 69/100 (estimated time remaining: 53 minutes, 56 seconds)
2025-08-07 08:33:12,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:15,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 695.15027 ± 454.298
2025-08-07 08:33:15,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [893.3792, 666.84015, 442.4985, 121.66092, 878.3749, 458.78964, 428.92172, 1742.7883, 1089.5853, 228.66371]
2025-08-07 08:33:15,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [355.0, 259.0, 186.0, 82.0, 339.0, 191.0, 184.0, 665.0, 404.0, 118.0]
2025-08-07 08:33:15,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 70/100 (estimated time remaining: 52 minutes, 2 seconds)
2025-08-07 08:34:57,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:00,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 527.17566 ± 321.877
2025-08-07 08:35:00,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [767.74115, 392.3685, 227.93314, 164.09573, 950.63824, 650.7849, 1091.9951, 193.20114, 205.58365, 627.41486]
2025-08-07 08:35:00,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [296.0, 172.0, 117.0, 94.0, 364.0, 262.0, 398.0, 101.0, 106.0, 250.0]
2025-08-07 08:35:00,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 58 seconds)
2025-08-07 08:36:36,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:40,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 733.17902 ± 348.109
2025-08-07 08:36:40,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [187.45996, 1159.3312, 564.4579, 404.2465, 1096.2288, 921.9148, 1078.0613, 205.46967, 851.8981, 862.722]
2025-08-07 08:36:40,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [105.0, 430.0, 234.0, 173.0, 404.0, 352.0, 415.0, 110.0, 330.0, 344.0]
2025-08-07 08:36:40,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 72/100 (estimated time remaining: 49 minutes, 2 seconds)
2025-08-07 08:38:16,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:20,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 939.18420 ± 728.385
2025-08-07 08:38:20,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [97.856705, 255.26122, 417.30878, 916.3034, 207.07599, 1215.6656, 1258.1012, 723.0241, 1866.9324, 2434.312]
2025-08-07 08:38:20,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [69.0, 124.0, 188.0, 329.0, 109.0, 444.0, 424.0, 290.0, 628.0, 872.0]
2025-08-07 08:38:20,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (939.18) for latency ExtremeClogL1U23
2025-08-07 08:38:20,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 73/100 (estimated time remaining: 47 minutes, 14 seconds)
2025-08-07 08:40:00,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:04,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 710.75177 ± 638.931
2025-08-07 08:40:04,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [111.05516, 306.08545, 142.30737, 1361.9243, 2276.9927, 842.4016, 673.9653, 556.8471, 687.09814, 148.84044]
2025-08-07 08:40:04,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [70.0, 151.0, 89.0, 448.0, 762.0, 324.0, 276.0, 237.0, 274.0, 86.0]
2025-08-07 08:40:04,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 52 seconds)
2025-08-07 08:41:40,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:44,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 782.59680 ± 409.887
2025-08-07 08:41:44,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [904.54846, 481.0437, 99.06358, 654.9438, 405.6077, 1057.3512, 877.9695, 1619.5544, 588.4962, 1137.3889]
2025-08-07 08:41:44,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [328.0, 208.0, 70.0, 272.0, 176.0, 414.0, 331.0, 580.0, 234.0, 438.0]
2025-08-07 08:41:44,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 75/100 (estimated time remaining: 44 minutes, 6 seconds)
2025-08-07 08:43:26,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:29,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 654.60461 ± 469.726
2025-08-07 08:43:29,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [890.73926, 947.5632, 215.60182, 298.75577, 547.37274, 409.77682, 410.09058, 1895.9504, 492.4427, 437.75226]
2025-08-07 08:43:29,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [335.0, 358.0, 112.0, 137.0, 232.0, 181.0, 169.0, 650.0, 207.0, 188.0]
2025-08-07 08:43:29,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 25 seconds)
2025-08-07 08:45:04,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:07,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 557.60144 ± 301.695
2025-08-07 08:45:07,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [926.3411, 133.27821, 402.22583, 703.23914, 486.0306, 469.8836, 662.32025, 297.51413, 308.72302, 1186.4587]
2025-08-07 08:45:07,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [347.0, 89.0, 172.0, 283.0, 207.0, 200.0, 264.0, 136.0, 148.0, 448.0]
2025-08-07 08:45:07,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 31 seconds)
2025-08-07 08:46:44,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:48,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 653.96155 ± 400.859
2025-08-07 08:46:48,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [491.33078, 211.00047, 377.99783, 401.35446, 1689.5612, 668.9526, 448.07318, 675.59375, 1003.32336, 572.4275]
2025-08-07 08:46:48,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [211.0, 111.0, 168.0, 181.0, 541.0, 264.0, 187.0, 266.0, 365.0, 228.0]
2025-08-07 08:46:48,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 53 seconds)
2025-08-07 08:48:28,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:48:31,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 737.28827 ± 538.569
2025-08-07 08:48:31,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [910.6889, 1117.2977, 103.15631, 2084.6558, 789.057, 405.73425, 655.26544, 434.51318, 195.09506, 677.41876]
2025-08-07 08:48:31,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [347.0, 442.0, 69.0, 742.0, 314.0, 175.0, 276.0, 183.0, 105.0, 267.0]
2025-08-07 08:48:31,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 12 seconds)
2025-08-07 08:50:10,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:50:13,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 665.91803 ± 374.757
2025-08-07 08:50:13,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [675.8036, 418.5519, 839.8014, 843.4418, 288.96454, 748.29395, 248.4133, 205.36789, 890.58655, 1499.9551]
2025-08-07 08:50:13,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [267.0, 174.0, 330.0, 332.0, 135.0, 291.0, 122.0, 111.0, 352.0, 477.0]
2025-08-07 08:50:13,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 37 seconds)
2025-08-07 08:51:52,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:55,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 579.09869 ± 418.936
2025-08-07 08:51:55,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [431.65155, 798.9132, 199.96846, 210.78487, 158.31502, 728.5299, 402.24124, 1615.2295, 843.5708, 401.78232]
2025-08-07 08:51:55,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [176.0, 298.0, 109.0, 112.0, 92.0, 284.0, 178.0, 616.0, 334.0, 178.0]
2025-08-07 08:51:55,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 44 seconds)
2025-08-07 08:53:37,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:42,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 910.50763 ± 484.452
2025-08-07 08:53:42,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1123.9426, 687.03564, 738.01263, 747.5785, 703.652, 620.2756, 340.849, 543.27106, 1913.2344, 1687.2245]
2025-08-07 08:53:42,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [432.0, 269.0, 283.0, 308.0, 271.0, 253.0, 152.0, 224.0, 710.0, 613.0]
2025-08-07 08:53:42,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 38 seconds)
2025-08-07 08:55:16,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:19,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 639.06848 ± 336.493
2025-08-07 08:55:19,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [420.49026, 419.27777, 1020.59, 469.1781, 620.17505, 1330.547, 141.78618, 383.04272, 685.1809, 900.4172]
2025-08-07 08:55:19,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [179.0, 176.0, 383.0, 193.0, 253.0, 487.0, 88.0, 161.0, 243.0, 332.0]
2025-08-07 08:55:19,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 41 seconds)
2025-08-07 08:56:55,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:56:59,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 820.63464 ± 631.522
2025-08-07 08:56:59,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [471.2956, 729.5678, 358.32187, 1020.24365, 1376.659, 2278.6138, 16.880016, 1138.3391, 205.38162, 611.0443]
2025-08-07 08:56:59,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [193.0, 274.0, 162.0, 381.0, 488.0, 777.0, 23.0, 410.0, 107.0, 237.0]
2025-08-07 08:56:59,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 45 seconds)
2025-08-07 08:58:38,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:43,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 930.74249 ± 348.103
2025-08-07 08:58:43,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [524.0446, 944.4997, 671.54083, 1657.2854, 745.99097, 547.06757, 1148.9468, 1247.0312, 1156.7869, 664.23035]
2025-08-07 08:58:43,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [204.0, 364.0, 264.0, 530.0, 258.0, 223.0, 427.0, 438.0, 384.0, 253.0]
2025-08-07 08:58:43,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 11 seconds)
2025-08-07 09:00:20,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:23,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 737.79578 ± 400.993
2025-08-07 09:00:23,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [417.9481, 968.80585, 384.02902, 1623.3383, 167.824, 663.87524, 924.8295, 851.58093, 405.28168, 970.4451]
2025-08-07 09:00:23,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [180.0, 364.0, 174.0, 587.0, 101.0, 256.0, 346.0, 325.0, 179.0, 357.0]
2025-08-07 09:00:23,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 24 seconds)
2025-08-07 09:02:10,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:13,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 581.36865 ± 507.606
2025-08-07 09:02:13,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [180.9953, 1388.927, 434.83392, 171.9136, 241.62494, 612.5112, 889.9468, 196.19653, 116.07375, 1580.6635]
2025-08-07 09:02:13,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [102.0, 488.0, 187.0, 96.0, 122.0, 244.0, 343.0, 108.0, 76.0, 548.0]
2025-08-07 09:02:13,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 50 seconds)
2025-08-07 09:03:44,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:49,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1049.50281 ± 518.287
2025-08-07 09:03:49,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [406.09085, 495.38046, 1544.7157, 1061.3813, 695.84235, 1498.6565, 1622.9984, 1671.1371, 250.63728, 1248.1871]
2025-08-07 09:03:49,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [175.0, 217.0, 552.0, 383.0, 267.0, 526.0, 564.0, 580.0, 128.0, 456.0]
2025-08-07 09:03:49,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (1049.50) for latency ExtremeClogL1U23
2025-08-07 09:03:49,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 88/100 (estimated time remaining: 22 minutes, 5 seconds)
2025-08-07 09:05:28,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:31,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 765.51117 ± 564.130
2025-08-07 09:05:31,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2243.1934, 388.41736, 112.61759, 645.9471, 1143.9738, 885.2425, 739.83575, 669.4408, 398.31277, 428.1304]
2025-08-07 09:05:31,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [795.0, 178.0, 75.0, 254.0, 419.0, 323.0, 283.0, 255.0, 178.0, 183.0]
2025-08-07 09:05:31,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 29 seconds)
2025-08-07 09:07:09,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:13,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 798.65570 ± 406.004
2025-08-07 09:07:13,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [673.29083, 962.12146, 1137.7766, 174.42348, 386.23013, 1606.1111, 635.1116, 437.90802, 826.2714, 1147.3124]
2025-08-07 09:07:13,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [259.0, 355.0, 427.0, 94.0, 175.0, 564.0, 247.0, 188.0, 314.0, 414.0]
2025-08-07 09:07:13,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 41 seconds)
2025-08-07 09:08:51,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:55,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 841.86053 ± 756.474
2025-08-07 09:08:55,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [127.590935, 711.57855, 661.313, 942.39844, 307.86093, 2838.2476, 1428.515, 188.63094, 610.61304, 601.85693]
2025-08-07 09:08:55,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [81.0, 276.0, 259.0, 350.0, 143.0, 993.0, 526.0, 102.0, 245.0, 238.0]
2025-08-07 09:08:55,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 4 seconds)
2025-08-07 09:10:36,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:39,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 555.58929 ± 350.958
2025-08-07 09:10:39,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [529.97925, 775.0509, 213.40262, 123.91268, 211.29254, 931.8974, 277.922, 388.32483, 948.90564, 1155.2052]
2025-08-07 09:10:39,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [224.0, 299.0, 117.0, 78.0, 115.0, 349.0, 130.0, 167.0, 350.0, 418.0]
2025-08-07 09:10:39,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 10 seconds)
2025-08-07 09:12:17,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:20,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 545.28705 ± 407.000
2025-08-07 09:12:20,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [121.9432, 922.13684, 87.77005, 789.2171, 1486.8135, 471.56613, 503.5076, 152.09933, 418.5312, 499.2855]
2025-08-07 09:12:20,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [78.0, 334.0, 58.0, 302.0, 512.0, 194.0, 205.0, 87.0, 186.0, 199.0]
2025-08-07 09:12:20,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 37 seconds)
2025-08-07 09:13:56,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:01,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1042.52759 ± 1001.286
2025-08-07 09:14:01,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2923.1704, 190.0625, 165.21156, 2147.7449, 419.92648, 204.01431, 107.14732, 2352.5237, 1151.2369, 764.23706]
2025-08-07 09:14:01,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [1000.0, 108.0, 91.0, 694.0, 177.0, 107.0, 70.0, 838.0, 368.0, 269.0]
2025-08-07 09:14:01,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 53 seconds)
2025-08-07 09:15:44,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:47,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 757.47058 ± 462.511
2025-08-07 09:15:47,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [104.93087, 559.51294, 1473.352, 1094.0267, 495.86142, 656.33026, 1288.7031, 168.45573, 1264.7009, 468.83215]
2025-08-07 09:15:47,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [71.0, 224.0, 490.0, 410.0, 208.0, 256.0, 456.0, 97.0, 449.0, 197.0]
2025-08-07 09:15:48,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 17 seconds)
2025-08-07 09:17:22,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:28,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1478.38501 ± 999.318
2025-08-07 09:17:28,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [2887.0754, 2163.174, 1199.5365, 2946.0957, 506.83994, 336.7888, 2153.34, 112.91348, 1795.5083, 682.5782]
2025-08-07 09:17:28,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [936.0, 717.0, 394.0, 1000.0, 203.0, 151.0, 724.0, 80.0, 608.0, 259.0]
2025-08-07 09:17:28,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1226 [INFO]: New best (1478.39) for latency ExtremeClogL1U23
2025-08-07 09:17:28,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 32 seconds)
2025-08-07 09:19:08,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:14,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1189.41040 ± 833.113
2025-08-07 09:19:14,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [428.287, 410.2605, 648.9616, 2870.6501, 1220.3578, 2620.6646, 666.9693, 1248.4386, 634.52704, 1144.9872]
2025-08-07 09:19:14,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [191.0, 177.0, 256.0, 965.0, 430.0, 882.0, 257.0, 447.0, 243.0, 414.0]
2025-08-07 09:19:14,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 51 seconds)
2025-08-07 09:20:51,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:55,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 921.90637 ± 858.084
2025-08-07 09:20:55,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1290.9193, 938.2092, 168.18964, 372.51984, 391.27576, 1874.6792, 214.54956, 662.19, 2981.7627, 324.76868]
2025-08-07 09:20:55,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [447.0, 350.0, 94.0, 166.0, 169.0, 647.0, 112.0, 256.0, 1000.0, 146.0]
2025-08-07 09:20:55,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 9 seconds)
2025-08-07 09:22:38,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:22:44,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1297.05225 ± 971.943
2025-08-07 09:22:44,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [1167.466, 737.7622, 155.83817, 198.66777, 681.8906, 472.69894, 1770.9739, 2657.1006, 2928.324, 2199.8]
2025-08-07 09:22:44,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [420.0, 279.0, 89.0, 102.0, 266.0, 195.0, 619.0, 936.0, 1000.0, 760.0]
2025-08-07 09:22:44,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 29 seconds)
2025-08-07 09:24:23,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:24:28,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 1165.51709 ± 697.235
2025-08-07 09:24:28,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [960.0838, 1886.7465, 241.35622, 195.31972, 1945.6901, 241.45695, 1222.1309, 1221.2085, 2012.937, 1728.2413]
2025-08-07 09:24:28,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [359.0, 655.0, 120.0, 107.0, 678.0, 120.0, 447.0, 444.0, 714.0, 610.0]
2025-08-07 09:24:28,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 44 seconds)
2025-08-07 09:26:01,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:26:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1221 [DEBUG]: Total Reward: 734.50079 ± 419.903
2025-08-07 09:26:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1222 [DEBUG]: All rewards: [223.333, 438.48587, 956.6578, 1226.4734, 154.61987, 1039.2568, 665.1767, 253.46281, 1327.3275, 1060.2144]
2025-08-07 09:26:05,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1223 [DEBUG]: All trajectory lengths: [120.0, 185.0, 365.0, 439.0, 90.0, 333.0, 258.0, 126.0, 471.0, 333.0]
2025-08-07 09:26:05,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-hopper):1251 [DEBUG]: Training session finished
