2026-01-22 23:14:27,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mem2
2026-01-22 23:14:27,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mem2
2026-01-22 23:14:27,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x151b9f317250>}
2026-01-22 23:14:27,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:27,472 baseline-bpql-noisy-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-22 23:14:27,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-22 23:14:27,488 baseline-bpql-noisy-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=17, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-22 23:14:27,488 baseline-bpql-noisy-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:28,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:28,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:55,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:56,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 154.63474 ± 17.401
2026-01-22 23:15:56,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [129.24968, 138.17342, 175.93773, 130.97179, 174.11194, 160.94798, 157.13762, 174.18994, 164.73386, 140.89326]
2026-01-22 23:15:56,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [66.0, 69.0, 83.0, 67.0, 82.0, 77.0, 76.0, 82.0, 78.0, 70.0]
2026-01-22 23:15:56,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (154.63) for latency DatasetOffice
2026-01-22 23:15:56,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 25 minutes, 9 seconds)
2026-01-22 23:17:31,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:32,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 223.84856 ± 2.780
2026-01-22 23:17:32,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [220.78845, 229.24998, 225.57713, 220.88297, 227.90488, 221.79709, 222.60602, 224.73338, 222.16364, 222.78189]
2026-01-22 23:17:32,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [99.0, 102.0, 103.0, 99.0, 102.0, 101.0, 101.0, 100.0, 98.0, 99.0]
2026-01-22 23:17:32,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (223.85) for latency DatasetOffice
2026-01-22 23:17:32,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 30 minutes, 5 seconds)
2026-01-22 23:19:08,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:09,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 299.55328 ± 4.957
2026-01-22 23:19:09,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [290.8629, 301.70142, 299.33255, 302.75397, 302.42966, 297.65366, 292.74384, 305.58594, 295.99124, 306.4777]
2026-01-22 23:19:09,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [160.0, 157.0, 160.0, 157.0, 157.0, 159.0, 155.0, 161.0, 161.0, 156.0]
2026-01-22 23:19:09,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (299.55) for latency DatasetOffice
2026-01-22 23:19:09,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 31 minutes, 34 seconds)
2026-01-22 23:20:43,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:20:44,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 374.47455 ± 49.021
2026-01-22 23:20:44,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [326.94934, 390.4914, 317.2919, 452.5752, 392.28677, 460.85132, 347.00116, 344.48672, 324.31058, 388.50104]
2026-01-22 23:20:44,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [155.0, 150.0, 153.0, 175.0, 154.0, 173.0, 163.0, 172.0, 143.0, 156.0]
2026-01-22 23:20:44,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (374.47) for latency DatasetOffice
2026-01-22 23:20:44,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 30 minutes, 32 seconds)
2026-01-22 23:22:19,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 285.17239 ± 1.638
2026-01-22 23:22:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [287.03964, 284.4241, 283.4351, 285.56662, 286.08078, 283.2121, 283.41397, 285.4672, 288.54477, 284.5395]
2026-01-22 23:22:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [129.0, 126.0, 124.0, 129.0, 125.0, 125.0, 126.0, 128.0, 131.0, 127.0]
2026-01-22 23:22:20,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 29 minutes, 39 seconds)
2026-01-22 23:23:56,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:58,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 353.61557 ± 107.783
2026-01-22 23:23:58,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [45.568428, 431.7068, 400.38007, 422.84018, 395.5111, 347.8763, 343.61136, 378.47617, 341.99655, 428.189]
2026-01-22 23:23:58,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [29.0, 178.0, 156.0, 166.0, 148.0, 139.0, 134.0, 154.0, 134.0, 162.0]
2026-01-22 23:23:58,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 31 minutes, 3 seconds)
2026-01-22 23:25:35,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:36,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 582.66565 ± 42.502
2026-01-22 23:25:36,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [551.15857, 595.1134, 614.6285, 590.56683, 690.7332, 543.49585, 540.83466, 560.5144, 564.8833, 574.7276]
2026-01-22 23:25:36,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [193.0, 201.0, 218.0, 203.0, 243.0, 188.0, 189.0, 197.0, 195.0, 198.0]
2026-01-22 23:25:36,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (582.67) for latency DatasetOffice
2026-01-22 23:25:36,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 30 minutes, 15 seconds)
2026-01-22 23:27:12,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:27:14,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 531.55554 ± 28.931
2026-01-22 23:27:14,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [561.4226, 549.71045, 506.6203, 550.9707, 523.5332, 523.43835, 530.7618, 585.0198, 496.37375, 487.7053]
2026-01-22 23:27:14,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [195.0, 200.0, 195.0, 191.0, 198.0, 191.0, 207.0, 228.0, 195.0, 186.0]
2026-01-22 23:27:14,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 28 minutes, 45 seconds)
2026-01-22 23:28:55,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 878.07117 ± 341.832
2026-01-22 23:28:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [546.24457, 907.375, 1034.9967, 1810.6942, 670.70593, 761.73645, 921.94745, 821.25275, 653.6736, 652.0854]
2026-01-22 23:28:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [185.0, 301.0, 340.0, 688.0, 235.0, 261.0, 311.0, 276.0, 219.0, 216.0]
2026-01-22 23:28:57,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (878.07) for latency DatasetOffice
2026-01-22 23:28:57,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 29 minutes, 33 seconds)
2026-01-22 23:30:33,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:35,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 682.80286 ± 11.031
2026-01-22 23:30:35,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [697.7964, 694.3129, 675.6944, 688.7479, 691.74927, 681.5723, 678.90704, 674.99786, 658.19666, 686.0533]
2026-01-22 23:30:35,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [227.0, 221.0, 217.0, 218.0, 219.0, 217.0, 217.0, 216.0, 210.0, 217.0]
2026-01-22 23:30:35,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 28 minutes, 21 seconds)
2026-01-22 23:32:10,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:10,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 225.84468 ± 195.250
2026-01-22 23:32:10,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [454.1509, 485.22412, 33.06913, 39.634632, 32.033638, 554.7552, 282.19006, 58.33299, 113.6407, 205.41545]
2026-01-22 23:32:10,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [154.0, 161.0, 27.0, 43.0, 36.0, 178.0, 109.0, 36.0, 80.0, 103.0]
2026-01-22 23:32:10,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 26 minutes, 8 seconds)
2026-01-22 23:33:50,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:33:52,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1062.28687 ± 353.844
2026-01-22 23:33:52,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1600.6101, 603.49133, 978.91235, 727.50806, 660.1993, 923.0509, 1520.1141, 1230.8098, 887.8235, 1490.3511]
2026-01-22 23:33:52,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [508.0, 196.0, 317.0, 236.0, 211.0, 292.0, 482.0, 388.0, 280.0, 469.0]
2026-01-22 23:33:52,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (1062.29) for latency DatasetOffice
2026-01-22 23:33:52,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 25 minutes, 33 seconds)
2026-01-22 23:35:30,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:35:32,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 700.39392 ± 47.230
2026-01-22 23:35:32,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [801.93005, 662.556, 642.0689, 737.29474, 656.1557, 665.7975, 717.93787, 668.9765, 731.97125, 719.2515]
2026-01-22 23:35:32,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [261.0, 227.0, 223.0, 245.0, 220.0, 225.0, 237.0, 224.0, 240.0, 234.0]
2026-01-22 23:35:32,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 24 minutes, 14 seconds)
2026-01-22 23:37:09,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:37:11,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 662.60834 ± 11.365
2026-01-22 23:37:11,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [652.6872, 677.6772, 684.5739, 652.56085, 662.3202, 661.2107, 660.58307, 666.9833, 643.7399, 663.7474]
2026-01-22 23:37:11,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [211.0, 218.0, 224.0, 218.0, 220.0, 218.0, 219.0, 218.0, 213.0, 219.0]
2026-01-22 23:37:11,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 21 minutes, 38 seconds)
2026-01-22 23:38:48,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:50,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 778.84290 ± 183.247
2026-01-22 23:38:50,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [717.7862, 1313.4541, 687.2345, 683.1811, 726.7246, 833.32324, 681.21436, 691.1517, 711.85406, 742.50525]
2026-01-22 23:38:50,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [242.0, 418.0, 231.0, 222.0, 244.0, 269.0, 221.0, 224.0, 246.0, 247.0]
2026-01-22 23:38:50,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 20 minutes, 20 seconds)
2026-01-22 23:40:28,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:31,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1013.52063 ± 411.538
2026-01-22 23:40:31,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1126.131, 1194.6831, 262.17236, 806.43365, 1886.3356, 671.9911, 1154.7028, 1154.7086, 703.1061, 1174.9412]
2026-01-22 23:40:31,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [383.0, 406.0, 120.0, 291.0, 624.0, 255.0, 408.0, 391.0, 263.0, 412.0]
2026-01-22 23:40:31,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 20 minutes, 14 seconds)
2026-01-22 23:42:06,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1574.67529 ± 627.443
2026-01-22 23:42:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [177.18538, 984.88715, 2623.6958, 1955.645, 1971.3562, 1649.5784, 1361.3883, 1953.7894, 1666.9855, 1402.242]
2026-01-22 23:42:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [87.0, 318.0, 811.0, 627.0, 657.0, 518.0, 438.0, 622.0, 520.0, 443.0]
2026-01-22 23:42:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (1574.68) for latency DatasetOffice
2026-01-22 23:42:11,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 17 minutes, 50 seconds)
2026-01-22 23:43:50,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:53,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 925.86365 ± 515.240
2026-01-22 23:43:53,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [912.1663, 849.5512, 1916.2552, 794.3384, 413.93524, 580.47174, 1863.8396, 404.76492, 582.9119, 940.401]
2026-01-22 23:43:53,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [289.0, 261.0, 610.0, 266.0, 151.0, 183.0, 598.0, 157.0, 228.0, 298.0]
2026-01-22 23:43:53,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 17 minutes, 3 seconds)
2026-01-22 23:45:29,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:45:33,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1225.15063 ± 472.740
2026-01-22 23:45:33,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1283.9521, 1308.4673, 533.41815, 1051.3245, 1278.3053, 472.26218, 1393.2169, 1959.1199, 1014.9211, 1956.5198]
2026-01-22 23:45:33,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [414.0, 422.0, 213.0, 393.0, 411.0, 180.0, 441.0, 610.0, 384.0, 617.0]
2026-01-22 23:45:33,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 15 minutes, 26 seconds)
2026-01-22 23:47:09,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:47:12,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1043.84973 ± 178.335
2026-01-22 23:47:12,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [876.16, 1259.2728, 882.7208, 846.1782, 926.49286, 941.61975, 1169.896, 1377.1028, 968.27374, 1190.78]
2026-01-22 23:47:12,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [286.0, 407.0, 288.0, 276.0, 299.0, 306.0, 378.0, 447.0, 315.0, 386.0]
2026-01-22 23:47:12,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 13 minutes, 47 seconds)
2026-01-22 23:48:50,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:52,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 834.70007 ± 415.110
2026-01-22 23:48:52,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1129.0867, 745.809, 981.38165, 992.1028, 25.69761, 1249.2406, 1144.2501, 1071.1299, 943.3163, 64.985855]
2026-01-22 23:48:52,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [379.0, 241.0, 317.0, 317.0, 22.0, 405.0, 365.0, 340.0, 302.0, 42.0]
2026-01-22 23:48:52,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2026-01-22 23:50:28,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:50:33,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1370.22729 ± 280.963
2026-01-22 23:50:33,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1286.4915, 1288.5554, 1574.8019, 1525.9261, 1278.6656, 1578.7173, 1981.527, 1057.0675, 1067.2228, 1063.2993]
2026-01-22 23:50:33,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [416.0, 404.0, 500.0, 499.0, 422.0, 514.0, 671.0, 345.0, 350.0, 362.0]
2026-01-22 23:50:33,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 10 minutes, 33 seconds)
2026-01-22 23:52:11,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:52:15,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 968.33643 ± 486.017
2026-01-22 23:52:15,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [10.084096, 1601.7407, 1155.2418, 714.7424, 1146.0056, 727.7329, 585.56537, 1691.6844, 1323.3988, 727.16797]
2026-01-22 23:52:15,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [12.0, 528.0, 392.0, 264.0, 382.0, 264.0, 225.0, 543.0, 429.0, 272.0]
2026-01-22 23:52:15,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 8 minutes, 46 seconds)
2026-01-22 23:53:49,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:52,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1310.72546 ± 365.804
2026-01-22 23:53:52,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1402.5707, 732.4927, 1854.5563, 1615.7279, 1396.123, 1416.0175, 961.1253, 709.07916, 1620.9824, 1398.5791]
2026-01-22 23:53:52,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [458.0, 272.0, 589.0, 527.0, 454.0, 467.0, 329.0, 261.0, 541.0, 466.0]
2026-01-22 23:53:52,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 6 minutes, 30 seconds)
2026-01-22 23:55:25,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:28,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1114.91418 ± 241.185
2026-01-22 23:55:28,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [914.3777, 1410.6187, 1243.7963, 779.1515, 1256.2657, 1110.0511, 898.89374, 869.3445, 1559.5073, 1107.1339]
2026-01-22 23:55:28,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [300.0, 449.0, 412.0, 287.0, 423.0, 367.0, 296.0, 280.0, 495.0, 362.0]
2026-01-22 23:55:28,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 4 minutes, 3 seconds)
2026-01-22 23:57:04,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:06,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 451.07407 ± 660.610
2026-01-22 23:57:06,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1903.989, 44.692562, 41.204445, 1370.6954, 45.085495, 29.661703, 44.393387, 939.1144, 48.827293, 43.076912]
2026-01-22 23:57:06,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [581.0, 32.0, 30.0, 422.0, 32.0, 24.0, 42.0, 318.0, 34.0, 31.0]
2026-01-22 23:57:06,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 1 minute, 44 seconds)
2026-01-22 23:58:40,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:43,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 983.13440 ± 497.261
2026-01-22 23:58:43,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1427.95, 79.74773, 52.156765, 1182.2031, 1355.8998, 934.9883, 896.417, 1423.4185, 1041.1582, 1437.4044]
2026-01-22 23:58:43,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [441.0, 50.0, 63.0, 378.0, 418.0, 296.0, 281.0, 437.0, 327.0, 447.0]
2026-01-22 23:58:43,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 59 minutes, 13 seconds)
2026-01-23 00:00:20,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:24,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1322.77344 ± 296.233
2026-01-23 00:00:24,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [964.27216, 1154.118, 1726.356, 1226.372, 1723.0396, 1212.8208, 1226.1351, 937.30304, 1270.4042, 1786.9131]
2026-01-23 00:00:24,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [309.0, 355.0, 536.0, 389.0, 540.0, 385.0, 390.0, 298.0, 402.0, 553.0]
2026-01-23 00:00:24,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 57 minutes, 31 seconds)
2026-01-23 00:01:58,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 929.90900 ± 513.358
2026-01-23 00:02:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [943.3856, 1269.2399, 933.8182, 986.45905, 1473.6277, 1691.8026, 35.90828, 16.919043, 1018.89703, 929.0323]
2026-01-23 00:02:01,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [297.0, 395.0, 295.0, 314.0, 462.0, 518.0, 30.0, 19.0, 322.0, 294.0]
2026-01-23 00:02:01,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 55 minutes, 33 seconds)
2026-01-23 00:03:37,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:41,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1407.37231 ± 582.235
2026-01-23 00:03:41,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [25.140076, 1950.8401, 1336.5851, 1331.4384, 1244.1583, 1307.122, 1327.1486, 1888.9169, 2350.337, 1312.0367]
2026-01-23 00:03:41,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [24.0, 609.0, 409.0, 410.0, 386.0, 410.0, 415.0, 589.0, 725.0, 408.0]
2026-01-23 00:03:41,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 54 minutes, 57 seconds)
2026-01-23 00:05:15,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:17,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1025.30347 ± 332.040
2026-01-23 00:05:17,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2018.5276, 937.8695, 904.7608, 916.88885, 865.08514, 930.70374, 921.37506, 924.26385, 954.8432, 878.7173]
2026-01-23 00:05:17,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [626.0, 297.0, 278.0, 294.0, 259.0, 289.0, 280.0, 280.0, 302.0, 263.0]
2026-01-23 00:05:17,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 5 seconds)
2026-01-23 00:06:54,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:58,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1474.26672 ± 718.380
2026-01-23 00:06:58,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1485.0503, 1151.373, 1042.3589, 1173.557, 1209.9178, 1330.6437, 713.6467, 3205.2856, 1009.1676, 2421.6648]
2026-01-23 00:06:58,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [452.0, 346.0, 315.0, 354.0, 370.0, 401.0, 244.0, 972.0, 309.0, 739.0]
2026-01-23 00:06:58,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 52 minutes, 8 seconds)
2026-01-23 00:08:36,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:37,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 435.48999 ± 435.949
2026-01-23 00:08:37,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [962.1759, 936.0195, 936.53973, 1018.49316, 82.77019, 46.639824, 258.90872, 40.386784, 23.64708, 49.319206]
2026-01-23 00:08:37,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [307.0, 278.0, 289.0, 315.0, 51.0, 44.0, 128.0, 51.0, 23.0, 50.0]
2026-01-23 00:08:37,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 50 minutes, 6 seconds)
2026-01-23 00:10:07,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:12,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1499.99939 ± 211.127
2026-01-23 00:10:12,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1178.0553, 1680.8973, 1318.2153, 1700.0963, 1436.6581, 1771.3137, 1685.5162, 1472.5974, 1594.9884, 1161.6561]
2026-01-23 00:10:12,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [370.0, 508.0, 405.0, 512.0, 478.0, 545.0, 510.0, 454.0, 487.0, 401.0]
2026-01-23 00:10:12,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 48 minutes)
2026-01-23 00:11:48,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:51,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 855.91602 ± 385.429
2026-01-23 00:11:51,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [941.98035, 932.0902, 950.64343, 896.70245, 10.938427, 1061.399, 1328.9124, 959.7605, 267.54303, 1209.1909]
2026-01-23 00:11:51,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [286.0, 285.0, 295.0, 273.0, 14.0, 322.0, 403.0, 299.0, 110.0, 395.0]
2026-01-23 00:11:51,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 10 seconds)
2026-01-23 00:13:25,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:27,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 915.96936 ± 288.420
2026-01-23 00:13:27,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1039.7947, 724.3948, 1351.9252, 1101.0228, 800.35065, 1180.3938, 931.77814, 232.71654, 946.3718, 850.9443]
2026-01-23 00:13:27,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [317.0, 246.0, 407.0, 334.0, 240.0, 363.0, 281.0, 101.0, 291.0, 257.0]
2026-01-23 00:13:28,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 44 minutes, 35 seconds)
2026-01-23 00:15:05,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:09,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1074.14771 ± 990.873
2026-01-23 00:15:09,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1442.0787, 1747.2084, 1177.5154, 3201.8953, 1495.2053, 1530.8129, 44.549824, 23.204163, 22.94346, 56.06419]
2026-01-23 00:15:09,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [436.0, 568.0, 365.0, 1000.0, 464.0, 492.0, 26.0, 29.0, 21.0, 52.0]
2026-01-23 00:15:09,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 43 minutes, 10 seconds)
2026-01-23 00:16:45,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:48,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1204.15601 ± 174.603
2026-01-23 00:16:48,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1536.205, 972.55975, 1111.6263, 1188.7559, 1148.6171, 1492.0315, 1229.6555, 1111.4983, 1014.53204, 1236.0785]
2026-01-23 00:16:48,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [471.0, 301.0, 335.0, 371.0, 343.0, 470.0, 380.0, 332.0, 308.0, 384.0]
2026-01-23 00:16:48,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 25 seconds)
2026-01-23 00:18:23,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:28,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1909.83240 ± 593.561
2026-01-23 00:18:28,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1601.2242, 2668.8003, 1950.9138, 1609.9805, 3300.9788, 1301.0635, 1494.535, 2053.4353, 1472.9553, 1644.4375]
2026-01-23 00:18:28,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [486.0, 814.0, 588.0, 489.0, 994.0, 422.0, 453.0, 632.0, 485.0, 499.0]
2026-01-23 00:18:28,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (1909.83) for latency DatasetOffice
2026-01-23 00:18:28,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 40 minutes, 51 seconds)
2026-01-23 00:20:03,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:06,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1310.75098 ± 779.866
2026-01-23 00:20:06,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [348.31604, 1322.0342, 1135.1832, 2102.3394, 1246.9926, 904.4976, 1736.7732, 3058.5122, 346.95657, 905.90424]
2026-01-23 00:20:06,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [141.0, 409.0, 388.0, 639.0, 387.0, 307.0, 527.0, 1000.0, 141.0, 306.0]
2026-01-23 00:20:06,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 39 minutes, 4 seconds)
2026-01-23 00:21:39,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:43,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1144.56982 ± 877.886
2026-01-23 00:21:43,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [229.13795, 55.488445, 98.2339, 3062.2485, 2025.9557, 1044.5919, 1116.5746, 1332.403, 1054.6732, 1426.391]
2026-01-23 00:21:43,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [105.0, 47.0, 77.0, 1000.0, 614.0, 318.0, 339.0, 422.0, 320.0, 431.0]
2026-01-23 00:21:43,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 37 minutes, 22 seconds)
2026-01-23 00:23:21,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:26,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1761.21814 ± 791.274
2026-01-23 00:23:26,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [969.97876, 1249.2389, 1258.6116, 2067.4294, 1519.0656, 2102.945, 1183.2798, 3142.5513, 3162.0566, 957.0254]
2026-01-23 00:23:26,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [295.0, 381.0, 387.0, 660.0, 459.0, 670.0, 352.0, 1000.0, 1000.0, 301.0]
2026-01-23 00:23:26,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 36 minutes, 6 seconds)
2026-01-23 00:25:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:04,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 950.17108 ± 558.300
2026-01-23 00:25:04,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [214.91055, 37.011875, 904.33264, 1126.8911, 914.4544, 2159.3591, 1220.1616, 1203.1777, 1076.1805, 645.23145]
2026-01-23 00:25:04,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [93.0, 36.0, 311.0, 335.0, 315.0, 655.0, 370.0, 360.0, 324.0, 243.0]
2026-01-23 00:25:04,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 34 minutes, 8 seconds)
2026-01-23 00:26:38,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:42,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1392.62329 ± 982.576
2026-01-23 00:26:42,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [943.7574, 2621.6501, 3142.2937, 2470.3206, 953.8862, 1107.4739, 1247.0933, 1263.0159, 34.25353, 142.48804]
2026-01-23 00:26:42,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [291.0, 814.0, 1000.0, 760.0, 297.0, 334.0, 378.0, 413.0, 34.0, 110.0]
2026-01-23 00:26:42,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2026-01-23 00:28:18,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:23,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1849.93262 ± 681.057
2026-01-23 00:28:23,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1160.3329, 2316.1677, 839.2278, 2491.2722, 1849.5616, 3199.5828, 2214.7563, 1238.4147, 1430.6351, 1759.3746]
2026-01-23 00:28:23,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [347.0, 702.0, 253.0, 774.0, 577.0, 1000.0, 671.0, 404.0, 432.0, 571.0]
2026-01-23 00:28:23,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 31 minutes, 2 seconds)
2026-01-23 00:29:57,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:00,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1084.87756 ± 282.390
2026-01-23 00:30:00,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [967.03436, 968.9956, 954.78394, 957.4061, 845.6541, 1535.1726, 1740.48, 969.81287, 958.04626, 951.38983]
2026-01-23 00:30:00,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [293.0, 303.0, 288.0, 286.0, 252.0, 464.0, 524.0, 304.0, 295.0, 293.0]
2026-01-23 00:30:00,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 29 minutes, 34 seconds)
2026-01-23 00:31:33,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:38,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1528.45496 ± 926.334
2026-01-23 00:31:38,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [54.263363, 55.648808, 1646.0687, 3045.1387, 1490.2775, 1755.8787, 2124.8132, 2049.6538, 2320.0396, 742.76843]
2026-01-23 00:31:38,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [39.0, 38.0, 500.0, 1000.0, 453.0, 534.0, 657.0, 616.0, 728.0, 244.0]
2026-01-23 00:31:38,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 50 seconds)
2026-01-23 00:33:12,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:16,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1272.17407 ± 398.901
2026-01-23 00:33:16,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1335.6145, 1515.0557, 1600.6058, 2175.152, 776.681, 1098.3286, 982.88385, 1205.8881, 1249.9214, 781.6089]
2026-01-23 00:33:16,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [422.0, 460.0, 489.0, 662.0, 253.0, 327.0, 305.0, 365.0, 393.0, 255.0]
2026-01-23 00:33:16,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 25 minutes, 19 seconds)
2026-01-23 00:34:47,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:51,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1408.92542 ± 390.670
2026-01-23 00:34:51,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1205.5111, 1081.5964, 2084.2974, 1642.1676, 1609.3048, 938.04205, 1078.941, 1250.0486, 1153.3551, 2045.9891]
2026-01-23 00:34:51,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [364.0, 326.0, 630.0, 496.0, 491.0, 282.0, 326.0, 387.0, 349.0, 625.0]
2026-01-23 00:34:51,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 23 minutes, 7 seconds)
2026-01-23 00:36:29,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:32,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1095.74634 ± 476.529
2026-01-23 00:36:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1125.9811, 962.9196, 31.658445, 1224.811, 1219.5914, 789.6874, 1119.8195, 1187.1343, 2081.2556, 1214.6057]
2026-01-23 00:36:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [335.0, 292.0, 25.0, 382.0, 363.0, 257.0, 333.0, 354.0, 632.0, 365.0]
2026-01-23 00:36:32,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 21 minutes, 29 seconds)
2026-01-23 00:38:07,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:09,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1071.08118 ± 169.260
2026-01-23 00:38:09,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [988.3598, 1131.7666, 1553.5278, 1024.7433, 974.11664, 1039.1842, 981.61176, 975.7823, 955.61865, 1086.1003]
2026-01-23 00:38:09,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [302.0, 335.0, 471.0, 312.0, 302.0, 316.0, 305.0, 293.0, 291.0, 325.0]
2026-01-23 00:38:09,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 52 seconds)
2026-01-23 00:39:42,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:44,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1064.50342 ± 154.151
2026-01-23 00:39:44,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1071.9911, 938.3474, 955.01526, 962.16125, 1148.6265, 1482.3822, 1017.63434, 958.9013, 1099.5095, 1010.4652]
2026-01-23 00:39:44,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [324.0, 294.0, 295.0, 298.0, 343.0, 446.0, 310.0, 297.0, 329.0, 311.0]
2026-01-23 00:39:44,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 17 minutes, 52 seconds)
2026-01-23 00:41:22,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:30,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2245.32300 ± 768.775
2026-01-23 00:41:30,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2579.4836, 2578.5908, 3006.705, 1926.9708, 3117.7395, 3157.8396, 561.6032, 1612.9072, 2026.1742, 1885.2172]
2026-01-23 00:41:30,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [810.0, 795.0, 1000.0, 598.0, 1000.0, 990.0, 210.0, 491.0, 632.0, 569.0]
2026-01-23 00:41:30,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2245.32) for latency DatasetOffice
2026-01-23 00:41:30,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 23 seconds)
2026-01-23 00:43:02,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1617.06824 ± 1108.039
2026-01-23 00:43:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1509.7302, 157.52379, 55.50384, 1961.2112, 2900.4353, 2891.095, 2127.2666, 1275.9229, 3047.8274, 244.16731]
2026-01-23 00:43:07,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [452.0, 76.0, 36.0, 586.0, 891.0, 868.0, 649.0, 390.0, 1000.0, 102.0]
2026-01-23 00:43:07,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 3 seconds)
2026-01-23 00:44:46,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:50,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1515.93884 ± 700.230
2026-01-23 00:44:50,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1023.1145, 1226.1277, 3189.7437, 1040.1609, 1127.0422, 1022.1234, 1310.7742, 943.3459, 2074.4229, 2202.5347]
2026-01-23 00:44:50,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [311.0, 378.0, 1000.0, 314.0, 335.0, 314.0, 397.0, 299.0, 619.0, 667.0]
2026-01-23 00:44:50,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 46 seconds)
2026-01-23 00:46:22,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:25,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1269.43750 ± 342.406
2026-01-23 00:46:25,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1518.3713, 1512.6246, 1691.5166, 1235.2428, 653.6838, 1031.8496, 1810.3151, 1157.9442, 1158.5011, 924.3251]
2026-01-23 00:46:25,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [461.0, 460.0, 507.0, 383.0, 232.0, 316.0, 544.0, 344.0, 344.0, 314.0]
2026-01-23 00:46:25,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 44 seconds)
2026-01-23 00:48:00,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:04,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1605.37622 ± 893.868
2026-01-23 00:48:04,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1066.4818, 634.8951, 3035.5195, 3082.515, 959.5061, 768.4812, 1450.6556, 1237.9479, 1177.5094, 2640.2498]
2026-01-23 00:48:04,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [322.0, 226.0, 1000.0, 1000.0, 299.0, 233.0, 465.0, 374.0, 349.0, 793.0]
2026-01-23 00:48:04,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 40 seconds)
2026-01-23 00:49:42,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:44,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 842.97089 ± 629.318
2026-01-23 00:49:44,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1664.9542, 1529.0856, 1522.2913, 1249.7697, 981.9968, 969.41003, 411.04565, 51.743095, 22.32623, 27.08716]
2026-01-23 00:49:44,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [495.0, 470.0, 467.0, 384.0, 296.0, 288.0, 160.0, 31.0, 20.0, 28.0]
2026-01-23 00:49:44,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 9 minutes, 13 seconds)
2026-01-23 00:51:23,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:28,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1456.93933 ± 716.811
2026-01-23 00:51:28,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [951.6474, 1201.3994, 1455.9266, 2632.8281, 2334.513, 1526.1428, 1240.8179, 1103.7609, 2104.0635, 18.294207]
2026-01-23 00:51:28,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [293.0, 358.0, 436.0, 801.0, 710.0, 470.0, 371.0, 329.0, 650.0, 18.0]
2026-01-23 00:51:28,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 8 minutes, 29 seconds)
2026-01-23 00:53:00,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:04,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1344.49841 ± 640.461
2026-01-23 00:53:04,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2344.258, 634.73315, 1555.1005, 968.745, 2590.234, 769.0174, 1205.6588, 1269.5752, 643.848, 1463.8147]
2026-01-23 00:53:04,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [745.0, 228.0, 458.0, 303.0, 824.0, 262.0, 359.0, 378.0, 229.0, 437.0]
2026-01-23 00:53:04,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 44 seconds)
2026-01-23 00:54:38,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:41,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1010.04822 ± 101.592
2026-01-23 00:54:41,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [900.4157, 902.83484, 1153.6156, 1201.6013, 1016.0929, 1007.41235, 940.60535, 1079.5575, 897.77966, 1000.5687]
2026-01-23 00:54:41,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [267.0, 270.0, 353.0, 356.0, 297.0, 302.0, 280.0, 322.0, 266.0, 298.0]
2026-01-23 00:54:41,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 30 seconds)
2026-01-23 00:56:17,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:19,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 572.15369 ± 564.967
2026-01-23 00:56:19,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1608.042, 447.45792, 41.88337, 1463.6594, 427.20844, 49.937603, 1047.1506, 491.84067, 44.321407, 100.035164]
2026-01-23 00:56:19,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [487.0, 167.0, 32.0, 451.0, 161.0, 59.0, 333.0, 182.0, 43.0, 87.0]
2026-01-23 00:56:19,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 39 seconds)
2026-01-23 00:57:54,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:59,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1936.97888 ± 912.938
2026-01-23 00:57:59,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [874.9284, 3103.466, 3199.7803, 905.8384, 889.1542, 3084.5623, 2091.7288, 1178.5026, 2303.57, 1738.2583]
2026-01-23 00:57:59,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [306.0, 1000.0, 1000.0, 314.0, 309.0, 1000.0, 630.0, 396.0, 737.0, 568.0]
2026-01-23 00:57:59,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 4 seconds)
2026-01-23 00:59:42,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:46,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1687.61108 ± 749.158
2026-01-23 00:59:46,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2622.2588, 973.5624, 2362.55, 1831.4032, 1194.7876, 2864.2852, 2174.4272, 938.3592, 596.66455, 1317.8129]
2026-01-23 00:59:46,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [795.0, 299.0, 729.0, 591.0, 375.0, 900.0, 655.0, 294.0, 218.0, 384.0]
2026-01-23 00:59:46,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes, 48 seconds)
2026-01-23 01:01:17,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:21,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1523.71777 ± 1049.185
2026-01-23 01:01:21,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2043.6335, 1519.5428, 3119.444, 1285.8551, 1505.1066, 2330.3818, 2902.301, 484.51688, 18.727146, 27.669352]
2026-01-23 01:01:21,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [615.0, 456.0, 1000.0, 391.0, 450.0, 699.0, 888.0, 180.0, 17.0, 29.0]
2026-01-23 01:01:21,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 3 seconds)
2026-01-23 01:02:59,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:03,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1416.90259 ± 911.544
2026-01-23 01:03:03,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3106.2935, 1549.3254, 1472.5844, 521.3022, 3049.2603, 1445.0479, 593.23236, 1159.457, 594.6734, 677.84863]
2026-01-23 01:03:03,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 480.0, 452.0, 198.0, 992.0, 444.0, 224.0, 399.0, 221.0, 243.0]
2026-01-23 01:03:03,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 52 seconds)
2026-01-23 01:04:40,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:49,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2515.71411 ± 807.257
2026-01-23 01:04:49,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2318.7551, 2495.8054, 1851.9187, 2430.575, 3105.619, 3098.5464, 3164.44, 3125.023, 3101.6975, 464.76056]
2026-01-23 01:04:49,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [707.0, 763.0, 573.0, 751.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 173.0]
2026-01-23 01:04:49,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2515.71) for latency DatasetOffice
2026-01-23 01:04:49,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 56 minutes, 3 seconds)
2026-01-23 01:06:19,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:23,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1296.86194 ± 1298.321
2026-01-23 01:06:23,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3117.8313, 1594.7858, 402.68524, 2971.5354, 3111.2634, 1653.3812, 19.65604, 6.6112475, 33.10197, 57.767456]
2026-01-23 01:06:23,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 530.0, 153.0, 940.0, 1000.0, 543.0, 22.0, 9.0, 28.0, 44.0]
2026-01-23 01:06:23,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 53 minutes, 43 seconds)
2026-01-23 01:08:05,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:07,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 832.22717 ± 305.710
2026-01-23 01:08:07,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1321.2394, 619.4207, 626.5258, 850.3086, 1483.5258, 923.3591, 610.3351, 603.11176, 618.78424, 665.66144]
2026-01-23 01:08:07,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [384.0, 225.0, 228.0, 304.0, 467.0, 316.0, 226.0, 223.0, 226.0, 200.0]
2026-01-23 01:08:07,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 46 seconds)
2026-01-23 01:09:39,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2213.86328 ± 1077.999
2026-01-23 01:09:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3164.554, 1508.641, 3138.6067, 3136.9404, 811.45013, 14.122693, 3051.6926, 3192.402, 2065.055, 2055.1685]
2026-01-23 01:09:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 454.0, 1000.0, 1000.0, 282.0, 14.0, 1000.0, 977.0, 640.0, 640.0]
2026-01-23 01:09:46,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 26 seconds)
2026-01-23 01:11:23,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:28,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1672.06384 ± 1217.771
2026-01-23 01:11:28,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3115.5562, 2642.0623, 1107.8378, 15.634136, 1506.8525, 3071.1328, 3135.017, 1800.9418, 313.99103, 11.612423]
2026-01-23 01:11:28,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 810.0, 331.0, 16.0, 486.0, 1000.0, 1000.0, 554.0, 127.0, 14.0]
2026-01-23 01:11:28,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 48 minutes, 48 seconds)
2026-01-23 01:13:04,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:09,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1976.40686 ± 661.086
2026-01-23 01:13:09,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2316.72, 1885.6907, 2233.6125, 2841.3916, 1465.957, 1756.9952, 913.9506, 1163.5607, 3128.4001, 2057.7913]
2026-01-23 01:13:09,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [713.0, 595.0, 686.0, 901.0, 487.0, 545.0, 315.0, 396.0, 1000.0, 626.0]
2026-01-23 01:13:09,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 45 seconds)
2026-01-23 01:14:42,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:50,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2669.50952 ± 593.074
2026-01-23 01:14:50,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1463.7698, 3109.8574, 2360.386, 3085.0764, 3233.4788, 3070.911, 3080.199, 1818.7426, 3068.615, 2404.0593]
2026-01-23 01:14:50,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [467.0, 1000.0, 723.0, 1000.0, 988.0, 1000.0, 1000.0, 550.0, 1000.0, 735.0]
2026-01-23 01:14:50,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2669.51) for latency DatasetOffice
2026-01-23 01:14:50,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 36 seconds)
2026-01-23 01:16:24,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:31,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2368.80127 ± 862.057
2026-01-23 01:16:31,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3171.2266, 3176.3608, 1023.5565, 2865.774, 2713.636, 3226.688, 1364.239, 1718.4911, 1280.245, 3147.7944]
2026-01-23 01:16:31,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 349.0, 866.0, 820.0, 1000.0, 411.0, 518.0, 389.0, 1000.0]
2026-01-23 01:16:31,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 38 seconds)
2026-01-23 01:18:06,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:15,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2628.88037 ± 1025.859
2026-01-23 01:18:15,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [88.1424, 3059.5503, 3122.6477, 3137.927, 3137.1924, 3121.7393, 1186.986, 3162.4314, 3119.6716, 3152.514]
2026-01-23 01:18:15,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [83.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 402.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:18:15,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 28 seconds)
2026-01-23 01:19:51,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2035.57983 ± 1000.341
2026-01-23 01:19:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3124.7078, 771.12054, 9.694928, 2199.0308, 2322.7034, 1270.171, 3126.156, 2135.3892, 2283.3608, 3113.465]
2026-01-23 01:19:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 274.0, 12.0, 704.0, 722.0, 400.0, 1000.0, 658.0, 694.0, 957.0]
2026-01-23 01:19:56,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 39 seconds)
2026-01-23 01:21:34,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:40,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1871.43457 ± 828.675
2026-01-23 01:21:40,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1237.258, 138.33907, 2685.326, 2115.3098, 3087.2163, 1984.7458, 1612.3492, 2828.2168, 1447.3575, 1578.2291]
2026-01-23 01:21:40,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [390.0, 70.0, 842.0, 663.0, 1000.0, 645.0, 510.0, 915.0, 441.0, 476.0]
2026-01-23 01:21:40,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 39 minutes, 6 seconds)
2026-01-23 01:23:14,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2863.00342 ± 636.989
2026-01-23 01:23:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3158.2292, 3139.0596, 3141.618, 3108.4143, 3131.185, 3115.4165, 2808.138, 2903.2007, 3142.6826, 982.0873]
2026-01-23 01:23:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 896.0, 924.0, 1000.0, 306.0]
2026-01-23 01:23:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2863.00) for latency DatasetOffice
2026-01-23 01:23:22,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 33 seconds)
2026-01-23 01:25:00,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:08,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2875.61865 ± 651.498
2026-01-23 01:25:08,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3170.8042, 3163.8652, 3214.7107, 3185.9856, 3153.6753, 3212.082, 3164.7356, 1136.302, 3169.3235, 2184.703]
2026-01-23 01:25:08,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 343.0, 1000.0, 694.0]
2026-01-23 01:25:08,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2875.62) for latency DatasetOffice
2026-01-23 01:25:08,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 36 minutes, 13 seconds)
2026-01-23 01:26:43,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1181.19214 ± 1008.482
2026-01-23 01:26:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2220.445, 1337.2863, 351.7489, 1833.5907, 1132.4766, 3135.4685, 1672.7407, 39.49691, 64.16815, 24.498802]
2026-01-23 01:26:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [667.0, 417.0, 158.0, 560.0, 381.0, 1000.0, 543.0, 51.0, 62.0, 24.0]
2026-01-23 01:26:46,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 34 minutes, 3 seconds)
2026-01-23 01:28:22,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:27,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1943.32910 ± 585.396
2026-01-23 01:28:27,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1487.2483, 2744.3162, 1641.7705, 1113.5297, 2285.451, 1370.7462, 2905.409, 1456.4795, 2066.586, 2361.754]
2026-01-23 01:28:27,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [466.0, 836.0, 500.0, 387.0, 701.0, 462.0, 904.0, 452.0, 625.0, 720.0]
2026-01-23 01:28:27,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 19 seconds)
2026-01-23 01:30:09,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:15,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1863.25232 ± 1025.775
2026-01-23 01:30:15,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [633.209, 26.25053, 1635.5564, 1292.3466, 3178.6804, 2304.1511, 3125.0542, 3115.083, 1447.6923, 1874.499]
2026-01-23 01:30:15,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [231.0, 29.0, 501.0, 401.0, 1000.0, 705.0, 1000.0, 1000.0, 434.0, 561.0]
2026-01-23 01:30:15,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 55 seconds)
2026-01-23 01:31:46,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:52,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1811.56763 ± 892.795
2026-01-23 01:31:52,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1186.1458, 310.05774, 2048.9448, 1895.0863, 3177.1628, 2979.9407, 1213.0579, 1704.3231, 888.9106, 2712.0466]
2026-01-23 01:31:52,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [401.0, 131.0, 629.0, 582.0, 1000.0, 943.0, 449.0, 568.0, 311.0, 822.0]
2026-01-23 01:31:52,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 53 seconds)
2026-01-23 01:33:25,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:31,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2019.35449 ± 646.386
2026-01-23 01:33:31,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2419.834, 1848.4717, 802.25494, 1879.3872, 1447.7683, 3062.376, 2632.5322, 2697.1743, 1567.6698, 1836.0756]
2026-01-23 01:33:31,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [742.0, 557.0, 245.0, 571.0, 431.0, 940.0, 800.0, 811.0, 479.0, 554.0]
2026-01-23 01:33:31,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 49 seconds)
2026-01-23 01:35:09,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:15,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2151.24951 ± 865.333
2026-01-23 01:35:15,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1529.2937, 3167.7783, 1694.7434, 3142.789, 1863.1819, 3142.2405, 3155.4802, 1512.7721, 1598.5167, 705.6984]
2026-01-23 01:35:15,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [473.0, 1000.0, 528.0, 1000.0, 573.0, 1000.0, 1000.0, 473.0, 481.0, 251.0]
2026-01-23 01:35:15,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 25 seconds)
2026-01-23 01:36:50,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:36:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1750.07593 ± 944.585
2026-01-23 01:36:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2537.9656, 1727.6486, 1889.1896, 34.072006, 1730.1427, 2775.3794, 840.7137, 598.86975, 3125.6003, 2241.1782]
2026-01-23 01:36:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [822.0, 570.0, 631.0, 29.0, 570.0, 903.0, 301.0, 222.0, 1000.0, 736.0]
2026-01-23 01:36:55,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 41 seconds)
2026-01-23 01:38:31,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:33,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 743.40155 ± 856.185
2026-01-23 01:38:33,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [49.863003, 846.10205, 204.22191, 234.29216, 42.387512, 32.332867, 43.779804, 1803.9406, 2260.2761, 1916.8191]
2026-01-23 01:38:33,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [59.0, 317.0, 120.0, 107.0, 35.0, 34.0, 43.0, 570.0, 694.0, 586.0]
2026-01-23 01:38:33,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 34 seconds)
2026-01-23 01:40:12,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:15,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1401.28394 ± 324.760
2026-01-23 01:40:15,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1478.1521, 1273.2443, 980.42395, 1656.2097, 1357.5912, 1103.6495, 1550.6624, 2113.1748, 1499.6915, 1000.0401]
2026-01-23 01:40:15,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [442.0, 387.0, 301.0, 496.0, 407.0, 330.0, 471.0, 637.0, 448.0, 295.0]
2026-01-23 01:40:15,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 7 seconds)
2026-01-23 01:41:51,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:57,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2083.00098 ± 1173.357
2026-01-23 01:41:57,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3155.8232, 2074.8533, 3138.4436, 98.29097, 3082.4502, 1825.7388, 3108.857, 435.0812, 793.80115, 3116.6694]
2026-01-23 01:41:57,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 637.0, 1000.0, 59.0, 1000.0, 567.0, 1000.0, 167.0, 264.0, 1000.0]
2026-01-23 01:41:57,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 32 seconds)
2026-01-23 01:43:30,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:34,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1144.93396 ± 1382.870
2026-01-23 01:43:34,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3198.1357, 3128.1355, 3111.29, 1569.2015, 73.85018, 11.141478, 30.430185, 258.39825, 17.388906, 51.3675]
2026-01-23 01:43:34,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 510.0, 42.0, 13.0, 25.0, 167.0, 17.0, 58.0]
2026-01-23 01:43:34,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 38 seconds)
2026-01-23 01:45:12,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:19,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2443.07568 ± 938.532
2026-01-23 01:45:19,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3044.9048, 14.49152, 2861.9468, 3093.1946, 3139.0854, 1594.0819, 2309.7283, 2949.027, 3116.0686, 2308.2285]
2026-01-23 01:45:19,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [927.0, 18.0, 922.0, 1000.0, 1000.0, 530.0, 714.0, 895.0, 1000.0, 707.0]
2026-01-23 01:45:19,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 7 seconds)
2026-01-23 01:46:55,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:03,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2334.43286 ± 1191.174
2026-01-23 01:47:03,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3116.4595, 2334.8396, 3198.5154, 3144.871, 3125.902, 12.3139, 151.80196, 3122.6453, 1993.7955, 3143.186]
2026-01-23 01:47:03,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 704.0, 1000.0, 1000.0, 1000.0, 13.0, 85.0, 1000.0, 598.0, 1000.0]
2026-01-23 01:47:03,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 35 seconds)
2026-01-23 01:48:40,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:42,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 580.76788 ± 1065.641
2026-01-23 01:48:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2219.8289, 16.996262, 9.919131, 50.68644, 49.232964, 126.90714, 101.02706, 58.781082, 49.628494, 3124.6711]
2026-01-23 01:48:42,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [723.0, 18.0, 11.0, 30.0, 37.0, 73.0, 82.0, 49.0, 32.0, 1000.0]
2026-01-23 01:48:42,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 48 seconds)
2026-01-23 01:50:13,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:20,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2497.39014 ± 968.003
2026-01-23 01:50:20,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3250.5664, 2174.3406, 2983.3413, 3113.8916, 3051.0393, 1664.0006, 2352.1404, 22.530031, 3163.191, 3198.8586]
2026-01-23 01:50:20,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [991.0, 656.0, 944.0, 1000.0, 929.0, 550.0, 760.0, 20.0, 984.0, 1000.0]
2026-01-23 01:50:20,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 3 seconds)
2026-01-23 01:51:56,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:04,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2795.01611 ± 668.402
2026-01-23 01:52:04,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3101.4148, 3153.158, 3117.4402, 3110.9355, 3124.9395, 3081.9683, 1428.6849, 3164.1409, 1490.5752, 3176.9033]
2026-01-23 01:52:04,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 435.0, 1000.0, 460.0, 1000.0]
2026-01-23 01:52:04,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 30 seconds)
2026-01-23 01:53:42,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:48,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1825.63062 ± 1254.158
2026-01-23 01:53:48,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3139.1956, 2105.3794, 2039.2767, 3136.866, 1820.8727, 3163.8455, 2727.9683, 49.48221, 26.557903, 46.86218]
2026-01-23 01:53:48,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 652.0, 618.0, 1000.0, 562.0, 959.0, 864.0, 34.0, 25.0, 31.0]
2026-01-23 01:53:48,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 46 seconds)
2026-01-23 01:55:21,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:29,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2534.22119 ± 1101.369
2026-01-23 01:55:29,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3130.519, 2737.119, 3063.847, 3117.9087, 17.494247, 3135.5498, 715.8757, 3108.3936, 3196.8977, 3118.605]
2026-01-23 01:55:29,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 833.0, 940.0, 1000.0, 17.0, 1000.0, 247.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:55:29,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 3 seconds)
2026-01-23 01:57:10,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:19,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2951.44116 ± 415.449
2026-01-23 01:57:19,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3121.468, 3122.499, 3116.8887, 1735.8478, 3119.2422, 3143.158, 2814.616, 3111.6294, 3126.251, 3102.8147]
2026-01-23 01:57:19,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 570.0, 1000.0, 1000.0, 867.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:19,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2951.44) for latency DatasetOffice
2026-01-23 01:57:19,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 26 seconds)
2026-01-23 01:58:48,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:53,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1831.25269 ± 1273.832
2026-01-23 01:58:53,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3119.3618, 3106.1216, 3125.6233, 1741.0742, 3113.1348, 2182.5276, 42.925053, 1765.7227, 76.42689, 39.609177]
2026-01-23 01:58:53,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 570.0, 1000.0, 712.0, 32.0, 552.0, 49.0, 55.0]
2026-01-23 01:58:53,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 42 seconds)
2026-01-23 02:00:31,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2549.26514 ± 1075.610
2026-01-23 02:00:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [14.057271, 2785.9932, 3155.5046, 869.234, 3103.7434, 3126.469, 3106.744, 3096.7969, 3122.884, 3111.2222]
2026-01-23 02:00:39,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [14.0, 862.0, 1000.0, 308.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:00:39,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1299 [DEBUG]: Training session finished
