2026-01-22 23:14:21,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mem1
2026-01-22 23:14:21,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mem1
2026-01-22 23:14:21,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1461359b9450>}
2026-01-22 23:14:21,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:22,058 baseline-bpql-noisy-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-22 23:14:22,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-22 23:14:22,075 baseline-bpql-noisy-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=14, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-22 23:14:22,075 baseline-bpql-noisy-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:22,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:22,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:47,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:47,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 38.78828 ± 45.024
2026-01-22 23:15:47,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [24.688955, 173.44833, 24.566368, 26.29892, 26.790716, 26.164371, 16.107725, 26.894056, 18.206005, 24.71741]
2026-01-22 23:15:47,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [17.0, 85.0, 17.0, 18.0, 18.0, 18.0, 13.0, 18.0, 14.0, 17.0]
2026-01-22 23:15:47,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (38.79) for latency DatasetOffice
2026-01-22 23:15:47,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 19 minutes, 13 seconds)
2026-01-22 23:17:19,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:20,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 16.47710 ± 1.062
2026-01-22 23:17:20,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [16.403538, 15.852834, 19.455717, 16.557722, 16.842665, 15.556666, 16.174541, 15.969738, 16.26859, 15.689012]
2026-01-22 23:17:20,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [49.0, 47.0, 48.0, 47.0, 47.0, 48.0, 47.0, 47.0, 47.0, 47.0]
2026-01-22 23:17:20,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 24 minutes, 43 seconds)
2026-01-22 23:18:53,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:18:55,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 305.29251 ± 62.591
2026-01-22 23:18:55,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [297.77856, 406.32486, 314.78778, 326.47482, 312.3172, 290.61142, 156.29855, 359.62643, 331.21765, 257.4881]
2026-01-22 23:18:55,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [134.0, 211.0, 159.0, 141.0, 135.0, 146.0, 79.0, 181.0, 144.0, 118.0]
2026-01-22 23:18:55,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (305.29) for latency DatasetOffice
2026-01-22 23:18:55,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 26 minutes, 41 seconds)
2026-01-22 23:20:30,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:20:33,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 507.89005 ± 221.298
2026-01-22 23:20:33,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [304.74915, 476.85672, 540.52094, 641.2499, 1115.0798, 426.05478, 372.60007, 417.59082, 391.5853, 392.6129]
2026-01-22 23:20:33,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [222.0, 368.0, 362.0, 463.0, 1000.0, 249.0, 247.0, 238.0, 212.0, 209.0]
2026-01-22 23:20:33,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (507.89) for latency DatasetOffice
2026-01-22 23:20:33,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 28 minutes, 9 seconds)
2026-01-22 23:22:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:09,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 445.97137 ± 109.596
2026-01-22 23:22:09,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [522.17426, 314.6641, 562.44586, 319.0185, 303.66882, 318.2991, 491.52496, 564.55536, 531.07324, 532.2898]
2026-01-22 23:22:09,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [389.0, 152.0, 432.0, 147.0, 151.0, 150.0, 363.0, 432.0, 398.0, 400.0]
2026-01-22 23:22:09,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 27 minutes, 37 seconds)
2026-01-22 23:23:41,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:43,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 323.26929 ± 231.152
2026-01-22 23:23:43,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [269.47818, 236.27124, 250.45705, 262.70062, 235.4655, 1015.93616, 237.93488, 239.50877, 238.65633, 246.28427]
2026-01-22 23:23:43,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [118.0, 105.0, 111.0, 116.0, 105.0, 1000.0, 103.0, 106.0, 104.0, 111.0]
2026-01-22 23:23:43,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 29 minutes, 5 seconds)
2026-01-22 23:25:19,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:22,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 404.29617 ± 223.269
2026-01-22 23:25:22,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1021.14374, 345.8091, 373.11484, 425.6, 224.75789, 365.8877, 407.63907, 156.70381, 276.30203, 446.00336]
2026-01-22 23:25:22,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 166.0, 190.0, 407.0, 104.0, 184.0, 378.0, 154.0, 128.0, 280.0]
2026-01-22 23:25:22,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 29 minutes, 25 seconds)
2026-01-22 23:26:52,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:56,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 584.72321 ± 237.794
2026-01-22 23:26:56,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [864.8878, 797.5322, 863.63434, 567.00964, 639.42706, 584.53845, 13.445102, 579.1314, 408.31348, 529.3125]
2026-01-22 23:26:56,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [674.0, 608.0, 673.0, 378.0, 449.0, 572.0, 16.0, 391.0, 214.0, 339.0]
2026-01-22 23:26:56,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (584.72) for latency DatasetOffice
2026-01-22 23:26:56,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 27 minutes, 33 seconds)
2026-01-22 23:28:28,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:29,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 303.58539 ± 25.247
2026-01-22 23:28:29,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [278.29358, 282.51355, 311.63272, 333.16846, 276.736, 336.10614, 275.2362, 329.80493, 329.28445, 283.0778]
2026-01-22 23:28:29,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [116.0, 118.0, 142.0, 131.0, 115.0, 134.0, 115.0, 131.0, 130.0, 118.0]
2026-01-22 23:28:29,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 24 minutes, 21 seconds)
2026-01-22 23:30:01,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:02,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 319.58884 ± 38.990
2026-01-22 23:30:02,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [382.2407, 364.91245, 334.47665, 298.08704, 334.00845, 290.44888, 267.31033, 254.25099, 335.3004, 334.85272]
2026-01-22 23:30:02,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [155.0, 139.0, 130.0, 122.0, 130.0, 118.0, 112.0, 113.0, 130.0, 130.0]
2026-01-22 23:30:02,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 22 minutes, 5 seconds)
2026-01-22 23:31:36,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:37,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 341.94208 ± 9.681
2026-01-22 23:31:37,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [334.65054, 346.0003, 333.97748, 326.62488, 340.1755, 352.50266, 362.20123, 339.20282, 337.7445, 346.34094]
2026-01-22 23:31:37,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [137.0, 139.0, 136.0, 137.0, 138.0, 141.0, 146.0, 140.0, 137.0, 141.0]
2026-01-22 23:31:37,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 20 minutes, 48 seconds)
2026-01-22 23:33:09,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:33:10,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 332.06744 ± 55.530
2026-01-22 23:33:10,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [329.8883, 327.35355, 346.45816, 330.22202, 413.2757, 353.5084, 356.53214, 181.78151, 350.98874, 330.6661]
2026-01-22 23:33:10,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [133.0, 132.0, 139.0, 133.0, 156.0, 140.0, 141.0, 85.0, 156.0, 133.0]
2026-01-22 23:33:10,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 17 minutes, 21 seconds)
2026-01-22 23:34:43,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:44,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 363.73798 ± 7.348
2026-01-22 23:34:44,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [361.4974, 362.5305, 363.39917, 355.945, 384.34387, 365.194, 360.40207, 364.6673, 360.51334, 358.8871]
2026-01-22 23:34:44,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [143.0, 143.0, 142.0, 139.0, 159.0, 144.0, 142.0, 146.0, 143.0, 142.0]
2026-01-22 23:34:44,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 15 minutes, 51 seconds)
2026-01-22 23:36:17,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:18,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 359.60748 ± 5.677
2026-01-22 23:36:18,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [371.62128, 365.87836, 358.58252, 356.4438, 357.4717, 359.21478, 359.59082, 348.8351, 361.02417, 357.41263]
2026-01-22 23:36:18,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [140.0, 142.0, 137.0, 137.0, 138.0, 139.0, 139.0, 138.0, 138.0, 137.0]
2026-01-22 23:36:18,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 14 minutes, 28 seconds)
2026-01-22 23:37:51,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:37:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 373.14404 ± 3.520
2026-01-22 23:37:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [373.95807, 374.46912, 373.0841, 376.56467, 365.61868, 373.60602, 375.83337, 367.85074, 377.3452, 373.11026]
2026-01-22 23:37:52,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [145.0, 146.0, 144.0, 146.0, 142.0, 145.0, 148.0, 144.0, 146.0, 144.0]
2026-01-22 23:37:52,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 13 minutes, 9 seconds)
2026-01-22 23:39:25,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:39:26,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 356.34082 ± 16.944
2026-01-22 23:39:26,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [336.8238, 365.1022, 368.56915, 351.97116, 359.8241, 363.8856, 365.39694, 314.23126, 366.5791, 371.02487]
2026-01-22 23:39:26,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [133.0, 140.0, 139.0, 146.0, 138.0, 139.0, 139.0, 123.0, 141.0, 142.0]
2026-01-22 23:39:26,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 11 minutes, 10 seconds)
2026-01-22 23:40:58,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:59,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 360.33942 ± 4.483
2026-01-22 23:40:59,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [361.45398, 364.46408, 363.2357, 355.3017, 351.37753, 356.77158, 367.18976, 359.6722, 363.23276, 360.69537]
2026-01-22 23:40:59,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [139.0, 139.0, 139.0, 137.0, 135.0, 138.0, 140.0, 138.0, 140.0, 139.0]
2026-01-22 23:40:59,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 9 minutes, 50 seconds)
2026-01-22 23:42:33,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:34,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 365.83978 ± 6.031
2026-01-22 23:42:34,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [358.0342, 363.34064, 371.76047, 367.11816, 373.01575, 362.28296, 354.42944, 370.22852, 372.63278, 365.5546]
2026-01-22 23:42:34,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [136.0, 137.0, 140.0, 139.0, 139.0, 138.0, 138.0, 141.0, 141.0, 139.0]
2026-01-22 23:42:34,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 8 minutes, 26 seconds)
2026-01-22 23:44:07,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:08,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 404.76846 ± 8.389
2026-01-22 23:44:08,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [414.03476, 390.57257, 407.02472, 397.1582, 405.8236, 413.69318, 409.3584, 406.34573, 391.07053, 412.60278]
2026-01-22 23:44:08,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [152.0, 147.0, 152.0, 150.0, 152.0, 155.0, 154.0, 154.0, 150.0, 155.0]
2026-01-22 23:44:08,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 6 minutes, 56 seconds)
2026-01-22 23:45:40,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:45:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 410.44101 ± 12.389
2026-01-22 23:45:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [404.27512, 408.44418, 413.04715, 405.373, 403.56567, 414.7722, 394.60815, 404.79272, 411.77277, 443.75916]
2026-01-22 23:45:41,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [154.0, 152.0, 156.0, 153.0, 152.0, 159.0, 155.0, 151.0, 154.0, 164.0]
2026-01-22 23:45:41,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 5 minutes, 3 seconds)
2026-01-22 23:47:14,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:47:16,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 408.12921 ± 12.985
2026-01-22 23:47:16,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [399.16568, 400.81833, 428.36758, 417.67422, 406.2062, 402.6377, 406.48773, 432.1014, 389.64395, 398.18884]
2026-01-22 23:47:16,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [147.0, 148.0, 154.0, 153.0, 150.0, 149.0, 153.0, 155.0, 146.0, 149.0]
2026-01-22 23:47:16,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 3 minutes, 42 seconds)
2026-01-22 23:48:49,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:50,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 363.08475 ± 9.045
2026-01-22 23:48:50,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [361.30823, 380.9815, 358.04416, 359.97745, 359.55408, 358.45486, 379.6484, 351.22256, 360.32617, 361.33017]
2026-01-22 23:48:50,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [149.0, 145.0, 148.0, 150.0, 151.0, 150.0, 150.0, 147.0, 149.0, 148.0]
2026-01-22 23:48:50,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 2 minutes, 22 seconds)
2026-01-22 23:50:23,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:50:24,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 470.51324 ± 74.934
2026-01-22 23:50:24,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [455.13388, 449.16672, 414.24368, 454.586, 436.91586, 447.27878, 450.93204, 468.20444, 691.56757, 437.1034]
2026-01-22 23:50:24,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [161.0, 160.0, 153.0, 161.0, 158.0, 162.0, 160.0, 165.0, 223.0, 156.0]
2026-01-22 23:50:24,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 seconds)
2026-01-22 23:51:56,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:57,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 384.12311 ± 149.124
2026-01-22 23:51:57,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [349.39, 635.8159, 426.07288, 416.32233, 239.01535, 35.19961, 459.99792, 405.03973, 423.10693, 451.27017]
2026-01-22 23:51:57,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [137.0, 231.0, 156.0, 153.0, 103.0, 25.0, 174.0, 149.0, 152.0, 163.0]
2026-01-22 23:51:57,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 1 hour, 58 minutes, 47 seconds)
2026-01-22 23:53:31,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:32,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 634.93750 ± 37.408
2026-01-22 23:53:32,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [642.21344, 659.2328, 541.5488, 603.2218, 636.16174, 656.8563, 665.3033, 681.16394, 626.2315, 637.4413]
2026-01-22 23:53:32,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [203.0, 207.0, 179.0, 193.0, 202.0, 207.0, 209.0, 214.0, 199.0, 202.0]
2026-01-22 23:53:32,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (634.94) for latency DatasetOffice
2026-01-22 23:53:32,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 57 minutes, 48 seconds)
2026-01-22 23:55:05,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:08,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 832.30402 ± 380.094
2026-01-22 23:55:08,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1170.494, 126.091934, 1200.4625, 1150.9003, 909.7552, 610.47565, 796.623, 1146.266, 1016.5186, 195.45355]
2026-01-22 23:55:08,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [364.0, 67.0, 394.0, 355.0, 295.0, 236.0, 311.0, 363.0, 328.0, 95.0]
2026-01-22 23:55:08,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (832.30) for latency DatasetOffice
2026-01-22 23:55:08,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 56 minutes, 30 seconds)
2026-01-22 23:56:40,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:42,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 706.38184 ± 112.950
2026-01-22 23:56:42,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [762.8342, 406.28134, 666.56525, 792.3489, 770.36847, 765.48926, 747.9489, 734.7376, 621.3744, 795.8695]
2026-01-22 23:56:42,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [237.0, 166.0, 215.0, 246.0, 240.0, 239.0, 231.0, 232.0, 204.0, 245.0]
2026-01-22 23:56:42,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 54 minutes, 57 seconds)
2026-01-22 23:58:16,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1031.36414 ± 402.878
2026-01-22 23:58:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1737.5123, 1315.6211, 690.4146, 966.6481, 548.3272, 731.04755, 781.4899, 1106.5431, 1698.6122, 737.42566]
2026-01-22 23:58:19,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [595.0, 493.0, 261.0, 359.0, 231.0, 280.0, 291.0, 390.0, 607.0, 277.0]
2026-01-22 23:58:19,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (1031.36) for latency DatasetOffice
2026-01-22 23:58:19,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 54 minutes, 1 second)
2026-01-22 23:59:53,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:57,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1361.42798 ± 682.786
2026-01-22 23:59:57,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1369.5808, 1323.0168, 1390.9529, 1272.2491, 764.05835, 399.80716, 1529.2048, 1182.9486, 3174.31, 1208.1503]
2026-01-22 23:59:57,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [437.0, 423.0, 441.0, 444.0, 272.0, 166.0, 489.0, 391.0, 994.0, 388.0]
2026-01-22 23:59:57,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (1361.43) for latency DatasetOffice
2026-01-22 23:59:57,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 53 minutes, 42 seconds)
2026-01-23 00:01:38,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:47,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2835.71802 ± 24.170
2026-01-23 00:01:47,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2837.213, 2801.256, 2827.4214, 2854.1758, 2813.5117, 2816.7876, 2879.404, 2812.6414, 2859.7673, 2855.001]
2026-01-23 00:01:47,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:01:47,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2835.72) for latency DatasetOffice
2026-01-23 00:01:47,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 55 minutes, 30 seconds)
2026-01-23 00:03:16,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:23,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1933.73560 ± 1068.179
2026-01-23 00:03:23,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2595.1143, 2636.4287, 2610.6711, 2659.0967, 169.06029, 32.57466, 772.01746, 2609.6436, 2621.3767, 2631.3738]
2026-01-23 00:03:23,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 84.0, 29.0, 315.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:03:23,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 53 minutes, 55 seconds)
2026-01-23 00:04:55,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:59,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1333.87476 ± 1112.033
2026-01-23 00:04:59,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1557.867, 31.453281, 798.75146, 61.81769, 516.3391, 143.4206, 2772.7588, 1863.7559, 2772.0042, 2820.58]
2026-01-23 00:04:59,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [571.0, 36.0, 335.0, 41.0, 182.0, 136.0, 1000.0, 679.0, 1000.0, 997.0]
2026-01-23 00:04:59,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 52 minutes, 40 seconds)
2026-01-23 00:06:36,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:42,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1809.87341 ± 742.638
2026-01-23 00:06:42,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2629.2566, 1811.2126, 1609.3784, 2422.061, 1828.707, 938.8212, 682.691, 2625.4153, 845.0961, 2706.0967]
2026-01-23 00:06:42,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 624.0, 613.0, 920.0, 652.0, 359.0, 261.0, 1000.0, 348.0, 1000.0]
2026-01-23 00:06:42,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 52 minutes, 21 seconds)
2026-01-23 00:08:15,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:19,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1403.94800 ± 889.111
2026-01-23 00:08:19,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1606.2804, 1729.5488, 794.838, 694.3438, 85.13206, 672.963, 1652.5117, 2919.88, 2856.1152, 1027.867]
2026-01-23 00:08:19,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [484.0, 561.0, 260.0, 220.0, 48.0, 238.0, 499.0, 1000.0, 1000.0, 369.0]
2026-01-23 00:08:19,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 50 minutes, 18 seconds)
2026-01-23 00:09:58,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:59,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 164.55864 ± 371.684
2026-01-23 00:09:59,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1278.562, 35.035908, 24.539904, 54.79536, 76.13537, 51.048466, 30.444244, 15.501887, 38.435104, 41.088238]
2026-01-23 00:09:59,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [439.0, 35.0, 25.0, 73.0, 52.0, 35.0, 24.0, 18.0, 41.0, 42.0]
2026-01-23 00:09:59,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 46 minutes, 26 seconds)
2026-01-23 00:11:33,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:37,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1104.96228 ± 450.943
2026-01-23 00:11:37,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [907.0129, 823.27167, 1781.4438, 1281.0607, 884.078, 770.41223, 860.1517, 814.55536, 2121.0037, 806.633]
2026-01-23 00:11:37,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [276.0, 249.0, 594.0, 483.0, 268.0, 238.0, 261.0, 246.0, 723.0, 246.0]
2026-01-23 00:11:37,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 45 minutes, 15 seconds)
2026-01-23 00:13:05,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:10,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1564.33008 ± 491.782
2026-01-23 00:13:10,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1415.6562, 1429.9863, 1094.1599, 1256.0518, 1279.3251, 2815.8613, 1418.2811, 1312.7825, 1487.4971, 2133.699]
2026-01-23 00:13:10,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [433.0, 438.0, 339.0, 383.0, 391.0, 868.0, 443.0, 399.0, 453.0, 645.0]
2026-01-23 00:13:10,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 57 seconds)
2026-01-23 00:14:43,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:46,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 959.99255 ± 445.128
2026-01-23 00:14:46,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [980.2921, 1183.003, 7.9441366, 916.91565, 1027.3365, 390.15762, 1372.5663, 1158.949, 1653.7832, 908.97845]
2026-01-23 00:14:46,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [302.0, 364.0, 9.0, 296.0, 311.0, 140.0, 421.0, 357.0, 505.0, 273.0]
2026-01-23 00:14:46,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 39 minutes, 56 seconds)
2026-01-23 00:16:20,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:22,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 778.70984 ± 599.076
2026-01-23 00:16:22,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1844.1648, 1565.5372, 959.49725, 40.03528, 1114.7357, 710.27466, 41.654938, 936.8942, 78.3672, 495.93726]
2026-01-23 00:16:22,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [552.0, 485.0, 327.0, 35.0, 348.0, 255.0, 50.0, 328.0, 45.0, 165.0]
2026-01-23 00:16:22,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 38 minutes, 16 seconds)
2026-01-23 00:18:02,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:09,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2167.18823 ± 786.495
2026-01-23 00:18:09,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1411.2094, 3025.9878, 1242.8302, 1929.3105, 2976.112, 1819.0964, 836.6123, 2988.6572, 2977.3684, 2464.698]
2026-01-23 00:18:09,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [450.0, 1000.0, 401.0, 595.0, 1000.0, 619.0, 307.0, 1000.0, 1000.0, 836.0]
2026-01-23 00:18:09,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 38 minutes, 5 seconds)
2026-01-23 00:19:39,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1879.23767 ± 892.965
2026-01-23 00:19:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2849.3066, 2602.0452, 2798.56, 1785.6523, 2392.7905, 1110.6016, 842.1674, 1265.0779, 2830.4187, 315.75623]
2026-01-23 00:19:45,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 922.0, 1000.0, 568.0, 854.0, 410.0, 261.0, 457.0, 1000.0, 138.0]
2026-01-23 00:19:45,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 36 minutes)
2026-01-23 00:21:21,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 702.29553 ± 489.640
2026-01-23 00:21:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1014.72015, 1069.4407, 1099.2205, 938.2459, 1226.8278, 1197.7773, 349.84894, 33.432835, 41.15758, 52.28349]
2026-01-23 00:21:23,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [323.0, 330.0, 337.0, 292.0, 375.0, 370.0, 144.0, 27.0, 29.0, 48.0]
2026-01-23 00:21:23,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 35 minutes, 18 seconds)
2026-01-23 00:22:54,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:22:59,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1902.59204 ± 1038.806
2026-01-23 00:22:59,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1938.6257, 3013.88, 877.12366, 1471.3848, 1262.469, 3043.238, 3120.137, 1290.5732, 15.583204, 2992.9058]
2026-01-23 00:22:59,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [599.0, 1000.0, 282.0, 463.0, 402.0, 1000.0, 1000.0, 437.0, 18.0, 1000.0]
2026-01-23 00:22:59,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 33 minutes, 48 seconds)
2026-01-23 00:24:36,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:41,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1689.01392 ± 676.619
2026-01-23 00:24:41,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3142.6624, 731.48224, 1994.028, 1359.1482, 1227.3416, 1450.0856, 2548.302, 1390.1923, 1834.5837, 1212.3134]
2026-01-23 00:24:41,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 252.0, 615.0, 414.0, 375.0, 446.0, 779.0, 424.0, 554.0, 372.0]
2026-01-23 00:24:41,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 33 minutes, 9 seconds)
2026-01-23 00:26:12,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1234.63586 ± 648.223
2026-01-23 00:26:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1191.3607, 1161.5548, 2286.7407, 1185.1825, 915.6139, 658.9975, 1198.4175, 294.62885, 926.12695, 2527.7354]
2026-01-23 00:26:16,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [365.0, 355.0, 692.0, 363.0, 288.0, 207.0, 368.0, 124.0, 308.0, 784.0]
2026-01-23 00:26:16,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 12 seconds)
2026-01-23 00:27:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:50,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 970.52081 ± 302.661
2026-01-23 00:27:50,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [998.733, 1030.088, 690.3015, 1406.6886, 1094.1979, 1067.8295, 1204.0576, 888.33606, 1090.9174, 234.05864]
2026-01-23 00:27:50,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [304.0, 312.0, 220.0, 425.0, 329.0, 320.0, 369.0, 269.0, 327.0, 103.0]
2026-01-23 00:27:50,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 21 seconds)
2026-01-23 00:29:25,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:30,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1612.16821 ± 911.173
2026-01-23 00:29:30,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3112.2698, 2046.5571, 1325.3507, 1764.1205, 247.589, 786.13806, 1812.1826, 1172.3295, 748.3361, 3106.8074]
2026-01-23 00:29:30,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 624.0, 442.0, 571.0, 109.0, 254.0, 593.0, 399.0, 267.0, 1000.0]
2026-01-23 00:29:30,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 26 minutes, 6 seconds)
2026-01-23 00:31:02,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:08,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1887.73499 ± 1098.279
2026-01-23 00:31:08,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [946.63727, 3013.6917, 3003.1123, 1667.2712, 366.92026, 626.4941, 3028.408, 2979.5383, 2642.13, 603.1457]
2026-01-23 00:31:08,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [301.0, 1000.0, 1000.0, 535.0, 155.0, 231.0, 1000.0, 1000.0, 826.0, 210.0]
2026-01-23 00:31:08,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 38 seconds)
2026-01-23 00:32:44,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:45,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 434.46011 ± 537.230
2026-01-23 00:32:45,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1497.5802, 295.2416, 32.212128, 71.48283, 45.160286, 40.68711, 194.41417, 41.36674, 804.82886, 1321.6273]
2026-01-23 00:32:45,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [462.0, 129.0, 29.0, 41.0, 44.0, 40.0, 102.0, 43.0, 286.0, 401.0]
2026-01-23 00:32:45,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 22 minutes, 15 seconds)
2026-01-23 00:34:18,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:22,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1298.71704 ± 855.746
2026-01-23 00:34:22,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1447.6434, 2787.3376, 704.5928, 1766.7855, 273.94867, 2245.7263, 1455.9476, 1801.9984, 469.37054, 33.819683]
2026-01-23 00:34:22,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [445.0, 861.0, 243.0, 543.0, 118.0, 688.0, 447.0, 543.0, 180.0, 30.0]
2026-01-23 00:34:22,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 20 minutes, 59 seconds)
2026-01-23 00:35:56,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:05,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2731.91968 ± 680.012
2026-01-23 00:36:05,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3022.3662, 3030.4863, 2970.9094, 3040.528, 744.2805, 2996.276, 3024.374, 2999.7341, 2500.0642, 2990.1785]
2026-01-23 00:36:05,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 269.0, 1000.0, 1000.0, 1000.0, 833.0, 1000.0]
2026-01-23 00:36:05,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 20 minutes, 47 seconds)
2026-01-23 00:37:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:40,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 823.39520 ± 1065.902
2026-01-23 00:37:40,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [31.710093, 44.23522, 3003.049, 40.200787, 1501.3556, 2419.251, 999.01666, 49.69854, 57.34094, 88.09439]
2026-01-23 00:37:40,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [29.0, 31.0, 1000.0, 29.0, 467.0, 736.0, 346.0, 64.0, 43.0, 52.0]
2026-01-23 00:37:40,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 26 seconds)
2026-01-23 00:39:16,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:20,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1455.94751 ± 549.072
2026-01-23 00:39:20,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2649.3218, 1189.736, 1853.7488, 1695.3973, 1261.5902, 1422.3127, 1700.9083, 1425.6879, 706.77124, 654.00085]
2026-01-23 00:39:20,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [808.0, 368.0, 563.0, 517.0, 386.0, 434.0, 521.0, 431.0, 244.0, 207.0]
2026-01-23 00:39:20,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 17 minutes, 6 seconds)
2026-01-23 00:40:55,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:00,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1474.77271 ± 806.163
2026-01-23 00:41:00,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1956.5167, 32.3268, 1212.0979, 1667.0607, 1467.1162, 6.796058, 2488.9246, 1914.6025, 2274.2668, 1728.0199]
2026-01-23 00:41:00,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [592.0, 25.0, 365.0, 505.0, 474.0, 8.0, 783.0, 580.0, 691.0, 526.0]
2026-01-23 00:41:00,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 49 seconds)
2026-01-23 00:42:28,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:34,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1781.62341 ± 894.105
2026-01-23 00:42:34,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [779.035, 1664.1554, 2797.2224, 1436.6084, 2819.0706, 386.99103, 1569.6072, 981.3165, 3172.6204, 2209.6064]
2026-01-23 00:42:34,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [240.0, 510.0, 846.0, 439.0, 859.0, 149.0, 495.0, 321.0, 1000.0, 667.0]
2026-01-23 00:42:34,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 13 minutes, 46 seconds)
2026-01-23 00:44:07,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:09,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 992.17316 ± 282.499
2026-01-23 00:44:09,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [923.7429, 890.305, 842.734, 904.78, 1709.6847, 674.6655, 900.8945, 871.40625, 889.09045, 1314.4281]
2026-01-23 00:44:09,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [281.0, 270.0, 253.0, 274.0, 523.0, 230.0, 273.0, 262.0, 265.0, 400.0]
2026-01-23 00:44:09,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 11 minutes, 3 seconds)
2026-01-23 00:45:44,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:53,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2785.88721 ± 626.687
2026-01-23 00:45:53,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1588.9098, 3095.895, 3093.602, 3105.8987, 1478.1238, 3095.18, 3099.1855, 3097.5017, 3098.7837, 3105.7927]
2026-01-23 00:45:53,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [498.0, 1000.0, 1000.0, 1000.0, 464.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:45:53,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 35 seconds)
2026-01-23 00:47:23,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:30,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2423.04224 ± 742.488
2026-01-23 00:47:30,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3084.9053, 3110.4685, 1135.594, 3079.5938, 1510.1581, 3073.3809, 2517.528, 1871.226, 3110.7925, 1736.7744]
2026-01-23 00:47:30,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 347.0, 1000.0, 474.0, 1000.0, 815.0, 641.0, 1000.0, 523.0]
2026-01-23 00:47:30,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 8 minutes, 36 seconds)
2026-01-23 00:49:10,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:13,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 822.64124 ± 947.198
2026-01-23 00:49:13,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1780.5521, 246.08798, 60.634354, 1746.375, 48.464, 949.7311, 438.63037, 33.398224, 31.87702, 2890.6616]
2026-01-23 00:49:13,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [554.0, 113.0, 42.0, 598.0, 66.0, 331.0, 183.0, 37.0, 38.0, 1000.0]
2026-01-23 00:49:13,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 27 seconds)
2026-01-23 00:50:39,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:47,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2448.67236 ± 657.574
2026-01-23 00:50:47,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3076.87, 3027.1338, 3039.9656, 3063.165, 1420.3269, 3054.4849, 1921.6711, 2077.309, 2367.4805, 1438.3179]
2026-01-23 00:50:47,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 480.0, 1000.0, 614.0, 691.0, 799.0, 482.0]
2026-01-23 00:50:47,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 46 seconds)
2026-01-23 00:52:23,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:29,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2022.83960 ± 1032.432
2026-01-23 00:52:29,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1032.0935, 3048.3606, 3079.044, 2034.9132, 735.055, 250.88692, 3100.1658, 3113.459, 2329.2532, 1505.1635]
2026-01-23 00:52:29,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [352.0, 1000.0, 1000.0, 635.0, 252.0, 115.0, 1000.0, 1000.0, 770.0, 498.0]
2026-01-23 00:52:29,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 58 seconds)
2026-01-23 00:53:58,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:00,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 823.63428 ± 1004.598
2026-01-23 00:54:00,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1918.2166, 1608.9271, 202.71117, 19.830107, 27.549364, 65.85691, 170.01797, 128.06767, 1015.44574, 3079.7202]
2026-01-23 00:54:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [600.0, 502.0, 92.0, 19.0, 26.0, 71.0, 85.0, 65.0, 370.0, 1000.0]
2026-01-23 00:54:00,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 1 minute, 42 seconds)
2026-01-23 00:55:33,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:36,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1238.65344 ± 491.141
2026-01-23 00:55:36,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2504.6372, 1470.4655, 1645.7161, 1028.0387, 910.4768, 997.61005, 964.44604, 1040.2947, 750.41205, 1074.4366]
2026-01-23 00:55:36,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [766.0, 454.0, 496.0, 307.0, 273.0, 301.0, 296.0, 314.0, 236.0, 323.0]
2026-01-23 00:55:36,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 59 minutes, 55 seconds)
2026-01-23 00:57:08,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:11,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1161.80481 ± 224.089
2026-01-23 00:57:11,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1171.8296, 1137.399, 1187.8983, 574.5535, 1316.5371, 1447.4291, 1354.1292, 1189.6854, 1052.1566, 1186.4303]
2026-01-23 00:57:11,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [355.0, 332.0, 348.0, 184.0, 388.0, 468.0, 424.0, 361.0, 312.0, 344.0]
2026-01-23 00:57:11,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 57 minutes, 21 seconds)
2026-01-23 00:58:42,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:46,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1626.53845 ± 938.161
2026-01-23 00:58:46,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [952.4966, 1071.2361, 216.3661, 1951.2096, 3093.606, 1749.0262, 3062.8176, 1726.3552, 2050.5674, 391.70364]
2026-01-23 00:58:46,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [318.0, 367.0, 101.0, 604.0, 1000.0, 539.0, 1000.0, 576.0, 665.0, 166.0]
2026-01-23 00:58:46,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 55 minutes, 56 seconds)
2026-01-23 01:00:18,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:26,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2580.57275 ± 863.157
2026-01-23 01:00:26,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3081.4658, 2639.4175, 3118.6562, 1383.8668, 3084.5789, 3088.7378, 3102.893, 2731.886, 3092.3503, 481.87772]
2026-01-23 01:00:26,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 863.0, 1000.0, 468.0, 1000.0, 1000.0, 1000.0, 891.0, 1000.0, 185.0]
2026-01-23 01:00:26,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 54 minutes, 4 seconds)
2026-01-23 01:01:59,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:05,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2140.59131 ± 840.591
2026-01-23 01:02:05,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3012.3523, 1217.3993, 1691.8716, 3104.9385, 2252.9595, 1416.7832, 3067.643, 3227.542, 1171.4087, 1243.0183]
2026-01-23 01:02:05,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [980.0, 387.0, 555.0, 1000.0, 689.0, 425.0, 1000.0, 1000.0, 350.0, 376.0]
2026-01-23 01:02:05,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 53 minutes, 19 seconds)
2026-01-23 01:03:36,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:43,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2229.02393 ± 1383.958
2026-01-23 01:03:43,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [23.023985, 3114.7402, 3124.3467, 3151.0894, 301.29953, 29.27102, 3155.6667, 3126.1223, 3142.7378, 3121.941]
2026-01-23 01:03:43,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [29.0, 1000.0, 1000.0, 1000.0, 126.0, 31.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:03:43,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 51 minutes, 55 seconds)
2026-01-23 01:05:15,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:21,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1895.59644 ± 1213.700
2026-01-23 01:05:21,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [954.8465, 1140.6106, 789.3059, 3135.8582, 3132.6936, 3140.6523, 682.9264, 3141.914, 12.524071, 2824.6345]
2026-01-23 01:05:21,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [289.0, 340.0, 274.0, 1000.0, 1000.0, 1000.0, 214.0, 1000.0, 13.0, 895.0]
2026-01-23 01:05:21,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 35 seconds)
2026-01-23 01:06:52,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:54,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 757.18005 ± 39.309
2026-01-23 01:06:54,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [733.4483, 734.3398, 862.93054, 756.76965, 741.7711, 790.7754, 752.807, 737.755, 728.57886, 732.62445]
2026-01-23 01:06:54,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [232.0, 230.0, 256.0, 236.0, 232.0, 243.0, 234.0, 228.0, 227.0, 226.0]
2026-01-23 01:06:54,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 44 seconds)
2026-01-23 01:08:25,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:27,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 918.50275 ± 45.182
2026-01-23 01:08:27,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [910.74054, 890.72394, 871.4256, 981.7749, 934.7044, 906.4997, 1015.3772, 884.32404, 869.50305, 919.954]
2026-01-23 01:08:27,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [270.0, 267.0, 261.0, 304.0, 280.0, 270.0, 312.0, 264.0, 259.0, 277.0]
2026-01-23 01:08:27,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 46 minutes, 27 seconds)
2026-01-23 01:10:00,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:06,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1816.01636 ± 1228.539
2026-01-23 01:10:06,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3097.1238, 3122.0034, 3129.8599, 753.4156, 3075.222, 2274.8264, 75.26367, 1730.2217, 44.0905, 858.13776]
2026-01-23 01:10:06,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 267.0, 1000.0, 730.0, 46.0, 580.0, 43.0, 328.0]
2026-01-23 01:10:06,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 44 minutes, 52 seconds)
2026-01-23 01:11:35,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:38,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 879.88037 ± 372.886
2026-01-23 01:11:38,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [57.585354, 1696.3701, 952.95996, 892.33765, 821.62836, 931.1364, 753.4667, 934.298, 972.13763, 786.88385]
2026-01-23 01:11:38,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [33.0, 518.0, 283.0, 264.0, 250.0, 277.0, 232.0, 278.0, 291.0, 242.0]
2026-01-23 01:11:38,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 42 minutes, 45 seconds)
2026-01-23 01:13:11,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:19,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2733.63867 ± 708.041
2026-01-23 01:13:19,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3078.5266, 1990.3097, 3079.7593, 3083.1118, 3083.4, 842.02405, 3103.0557, 2964.8062, 3024.811, 3086.5835]
2026-01-23 01:13:19,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 605.0, 1000.0, 1000.0, 1000.0, 299.0, 1000.0, 969.0, 1000.0, 1000.0]
2026-01-23 01:13:19,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 41 minutes, 29 seconds)
2026-01-23 01:14:50,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:52,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 969.41974 ± 46.999
2026-01-23 01:14:52,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1046.8654, 947.50226, 954.5146, 1015.1457, 911.9483, 992.61804, 980.9731, 991.96533, 875.82416, 976.8407]
2026-01-23 01:14:52,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [309.0, 281.0, 282.0, 301.0, 271.0, 299.0, 290.0, 297.0, 262.0, 292.0]
2026-01-23 01:14:52,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 39 minutes, 52 seconds)
2026-01-23 01:16:30,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:33,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1100.76697 ± 732.514
2026-01-23 01:16:33,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [592.6664, 892.29663, 995.1719, 3153.053, 845.87897, 990.9625, 902.03644, 1472.8047, 428.60135, 734.19836]
2026-01-23 01:16:33,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [205.0, 265.0, 298.0, 1000.0, 301.0, 298.0, 268.0, 438.0, 157.0, 245.0]
2026-01-23 01:16:33,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 38 minutes, 55 seconds)
2026-01-23 01:18:06,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:09,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1150.56787 ± 253.234
2026-01-23 01:18:09,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1188.9816, 996.9477, 1844.3613, 1140.1256, 995.25006, 994.54224, 1025.9989, 910.2941, 1264.5463, 1144.632]
2026-01-23 01:18:09,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [346.0, 303.0, 551.0, 335.0, 300.0, 300.0, 305.0, 280.0, 377.0, 333.0]
2026-01-23 01:18:09,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 2 seconds)
2026-01-23 01:19:40,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:44,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1675.03247 ± 1038.227
2026-01-23 01:19:44,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [926.613, 3138.7888, 1027.236, 1002.91895, 1216.5874, 947.04376, 3163.6646, 253.3408, 1928.1765, 3145.9548]
2026-01-23 01:19:44,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [277.0, 1000.0, 303.0, 306.0, 352.0, 282.0, 1000.0, 108.0, 640.0, 1000.0]
2026-01-23 01:19:44,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 41 seconds)
2026-01-23 01:21:12,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:16,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1415.27930 ± 1168.898
2026-01-23 01:21:16,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2434.3982, 3088.059, 2336.1807, 1844.9183, 49.759712, 999.7982, 419.8378, 70.27313, 48.825115, 2860.7422]
2026-01-23 01:21:16,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [788.0, 1000.0, 741.0, 640.0, 56.0, 352.0, 177.0, 72.0, 53.0, 898.0]
2026-01-23 01:21:16,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 33 minutes, 22 seconds)
2026-01-23 01:22:47,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:55,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2673.29102 ± 537.048
2026-01-23 01:22:55,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3155.004, 3128.232, 3146.2288, 2465.169, 3092.2812, 1946.1492, 2873.5706, 1794.9823, 2000.3157, 3130.9775]
2026-01-23 01:22:55,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 797.0, 1000.0, 634.0, 921.0, 586.0, 659.0, 1000.0]
2026-01-23 01:22:55,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 11 seconds)
2026-01-23 01:24:34,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:40,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2014.33337 ± 1091.607
2026-01-23 01:24:40,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3164.6406, 2191.409, 394.9926, 261.37015, 674.8577, 2422.941, 3157.244, 2547.533, 2179.3735, 3148.9734]
2026-01-23 01:24:40,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 722.0, 154.0, 114.0, 235.0, 734.0, 1000.0, 806.0, 659.0, 1000.0]
2026-01-23 01:24:40,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 30 minutes, 48 seconds)
2026-01-23 01:26:04,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 956.40802 ± 1252.136
2026-01-23 01:26:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1529.9583, 26.088745, 31.309526, 51.960167, 81.5991, 90.07245, 69.0697, 3312.9055, 3168.4768, 1202.6396]
2026-01-23 01:26:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [498.0, 52.0, 32.0, 48.0, 50.0, 49.0, 88.0, 1000.0, 1000.0, 361.0]
2026-01-23 01:26:07,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 28 minutes, 41 seconds)
2026-01-23 01:27:46,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:51,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1823.97815 ± 1159.923
2026-01-23 01:27:51,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [740.24695, 744.8805, 3136.2998, 179.09943, 3142.8044, 1120.3832, 1804.5737, 3161.327, 985.66223, 3224.505]
2026-01-23 01:27:51,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [228.0, 227.0, 1000.0, 86.0, 1000.0, 330.0, 550.0, 1000.0, 302.0, 1000.0]
2026-01-23 01:27:51,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 34 seconds)
2026-01-23 01:29:20,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:26,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2144.55542 ± 825.191
2026-01-23 01:29:26,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1676.1997, 3231.8828, 3157.5686, 2337.4724, 985.02527, 2231.3748, 789.0711, 3142.1323, 2150.0012, 1744.8254]
2026-01-23 01:29:26,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [520.0, 1000.0, 1000.0, 739.0, 307.0, 710.0, 238.0, 1000.0, 654.0, 518.0]
2026-01-23 01:29:26,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 7 seconds)
2026-01-23 01:30:54,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:56,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 655.73193 ± 1010.562
2026-01-23 01:30:56,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3195.536, 1709.6964, 148.952, 6.7128835, 56.472538, 93.365074, 37.192226, 84.74554, 71.439804, 1153.2067]
2026-01-23 01:30:56,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 511.0, 74.0, 8.0, 61.0, 59.0, 23.0, 54.0, 72.0, 379.0]
2026-01-23 01:30:56,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 2 seconds)
2026-01-23 01:32:30,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:35,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1828.36548 ± 1250.930
2026-01-23 01:32:35,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3152.9214, 3180.8176, 1218.9739, 1040.6294, 3186.1133, 973.38586, 3158.768, 24.696716, 2317.765, 29.582905]
2026-01-23 01:32:35,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 371.0, 321.0, 986.0, 299.0, 1000.0, 20.0, 754.0, 23.0]
2026-01-23 01:32:35,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 10 seconds)
2026-01-23 01:34:06,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:34:10,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1411.89758 ± 789.319
2026-01-23 01:34:10,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1024.597, 8.858531, 1746.0365, 1172.8486, 1893.5785, 3173.9185, 1668.1371, 1134.6598, 1551.0236, 745.3171]
2026-01-23 01:34:10,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [337.0, 10.0, 548.0, 346.0, 562.0, 1000.0, 498.0, 335.0, 465.0, 250.0]
2026-01-23 01:34:10,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 55 seconds)
2026-01-23 01:35:48,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:55,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2242.08789 ± 956.833
2026-01-23 01:35:55,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1635.1781, 1534.3304, 3113.9592, 2282.953, 700.6073, 3000.0083, 761.54034, 3040.1548, 3174.759, 3177.3882]
2026-01-23 01:35:55,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [535.0, 498.0, 1000.0, 722.0, 246.0, 912.0, 269.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:35:55,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 20 seconds)
2026-01-23 01:37:18,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:25,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2486.17456 ± 976.533
2026-01-23 01:37:25,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3132.9766, 3126.4304, 1139.4023, 331.33997, 3127.7942, 3123.1213, 3126.1633, 3127.574, 1787.7488, 2839.1978]
2026-01-23 01:37:25,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 388.0, 134.0, 1000.0, 1000.0, 1000.0, 1000.0, 583.0, 919.0]
2026-01-23 01:37:25,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 34 seconds)
2026-01-23 01:39:01,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:09,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2772.86084 ± 604.779
2026-01-23 01:39:09,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3173.3513, 3187.3801, 3172.1777, 2226.9585, 1677.044, 1724.5415, 3191.1125, 3193.4866, 3009.2195, 3173.3374]
2026-01-23 01:39:09,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 720.0, 496.0, 540.0, 1000.0, 1000.0, 903.0, 1000.0]
2026-01-23 01:39:09,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 26 seconds)
2026-01-23 01:40:37,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:43,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1729.09741 ± 1379.828
2026-01-23 01:40:43,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3177.2825, 3180.836, 2703.3254, 349.46152, 2929.8218, 3105.4006, 1750.2869, 19.424828, 34.507442, 40.626987]
2026-01-23 01:40:43,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 981.0, 861.0, 161.0, 927.0, 1000.0, 563.0, 22.0, 36.0, 35.0]
2026-01-23 01:40:43,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 37 seconds)
2026-01-23 01:42:17,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1950.00879 ± 1184.442
2026-01-23 01:42:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3132.0056, 3143.3992, 3122.519, 201.06459, 3173.0688, 380.64166, 369.87247, 1568.1488, 2058.0786, 2351.2898]
2026-01-23 01:42:23,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [952.0, 1000.0, 1000.0, 93.0, 1000.0, 152.0, 149.0, 518.0, 659.0, 765.0]
2026-01-23 01:42:23,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 9 seconds)
2026-01-23 01:43:57,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:05,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2409.43896 ± 1147.364
2026-01-23 01:44:05,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [12.403743, 3075.0254, 3170.3105, 3158.2808, 3155.7163, 3156.0684, 3100.6035, 1247.9707, 3143.4993, 874.50885]
2026-01-23 01:44:05,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [16.0, 1000.0, 1000.0, 1000.0, 998.0, 1000.0, 979.0, 392.0, 1000.0, 260.0]
2026-01-23 01:44:05,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 25 seconds)
2026-01-23 01:45:35,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:42,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2560.60889 ± 1005.404
2026-01-23 01:45:42,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3156.1582, 741.20013, 526.44165, 3160.4937, 3145.5671, 3159.6807, 3180.194, 2204.16, 3182.0522, 3150.1404]
2026-01-23 01:45:42,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 262.0, 196.0, 1000.0, 1000.0, 1000.0, 1000.0, 665.0, 1000.0, 1000.0]
2026-01-23 01:45:42,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 56 seconds)
2026-01-23 01:47:13,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:19,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2211.80322 ± 993.570
2026-01-23 01:47:19,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3165.4517, 1422.359, 3170.799, 1899.9788, 1365.5726, 3178.5225, 907.4412, 3182.9346, 719.1841, 3105.7883]
2026-01-23 01:47:19,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 465.0, 1000.0, 610.0, 446.0, 1000.0, 268.0, 1000.0, 250.0, 1000.0]
2026-01-23 01:47:19,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 10 seconds)
2026-01-23 01:48:50,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:58,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2650.21021 ± 833.882
2026-01-23 01:48:58,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3152.5806, 2106.681, 3146.407, 3138.1523, 3148.45, 3159.2625, 3154.0442, 1709.1643, 642.67633, 3144.6843]
2026-01-23 01:48:58,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 681.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 559.0, 213.0, 1000.0]
2026-01-23 01:48:58,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 36 seconds)
2026-01-23 01:50:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:31,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 839.38220 ± 1194.567
2026-01-23 01:50:31,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2883.169, 3165.4802, 1682.3915, 51.252094, 68.95211, 52.20801, 20.280113, 16.084955, 88.593956, 365.41016]
2026-01-23 01:50:31,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [933.0, 1000.0, 548.0, 69.0, 48.0, 51.0, 18.0, 18.0, 68.0, 174.0]
2026-01-23 01:50:32,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 52 seconds)
2026-01-23 01:52:04,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:07,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1162.57800 ± 786.514
2026-01-23 01:52:07,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1222.3287, 117.817215, 3193.4822, 1027.1912, 1035.6034, 994.7913, 1651.1781, 382.09885, 1042.1669, 959.1219]
2026-01-23 01:52:07,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [368.0, 64.0, 1000.0, 306.0, 309.0, 299.0, 484.0, 146.0, 311.0, 291.0]
2026-01-23 01:52:07,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 13 seconds)
2026-01-23 01:53:39,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:45,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2054.82886 ± 1096.404
2026-01-23 01:53:45,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [811.2866, 3140.4036, 708.36584, 1232.6548, 3065.7954, 3109.0444, 1490.7693, 670.9877, 3153.8276, 3165.154]
2026-01-23 01:53:45,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [283.0, 1000.0, 249.0, 381.0, 968.0, 1000.0, 453.0, 236.0, 1000.0, 1000.0]
2026-01-23 01:53:45,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 36 seconds)
2026-01-23 01:55:24,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:32,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2945.48047 ± 523.540
2026-01-23 01:55:32,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3150.0496, 1385.1823, 3138.666, 3150.7385, 3134.2554, 3098.7048, 3152.3186, 2946.7842, 3136.113, 3161.9915]
2026-01-23 01:55:32,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 421.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 906.0, 1000.0, 1000.0]
2026-01-23 01:55:32,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2945.48) for latency DatasetOffice
2026-01-23 01:55:32,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1299 [DEBUG]: Training session finished
