2026-01-22 23:14:11,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mem5
2026-01-22 23:14:11,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-hopper/DatasetOffice-bpql-mem5
2026-01-22 23:14:11,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x15130d143610>}
2026-01-22 23:14:11,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:11,507 baseline-bpql-noisy-hopper:77 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-22 23:14:11,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1181 [INFO]: Creating new trainer
2026-01-22 23:14:11,528 baseline-bpql-noisy-hopper:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=26, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=3, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(3,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2.]]), shift: tensor([[-1., -1., -1.]]))
)
2026-01-22 23:14:11,528 baseline-bpql-noisy-hopper:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=14, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:12,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:12,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:37,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:38,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 52.92936 ± 2.038
2026-01-22 23:15:38,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [50.29949, 56.470345, 52.46161, 50.52787, 55.236, 50.72939, 52.661503, 55.27458, 53.23762, 52.39529]
2026-01-22 23:15:38,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [29.0, 32.0, 30.0, 29.0, 31.0, 29.0, 30.0, 31.0, 30.0, 30.0]
2026-01-22 23:15:38,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (52.93) for latency DatasetOffice
2026-01-22 23:15:38,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 21 minutes, 41 seconds)
2026-01-22 23:17:11,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:11,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 43.44480 ± 8.921
2026-01-22 23:17:11,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [54.706684, 50.206657, 35.24363, 38.874985, 41.15271, 59.090485, 33.370342, 31.110071, 41.86372, 48.828705]
2026-01-22 23:17:11,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [88.0, 80.0, 70.0, 69.0, 74.0, 90.0, 63.0, 65.0, 74.0, 84.0]
2026-01-22 23:17:11,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 26 minutes, 40 seconds)
2026-01-22 23:18:45,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:18:46,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 186.69394 ± 34.452
2026-01-22 23:18:46,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [202.6153, 213.80882, 215.71745, 215.43224, 197.43358, 114.20897, 187.01202, 214.9959, 133.46861, 172.2465]
2026-01-22 23:18:46,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [121.0, 123.0, 128.0, 121.0, 116.0, 86.0, 113.0, 127.0, 92.0, 108.0]
2026-01-22 23:18:46,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (186.69) for latency DatasetOffice
2026-01-22 23:18:46,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 27 minutes, 44 seconds)
2026-01-22 23:20:22,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:20:26,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 596.94202 ± 323.399
2026-01-22 23:20:26,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1048.6733, 471.91998, 456.01782, 221.91129, 515.30414, 1101.475, 372.93808, 1068.5101, 446.1035, 266.5671]
2026-01-22 23:20:26,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 420.0, 258.0, 134.0, 221.0, 1000.0, 224.0, 1000.0, 225.0, 139.0]
2026-01-22 23:20:26,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (596.94) for latency DatasetOffice
2026-01-22 23:20:26,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 29 minutes, 48 seconds)
2026-01-22 23:21:59,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:01,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 329.48590 ± 85.608
2026-01-22 23:22:01,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [495.43842, 281.63867, 412.63184, 425.33817, 322.0836, 352.16077, 238.05508, 284.33157, 219.92844, 263.25232]
2026-01-22 23:22:01,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [197.0, 138.0, 169.0, 174.0, 154.0, 161.0, 131.0, 138.0, 127.0, 135.0]
2026-01-22 23:22:01,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 28 minutes, 25 seconds)
2026-01-22 23:23:33,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:35,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 431.12982 ± 73.696
2026-01-22 23:23:35,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [348.67865, 516.1977, 488.3964, 424.918, 493.63425, 488.22702, 343.8213, 346.5885, 517.0791, 343.75754]
2026-01-22 23:23:35,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [149.0, 190.0, 179.0, 162.0, 183.0, 179.0, 150.0, 151.0, 188.0, 151.0]
2026-01-22 23:23:35,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 29 minutes, 31 seconds)
2026-01-22 23:25:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:14,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 546.11359 ± 226.594
2026-01-22 23:25:14,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [552.90594, 707.6734, 304.45685, 826.0185, 172.07375, 620.2258, 370.85724, 790.86774, 797.03906, 319.01733]
2026-01-22 23:25:14,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [235.0, 314.0, 264.0, 411.0, 161.0, 453.0, 184.0, 376.0, 358.0, 273.0]
2026-01-22 23:25:14,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 29 minutes, 29 seconds)
2026-01-22 23:26:47,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:50,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 582.81903 ± 162.453
2026-01-22 23:26:50,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [739.1483, 493.61856, 497.85904, 236.91006, 633.56433, 725.7958, 758.7332, 474.37183, 504.36343, 763.8261]
2026-01-22 23:26:50,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [282.0, 227.0, 232.0, 115.0, 243.0, 271.0, 278.0, 226.0, 226.0, 295.0]
2026-01-22 23:26:50,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 28 minutes, 16 seconds)
2026-01-22 23:28:22,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:24,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 345.46939 ± 171.334
2026-01-22 23:28:24,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [308.7368, 857.57367, 278.92664, 291.53705, 299.02252, 270.83896, 277.91132, 272.94, 280.25192, 316.9551]
2026-01-22 23:28:24,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [141.0, 308.0, 135.0, 134.0, 135.0, 137.0, 132.0, 134.0, 136.0, 139.0]
2026-01-22 23:28:24,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 24 minutes, 48 seconds)
2026-01-22 23:29:58,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 889.93054 ± 358.203
2026-01-22 23:30:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1453.2695, 874.6579, 929.4961, 581.4203, 1460.482, 550.7252, 503.32288, 672.6753, 1273.6393, 599.6173]
2026-01-22 23:30:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [531.0, 283.0, 285.0, 214.0, 567.0, 208.0, 192.0, 235.0, 493.0, 226.0]
2026-01-22 23:30:01,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (889.93) for latency DatasetOffice
2026-01-22 23:30:01,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 24 minutes, 7 seconds)
2026-01-22 23:31:36,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:40,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1176.15125 ± 632.373
2026-01-22 23:31:40,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2259.2942, 911.99664, 1333.8352, 238.71745, 265.15475, 990.7972, 1903.8716, 815.2816, 1309.119, 1733.445]
2026-01-22 23:31:40,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [786.0, 291.0, 448.0, 104.0, 117.0, 356.0, 664.0, 259.0, 447.0, 606.0]
2026-01-22 23:31:40,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (1176.15) for latency DatasetOffice
2026-01-22 23:31:40,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 23 minutes, 48 seconds)
2026-01-22 23:33:14,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:33:16,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 712.06726 ± 402.944
2026-01-22 23:33:16,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [248.49855, 1248.2981, 1049.8278, 685.00525, 243.97057, 1162.516, 964.6958, 241.08853, 1017.7727, 258.9999]
2026-01-22 23:33:16,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [143.0, 420.0, 353.0, 226.0, 140.0, 389.0, 309.0, 143.0, 353.0, 138.0]
2026-01-22 23:33:16,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 21 minutes, 30 seconds)
2026-01-22 23:34:52,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1721.14624 ± 436.335
2026-01-22 23:34:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2023.6346, 1672.8729, 1525.5149, 1755.7161, 1535.9036, 2026.8269, 2462.335, 747.64594, 1460.7023, 2000.3121]
2026-01-22 23:34:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [669.0, 553.0, 507.0, 575.0, 535.0, 647.0, 847.0, 233.0, 474.0, 650.0]
2026-01-22 23:34:57,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (1721.15) for latency DatasetOffice
2026-01-22 23:34:57,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 21 minutes, 22 seconds)
2026-01-22 23:36:31,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:35,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1176.18640 ± 584.078
2026-01-22 23:36:35,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [718.23303, 976.9437, 973.91473, 2651.252, 943.5749, 1817.8229, 721.4561, 1277.9701, 975.99023, 704.70605]
2026-01-22 23:36:35,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [231.0, 319.0, 297.0, 883.0, 301.0, 602.0, 233.0, 431.0, 339.0, 235.0]
2026-01-22 23:36:35,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 20 minutes, 48 seconds)
2026-01-22 23:38:08,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:10,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 908.51794 ± 513.792
2026-01-22 23:38:10,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1220.1827, 992.43744, 1299.8307, 1060.1252, 18.021122, 854.0427, 967.0237, 104.49337, 1846.1879, 722.8346]
2026-01-22 23:38:10,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [430.0, 302.0, 435.0, 347.0, 16.0, 285.0, 307.0, 56.0, 592.0, 231.0]
2026-01-22 23:38:10,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 18 minutes, 37 seconds)
2026-01-22 23:39:47,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:39:50,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 919.46484 ± 877.536
2026-01-22 23:39:50,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2768.0237, 1278.2395, 1836.0509, 103.208405, 60.39681, 1539.4164, 197.33865, 35.221436, 929.1201, 447.63318]
2026-01-22 23:39:50,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [916.0, 439.0, 608.0, 59.0, 39.0, 512.0, 91.0, 43.0, 352.0, 188.0]
2026-01-22 23:39:50,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 17 minutes, 16 seconds)
2026-01-22 23:41:24,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:41:28,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1368.12512 ± 478.665
2026-01-22 23:41:28,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [627.35345, 908.2357, 2330.8809, 1823.8295, 1107.7098, 1409.7495, 1191.77, 1635.8479, 991.28424, 1654.5897]
2026-01-22 23:41:28,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [232.0, 282.0, 735.0, 579.0, 354.0, 455.0, 397.0, 527.0, 306.0, 581.0]
2026-01-22 23:41:28,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 16 minutes, 4 seconds)
2026-01-22 23:43:01,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:09,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2038.06799 ± 784.903
2026-01-22 23:43:09,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2766.5967, 2759.6133, 2787.7024, 2815.932, 1031.1652, 2000.8499, 2677.561, 951.8165, 1636.0856, 953.3576]
2026-01-22 23:43:09,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 325.0, 728.0, 1000.0, 317.0, 625.0, 317.0]
2026-01-22 23:43:09,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2038.07) for latency DatasetOffice
2026-01-22 23:43:09,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 14 minutes, 19 seconds)
2026-01-22 23:44:48,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:52,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1281.62134 ± 1282.427
2026-01-22 23:44:52,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2688.4158, 2840.0325, 2852.6257, 2829.5813, 1139.032, 23.327463, 44.3501, 8.668982, 24.429693, 365.74994]
2026-01-22 23:44:52,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 967.0, 406.0, 21.0, 55.0, 11.0, 41.0, 160.0]
2026-01-22 23:44:52,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 14 minutes, 16 seconds)
2026-01-22 23:46:23,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:30,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2390.75122 ± 834.193
2026-01-22 23:46:30,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1219.0122, 3084.799, 3109.566, 3002.0947, 3076.1775, 1119.6149, 3040.0483, 3056.3618, 1520.5238, 1679.3134]
2026-01-22 23:46:30,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [387.0, 1000.0, 1000.0, 1000.0, 1000.0, 348.0, 1000.0, 1000.0, 480.0, 519.0]
2026-01-22 23:46:30,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2390.75) for latency DatasetOffice
2026-01-22 23:46:30,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 13 minutes, 17 seconds)
2026-01-22 23:48:05,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:13,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2495.96533 ± 710.589
2026-01-22 23:48:13,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2071.4797, 2733.613, 3131.9456, 3010.5505, 2622.0823, 2961.458, 875.8045, 2951.8242, 1592.7546, 3008.1418]
2026-01-22 23:48:13,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [648.0, 856.0, 1000.0, 1000.0, 821.0, 1000.0, 295.0, 1000.0, 491.0, 1000.0]
2026-01-22 23:48:13,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2495.97) for latency DatasetOffice
2026-01-22 23:48:13,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 12 minutes, 20 seconds)
2026-01-22 23:49:48,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:49,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 355.42584 ± 924.928
2026-01-22 23:49:49,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3129.0981, 46.99489, 33.20198, 86.30051, 17.79345, 74.71243, 89.88697, 18.143063, 19.652512, 38.474396]
2026-01-22 23:49:49,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [991.0, 36.0, 26.0, 48.0, 18.0, 50.0, 55.0, 19.0, 20.0, 31.0]
2026-01-22 23:49:49,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 10 minutes, 16 seconds)
2026-01-22 23:51:29,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:36,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2281.47241 ± 996.437
2026-01-22 23:51:36,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2313.805, 1441.4904, 877.4157, 3121.53, 1226.6226, 924.91, 3163.9553, 3189.41, 3321.279, 3234.307]
2026-01-22 23:51:36,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [720.0, 433.0, 268.0, 1000.0, 390.0, 286.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-22 23:51:36,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 10 minutes, 8 seconds)
2026-01-22 23:53:07,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:12,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1704.54468 ± 950.768
2026-01-22 23:53:12,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3070.2131, 3128.6743, 1778.0352, 907.9469, 916.2964, 927.90765, 928.6778, 1392.9911, 899.303, 3095.403]
2026-01-22 23:53:12,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 559.0, 278.0, 280.0, 282.0, 286.0, 455.0, 273.0, 1000.0]
2026-01-22 23:53:12,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 6 minutes, 29 seconds)
2026-01-22 23:54:54,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:01,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2233.99951 ± 1112.182
2026-01-22 23:55:01,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2654.784, 858.6928, 3014.2676, 80.316414, 789.77045, 3004.2625, 3137.844, 3039.2083, 2688.5278, 3072.3235]
2026-01-22 23:55:01,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [853.0, 251.0, 1000.0, 47.0, 235.0, 991.0, 1000.0, 1000.0, 886.0, 1000.0]
2026-01-22 23:55:01,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 7 minutes, 39 seconds)
2026-01-22 23:56:29,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2230.80933 ± 970.201
2026-01-22 23:56:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3186.412, 1708.321, 2994.1968, 3233.7812, 3228.5562, 1010.3836, 1115.8445, 3251.7832, 1071.3906, 1507.4237]
2026-01-22 23:56:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 527.0, 895.0, 1000.0, 1000.0, 308.0, 371.0, 1000.0, 331.0, 461.0]
2026-01-22 23:56:35,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 4 minutes, 2 seconds)
2026-01-22 23:58:09,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:17,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2578.43555 ± 972.408
2026-01-22 23:58:17,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3065.1052, 3013.8213, 3079.5403, 3072.178, 69.34388, 3015.5298, 3022.2498, 3044.139, 1379.803, 3022.6462]
2026-01-22 23:58:17,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 43.0, 1000.0, 1000.0, 1000.0, 434.0, 1000.0]
2026-01-22 23:58:17,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2578.44) for latency DatasetOffice
2026-01-22 23:58:17,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 3 minutes, 40 seconds)
2026-01-23 00:00:00,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:04,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1210.81226 ± 1308.206
2026-01-23 00:00:04,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3113.551, 999.4251, 160.78404, 2862.749, 3158.5327, 1696.4318, 17.588678, 15.447607, 8.65961, 74.95328]
2026-01-23 00:00:04,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 300.0, 76.0, 932.0, 1000.0, 555.0, 18.0, 16.0, 11.0, 57.0]
2026-01-23 00:00:04,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 2 minutes, 3 seconds)
2026-01-23 00:01:40,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:46,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2062.73096 ± 1078.292
2026-01-23 00:01:46,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3115.9788, 3103.2478, 1228.5065, 3126.0518, 3165.396, 1089.9734, 923.9589, 3151.5618, 1057.8744, 664.759]
2026-01-23 00:01:46,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 371.0, 1000.0, 1000.0, 322.0, 281.0, 1000.0, 315.0, 229.0]
2026-01-23 00:01:46,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 1 minute, 48 seconds)
2026-01-23 00:03:21,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:29,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2457.56787 ± 1085.972
2026-01-23 00:03:29,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [896.9688, 3151.7068, 60.00967, 3120.666, 3125.6533, 3147.3933, 3076.6404, 3137.6206, 3107.213, 1751.8053]
2026-01-23 00:03:29,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [309.0, 1000.0, 40.0, 1000.0, 967.0, 1000.0, 1000.0, 1000.0, 1000.0, 527.0]
2026-01-23 00:03:29,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 58 minutes, 31 seconds)
2026-01-23 00:04:56,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:01,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1674.99353 ± 986.278
2026-01-23 00:05:01,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1329.2627, 1898.2715, 2277.7786, 944.3252, 673.932, 11.235246, 3236.075, 1477.3942, 1672.7721, 3228.8882]
2026-01-23 00:05:01,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [395.0, 562.0, 682.0, 291.0, 229.0, 13.0, 1000.0, 448.0, 500.0, 1000.0]
2026-01-23 00:05:01,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 56 minutes, 13 seconds)
2026-01-23 00:06:39,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:43,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1604.43237 ± 685.588
2026-01-23 00:06:43,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3018.44, 1215.1411, 1112.5554, 1207.8019, 2622.9224, 1620.6276, 969.4754, 907.79895, 1359.8811, 2009.6797]
2026-01-23 00:06:43,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [896.0, 379.0, 333.0, 370.0, 778.0, 481.0, 285.0, 276.0, 399.0, 607.0]
2026-01-23 00:06:43,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 54 minutes, 39 seconds)
2026-01-23 00:08:18,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2325.87451 ± 1061.286
2026-01-23 00:08:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3155.6606, 3158.8662, 3162.1526, 3158.4705, 958.31067, 1852.4558, 3140.2607, 843.9772, 3178.395, 650.1975]
2026-01-23 00:08:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 304.0, 558.0, 1000.0, 278.0, 1000.0, 203.0]
2026-01-23 00:08:25,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 51 minutes, 48 seconds)
2026-01-23 00:10:01,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2255.11572 ± 801.498
2026-01-23 00:10:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2640.538, 747.2483, 1994.5377, 3030.0146, 3065.1035, 3080.0632, 3052.7068, 1456.5468, 1428.5519, 2055.8457]
2026-01-23 00:10:08,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [831.0, 249.0, 679.0, 1000.0, 1000.0, 1000.0, 1000.0, 497.0, 427.0, 665.0]
2026-01-23 00:10:08,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 50 minutes, 17 seconds)
2026-01-23 00:11:42,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:48,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1819.44116 ± 959.724
2026-01-23 00:11:48,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3207.419, 913.3992, 2213.0276, 1129.5067, 3184.7974, 934.57074, 3136.8748, 1067.8663, 1454.3452, 952.6053]
2026-01-23 00:11:48,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 286.0, 678.0, 353.0, 1000.0, 295.0, 1000.0, 323.0, 445.0, 298.0]
2026-01-23 00:11:48,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 48 minutes, 8 seconds)
2026-01-23 00:13:28,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:37,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2937.20508 ± 647.099
2026-01-23 00:13:37,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3180.0527, 3117.9385, 997.69366, 3093.6472, 3128.8384, 3167.2456, 3168.6956, 3170.6062, 3176.8352, 3170.4954]
2026-01-23 00:13:37,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 327.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:13:37,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (2937.21) for latency DatasetOffice
2026-01-23 00:13:37,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 50 minutes, 14 seconds)
2026-01-23 00:15:09,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:14,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1625.50562 ± 494.285
2026-01-23 00:15:14,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1370.1396, 1435.85, 2630.572, 1622.6885, 567.5837, 1776.6658, 1649.254, 1916.2897, 1871.0919, 1414.9207]
2026-01-23 00:15:14,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [403.0, 445.0, 791.0, 513.0, 186.0, 530.0, 487.0, 574.0, 555.0, 424.0]
2026-01-23 00:15:14,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 47 minutes, 13 seconds)
2026-01-23 00:16:51,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:58,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2507.41187 ± 785.759
2026-01-23 00:16:58,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1450.2592, 1455.7089, 3171.9824, 2947.286, 3145.3884, 1375.5809, 3163.1821, 3226.85, 3148.6516, 1989.2278]
2026-01-23 00:16:58,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [472.0, 437.0, 1000.0, 891.0, 1000.0, 409.0, 1000.0, 1000.0, 976.0, 595.0]
2026-01-23 00:16:58,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 46 minutes, 9 seconds)
2026-01-23 00:18:25,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:31,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2104.82861 ± 1079.670
2026-01-23 00:18:31,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3182.2466, 3165.5554, 1242.3306, 3149.5393, 1062.2708, 1136.1329, 3131.0845, 532.2576, 3212.365, 1234.5016]
2026-01-23 00:18:31,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 368.0, 1000.0, 324.0, 341.0, 1000.0, 178.0, 1000.0, 374.0]
2026-01-23 00:18:31,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 42 minutes, 26 seconds)
2026-01-23 00:20:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:14,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2444.15308 ± 1136.411
2026-01-23 00:20:14,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3232.3533, 3272.7744, 3203.3599, 3231.6455, 1094.4445, 3280.542, 1200.5309, 3186.078, 2689.6653, 50.13768]
2026-01-23 00:20:14,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 327.0, 1000.0, 357.0, 1000.0, 828.0, 39.0]
2026-01-23 00:20:14,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 41 minutes, 8 seconds)
2026-01-23 00:21:48,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:56,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2709.80835 ± 984.604
2026-01-23 00:21:56,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3182.4727, 3162.1812, 3136.4292, 3174.558, 3143.6025, 3183.9023, 1806.2103, 15.810123, 3143.1882, 3149.731]
2026-01-23 00:21:56,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 586.0, 15.0, 1000.0, 1000.0]
2026-01-23 00:21:56,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 38 minutes, 3 seconds)
2026-01-23 00:23:35,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:43,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2680.80249 ± 1066.546
2026-01-23 00:23:43,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3174.5942, 3202.4773, 3177.418, 3238.4675, 3190.422, 3196.559, 3198.3777, 107.69878, 3244.2344, 1077.7745]
2026-01-23 00:23:43,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 59.0, 1000.0, 324.0]
2026-01-23 00:23:43,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 38 minutes, 31 seconds)
2026-01-23 00:25:09,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:15,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2143.31104 ± 1402.075
2026-01-23 00:25:15,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3237.2876, 3150.163, 3188.8135, 3198.6719, 3242.0798, 3208.8262, 1992.2567, 59.478195, 75.655014, 79.87865]
2026-01-23 00:25:15,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 634.0, 36.0, 47.0, 56.0]
2026-01-23 00:25:15,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 34 minutes, 22 seconds)
2026-01-23 00:26:50,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:00,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3103.83081 ± 271.216
2026-01-23 00:27:00,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3197.4321, 3169.9373, 3179.2595, 3165.5332, 2293.9758, 3176.135, 3176.4822, 3202.4812, 3221.8137, 3255.2573]
2026-01-23 00:27:00,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 689.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:27:00,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (3103.83) for latency DatasetOffice
2026-01-23 00:27:00,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 34 minutes, 50 seconds)
2026-01-23 00:28:32,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:41,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2897.17334 ± 878.146
2026-01-23 00:28:41,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3198.641, 3203.96, 3225.2288, 263.1477, 3187.337, 3170.2268, 3170.978, 3187.2478, 3186.0237, 3178.9434]
2026-01-23 00:28:41,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 116.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:28:41,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 32 minutes, 57 seconds)
2026-01-23 00:30:14,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:19,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1722.56287 ± 1234.988
2026-01-23 00:30:19,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1349.4556, 3302.3713, 635.77386, 3235.5588, 2314.1335, 37.898655, 1885.0548, 39.334827, 1068.9379, 3357.109]
2026-01-23 00:30:19,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [403.0, 1000.0, 222.0, 1000.0, 720.0, 28.0, 606.0, 49.0, 360.0, 1000.0]
2026-01-23 00:30:19,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 30 minutes, 34 seconds)
2026-01-23 00:31:50,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:59,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2889.09302 ± 726.422
2026-01-23 00:31:59,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3106.5234, 3153.4717, 3162.3328, 3119.2427, 3119.912, 3120.8347, 3135.5662, 3131.8267, 3130.8813, 710.33716]
2026-01-23 00:31:59,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 243.0]
2026-01-23 00:31:59,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 34 seconds)
2026-01-23 00:33:36,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2698.84888 ± 1088.113
2026-01-23 00:33:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3200.9634, 3208.6018, 3220.054, 3248.654, 3259.452, 887.20184, 3273.7573, 3246.2908, 3241.2998, 202.21445]
2026-01-23 00:33:44,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 282.0, 1000.0, 1000.0, 1000.0, 92.0]
2026-01-23 00:33:45,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 28 minutes, 17 seconds)
2026-01-23 00:35:17,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:26,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2883.99878 ± 581.177
2026-01-23 00:35:26,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2610.563, 3254.291, 2047.769, 3237.5889, 1575.0438, 3090.3022, 3290.5935, 3350.3508, 3110.0864, 3273.3975]
2026-01-23 00:35:26,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [781.0, 1000.0, 603.0, 1000.0, 469.0, 914.0, 1000.0, 1000.0, 911.0, 1000.0]
2026-01-23 00:35:26,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 26 minutes, 2 seconds)
2026-01-23 00:36:54,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:03,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3021.17725 ± 437.631
2026-01-23 00:37:03,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2936.6055, 3272.183, 3282.4585, 3284.6877, 3277.9465, 2316.9011, 3243.7158, 3272.6343, 3284.8533, 2039.7854]
2026-01-23 00:37:03,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [871.0, 1000.0, 1000.0, 1000.0, 1000.0, 677.0, 1000.0, 1000.0, 1000.0, 605.0]
2026-01-23 00:37:03,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 23 minutes, 44 seconds)
2026-01-23 00:38:44,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:50,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1988.70605 ± 1389.679
2026-01-23 00:38:50,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3230.0479, 3320.1226, 3205.16, 1772.6364, 3198.2874, 3275.9263, 1738.128, 25.212759, 55.89271, 65.64634]
2026-01-23 00:38:50,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 534.0, 978.0, 1000.0, 549.0, 25.0, 35.0, 47.0]
2026-01-23 00:38:50,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 23 minutes, 26 seconds)
2026-01-23 00:40:23,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:32,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3133.18384 ± 398.156
2026-01-23 00:40:32,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3289.3767, 3248.7642, 3230.1787, 3281.184, 3260.5952, 3280.9612, 1941.1959, 3215.4238, 3296.9912, 3287.1655]
2026-01-23 00:40:32,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 602.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:40:32,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (3133.18) for latency DatasetOffice
2026-01-23 00:40:32,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 22 minutes, 8 seconds)
2026-01-23 00:41:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:07,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3018.27344 ± 497.482
2026-01-23 00:42:07,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3174.702, 3171.951, 3209.4417, 3192.5254, 3204.6426, 3179.0642, 3162.9094, 1526.4276, 3188.3652, 3172.7053]
2026-01-23 00:42:07,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 493.0, 1000.0, 1000.0]
2026-01-23 00:42:07,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 43 seconds)
2026-01-23 00:43:41,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:46,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1580.19446 ± 1383.758
2026-01-23 00:43:46,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3199.9954, 3227.8584, 3216.2888, 75.98919, 2046.9691, 2759.311, 1138.549, 43.422234, 80.164604, 13.396131]
2026-01-23 00:43:46,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 47.0, 617.0, 842.0, 381.0, 29.0, 45.0, 14.0]
2026-01-23 00:43:46,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 44 seconds)
2026-01-23 00:45:19,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:27,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2877.64111 ± 700.405
2026-01-23 00:45:27,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3229.5193, 3206.496, 1329.1624, 3232.122, 3212.5828, 1638.6436, 3249.9907, 3222.6604, 3214.974, 3240.257]
2026-01-23 00:45:27,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 432.0, 1000.0, 1000.0, 523.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:45:27,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 15 minutes, 37 seconds)
2026-01-23 00:47:00,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:10,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3307.74731 ± 40.134
2026-01-23 00:47:10,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3311.0125, 3288.469, 3313.2346, 3244.3154, 3309.8809, 3256.8972, 3307.2383, 3315.554, 3331.0537, 3399.817]
2026-01-23 00:47:10,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:47:10,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1274 [INFO]: New best (3307.75) for latency DatasetOffice
2026-01-23 00:47:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 19 seconds)
2026-01-23 00:48:43,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:47,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1504.41382 ± 1275.621
2026-01-23 00:48:47,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3261.9485, 2812.159, 3280.6868, 2294.399, 112.91437, 1766.1389, 51.43224, 994.2056, 453.44806, 16.805552]
2026-01-23 00:48:47,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 856.0, 1000.0, 719.0, 93.0, 553.0, 55.0, 334.0, 186.0, 18.0]
2026-01-23 00:48:47,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 55 seconds)
2026-01-23 00:50:20,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:30,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3178.01562 ± 178.117
2026-01-23 00:50:30,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3194.4172, 3310.059, 3261.6309, 3181.8076, 3178.63, 3286.6921, 3255.9465, 3253.822, 3197.346, 2659.8052]
2026-01-23 00:50:30,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 824.0]
2026-01-23 00:50:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 10 minutes, 23 seconds)
2026-01-23 00:52:10,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:17,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2324.51025 ± 1155.243
2026-01-23 00:52:17,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3262.2566, 1027.9413, 996.64264, 3236.1685, 3242.0566, 3280.092, 691.8065, 937.48895, 3293.6233, 3277.0276]
2026-01-23 00:52:17,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 340.0, 300.0, 1000.0, 1000.0, 1000.0, 236.0, 300.0, 1000.0, 1000.0]
2026-01-23 00:52:17,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 9 minutes, 49 seconds)
2026-01-23 00:53:45,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:53,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2579.57007 ± 1003.146
2026-01-23 00:53:53,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3259.647, 3271.4055, 598.17584, 3296.3306, 3361.6917, 2694.4739, 3267.2046, 3290.4585, 1068.0725, 1688.2424]
2026-01-23 00:53:53,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 183.0, 1000.0, 1000.0, 793.0, 1000.0, 1000.0, 318.0, 497.0]
2026-01-23 00:53:53,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 7 minutes, 23 seconds)
2026-01-23 00:55:26,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:33,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2418.04028 ± 928.291
2026-01-23 00:55:33,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1383.8666, 3273.8733, 1139.8322, 2883.9402, 1896.8945, 3350.3015, 2726.0198, 937.1673, 3280.9905, 3307.517]
2026-01-23 00:55:33,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [405.0, 1000.0, 339.0, 840.0, 560.0, 1000.0, 815.0, 283.0, 1000.0, 1000.0]
2026-01-23 00:55:33,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 22 seconds)
2026-01-23 00:57:01,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:07,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1925.44458 ± 1128.077
2026-01-23 00:57:07,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3227.0776, 3236.2195, 3249.2322, 1108.4875, 578.79065, 902.41705, 1140.7794, 1962.6361, 617.45526, 3231.35]
2026-01-23 00:57:07,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 332.0, 210.0, 303.0, 379.0, 624.0, 221.0, 1000.0]
2026-01-23 00:57:07,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 3 minutes, 20 seconds)
2026-01-23 00:58:41,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:45,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1414.82349 ± 1146.793
2026-01-23 00:58:45,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2194.8013, 60.758686, 1870.2362, 48.694187, 1046.9036, 462.32285, 878.3531, 983.2297, 3311.8784, 3291.057]
2026-01-23 00:58:45,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [674.0, 43.0, 601.0, 55.0, 352.0, 180.0, 265.0, 296.0, 1000.0, 1000.0]
2026-01-23 00:58:45,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 7 seconds)
2026-01-23 01:00:25,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:32,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2682.08398 ± 1025.881
2026-01-23 01:00:32,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3260.458, 3297.8174, 3229.336, 3288.6897, 3299.1604, 3225.5986, 1935.5989, 12.881761, 3260.4858, 2010.8135]
2026-01-23 01:00:32,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 606.0, 13.0, 1000.0, 593.0]
2026-01-23 01:00:32,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes, 26 seconds)
2026-01-23 01:02:04,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:12,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2723.79590 ± 1092.549
2026-01-23 01:02:12,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3266.5984, 3267.53, 3289.6836, 3294.4856, 700.13605, 3264.4495, 3268.8577, 3296.3142, 387.3713, 3202.5332]
2026-01-23 01:02:12,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 240.0, 1000.0, 1000.0, 1000.0, 147.0, 1000.0]
2026-01-23 01:02:12,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 17 seconds)
2026-01-23 01:03:43,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:51,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2947.59009 ± 975.734
2026-01-23 01:03:51,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [20.430761, 3265.6619, 3268.5444, 3273.815, 3273.0142, 3269.075, 3281.3894, 3278.561, 3279.2104, 3266.197]
2026-01-23 01:03:51,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [19.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:03:51,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 28 seconds)
2026-01-23 01:05:24,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:33,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3289.13867 ± 2.892
2026-01-23 01:05:33,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3282.7815, 3289.8706, 3292.047, 3289.9373, 3289.7432, 3286.6328, 3287.6772, 3290.5325, 3293.848, 3288.318]
2026-01-23 01:05:33,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:05:33,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 39 seconds)
2026-01-23 01:07:03,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:08,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1661.71350 ± 1422.941
2026-01-23 01:07:08,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3307.2705, 3255.396, 1210.1691, 13.117812, 3276.6228, 3261.864, 1840.9319, 77.75751, 284.94287, 89.06312]
2026-01-23 01:07:08,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 367.0, 14.0, 1000.0, 1000.0, 577.0, 55.0, 131.0, 50.0]
2026-01-23 01:07:08,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 69/100 (estimated time remaining: 53 minutes, 34 seconds)
2026-01-23 01:08:42,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:47,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1811.93787 ± 943.941
2026-01-23 01:08:47,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3017.0173, 1406.6417, 3317.019, 1147.4435, 982.93005, 1244.2437, 1058.7803, 1672.687, 957.25336, 3315.364]
2026-01-23 01:08:47,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [890.0, 415.0, 1000.0, 338.0, 298.0, 372.0, 315.0, 493.0, 288.0, 1000.0]
2026-01-23 01:08:47,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 5 seconds)
2026-01-23 01:10:24,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:33,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3012.74292 ± 493.780
2026-01-23 01:10:33,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2217.5222, 3194.859, 1885.5999, 3304.0005, 3289.8435, 3017.676, 3298.1602, 3303.596, 3303.382, 3312.7888]
2026-01-23 01:10:33,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [697.0, 1000.0, 560.0, 1000.0, 1000.0, 890.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:10:33,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 3 seconds)
2026-01-23 01:12:06,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:14,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2627.68311 ± 1281.625
2026-01-23 01:12:14,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3318.1594, 3255.8584, 3284.273, 3202.1023, 3279.9268, 3289.7336, 3261.1274, 3255.4832, 76.236694, 53.932]
2026-01-23 01:12:14,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 81.0, 33.0]
2026-01-23 01:12:14,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 72/100 (estimated time remaining: 48 minutes, 36 seconds)
2026-01-23 01:13:42,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:49,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2570.56299 ± 863.372
2026-01-23 01:13:49,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3236.7222, 1935.0793, 1657.8802, 3339.4043, 3249.006, 3230.053, 1122.7651, 3232.008, 3261.7002, 1441.0112]
2026-01-23 01:13:49,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 614.0, 530.0, 1000.0, 1000.0, 1000.0, 350.0, 1000.0, 1000.0, 426.0]
2026-01-23 01:13:49,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 17 seconds)
2026-01-23 01:15:23,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:31,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2817.49756 ± 907.310
2026-01-23 01:15:31,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1076.6423, 3318.7954, 3297.2695, 3206.179, 3202.2083, 3268.9153, 934.2426, 3286.834, 3285.8418, 3298.0469]
2026-01-23 01:15:31,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [328.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 291.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:15:31,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 18 seconds)
2026-01-23 01:17:03,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1286.52148 ± 1289.419
2026-01-23 01:17:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1713.8302, 84.832695, 61.654747, 165.24124, 140.82307, 37.982838, 3271.5073, 1653.5392, 2484.5444, 3251.2598]
2026-01-23 01:17:07,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [551.0, 48.0, 53.0, 80.0, 71.0, 60.0, 1000.0, 492.0, 731.0, 1000.0]
2026-01-23 01:17:07,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 20 seconds)
2026-01-23 01:18:42,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:48,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1939.45862 ± 1190.287
2026-01-23 01:18:48,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1479.4521, 923.7505, 1768.5088, 3354.1567, 1402.0709, 613.6238, 58.81777, 3314.2368, 3102.539, 3377.4307]
2026-01-23 01:18:48,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [439.0, 282.0, 527.0, 1000.0, 413.0, 213.0, 35.0, 1000.0, 911.0, 989.0]
2026-01-23 01:18:48,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 15 seconds)
2026-01-23 01:20:24,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:33,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3086.97021 ± 442.111
2026-01-23 01:20:33,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3277.9612, 3262.317, 1804.158, 3313.5579, 3232.2737, 3272.8533, 2900.311, 3250.9146, 3290.3826, 3264.971]
2026-01-23 01:20:33,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 542.0, 1000.0, 1000.0, 1000.0, 884.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:20:33,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 55 seconds)
2026-01-23 01:22:01,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:06,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1567.85767 ± 1501.746
2026-01-23 01:22:06,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3294.0005, 3277.9934, 3294.4724, 2732.6333, 2677.6592, 8.854241, 29.68516, 112.44463, 88.295296, 162.53928]
2026-01-23 01:22:06,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 802.0, 822.0, 12.0, 27.0, 65.0, 60.0, 83.0]
2026-01-23 01:22:06,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 4 seconds)
2026-01-23 01:23:45,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:53,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2977.02197 ± 914.928
2026-01-23 01:23:53,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3256.359, 3272.2053, 3306.62, 3302.9927, 233.13463, 3304.31, 3243.0918, 3311.4954, 3285.8804, 3254.13]
2026-01-23 01:23:53,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 101.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:23:53,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 50 seconds)
2026-01-23 01:25:21,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:28,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2541.74414 ± 658.375
2026-01-23 01:25:28,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2942.4468, 2051.8757, 3104.5754, 2925.4985, 1288.618, 1742.4005, 3131.0542, 2058.068, 2878.9927, 3293.9119]
2026-01-23 01:25:28,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [868.0, 610.0, 907.0, 872.0, 390.0, 534.0, 921.0, 611.0, 837.0, 1000.0]
2026-01-23 01:25:28,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 4 seconds)
2026-01-23 01:26:59,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:03,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1289.96350 ± 1520.612
2026-01-23 01:27:03,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [100.71321, 69.95778, 27.457617, 51.345264, 32.200745, 51.63071, 3257.0825, 3295.637, 3314.205, 2699.4038]
2026-01-23 01:27:03,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [55.0, 43.0, 22.0, 31.0, 27.0, 30.0, 1000.0, 1000.0, 1000.0, 821.0]
2026-01-23 01:27:03,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 59 seconds)
2026-01-23 01:28:40,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:46,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2305.17456 ± 1455.777
2026-01-23 01:28:46,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3253.2002, 3251.5244, 205.46565, 21.052034, 21.444609, 3259.0134, 3258.066, 3246.668, 3276.8457, 3258.4666]
2026-01-23 01:28:46,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 94.0, 19.0, 19.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:28:47,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 14 seconds)
2026-01-23 01:30:20,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:26,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2090.04492 ± 1145.378
2026-01-23 01:30:26,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3304.7004, 1691.7505, 2834.2832, 1997.1259, 3323.469, 1790.7784, 205.25084, 36.29764, 2433.8914, 3282.9023]
2026-01-23 01:30:26,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 496.0, 838.0, 590.0, 1000.0, 530.0, 93.0, 30.0, 710.0, 1000.0]
2026-01-23 01:30:26,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 1 second)
2026-01-23 01:31:58,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:03,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1839.57190 ± 1508.580
2026-01-23 01:32:03,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3280.369, 3287.0137, 3296.5505, 3305.3027, 3266.2056, 1585.1049, 59.096916, 41.675114, 44.15507, 230.2464]
2026-01-23 01:32:03,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 497.0, 40.0, 26.0, 39.0, 107.0]
2026-01-23 01:32:03,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 44 seconds)
2026-01-23 01:33:35,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:41,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2345.96924 ± 814.385
2026-01-23 01:33:41,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1908.2511, 2618.735, 3311.761, 2929.9314, 1264.5377, 995.0341, 3238.1882, 2009.0316, 3322.5881, 1861.6329]
2026-01-23 01:33:41,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [560.0, 764.0, 1000.0, 866.0, 376.0, 297.0, 976.0, 585.0, 1000.0, 546.0]
2026-01-23 01:33:41,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 18 seconds)
2026-01-23 01:35:17,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:25,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2829.82251 ± 734.365
2026-01-23 01:35:25,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1014.1078, 3291.6746, 2604.9165, 3308.39, 3325.1035, 1986.6498, 3304.5981, 2889.274, 3301.15, 3272.3596]
2026-01-23 01:35:25,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [304.0, 1000.0, 768.0, 1000.0, 1000.0, 580.0, 997.0, 844.0, 1000.0, 1000.0]
2026-01-23 01:35:25,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 7 seconds)
2026-01-23 01:36:59,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:01,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 677.11664 ± 943.911
2026-01-23 01:37:01,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [1967.3778, 57.62874, 95.90106, 22.567427, 171.12291, 33.012978, 24.685104, 33.312397, 2024.9697, 2340.5884]
2026-01-23 01:37:01,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [600.0, 49.0, 61.0, 19.0, 76.0, 29.0, 21.0, 36.0, 591.0, 687.0]
2026-01-23 01:37:01,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 5 seconds)
2026-01-23 01:38:32,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:41,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3081.85596 ± 619.806
2026-01-23 01:38:41,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3283.0269, 3278.636, 3264.7822, 3270.0698, 3292.6287, 1223.2706, 3272.2405, 3310.5042, 3328.6375, 3294.7637]
2026-01-23 01:38:41,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 374.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:38:41,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 26 seconds)
2026-01-23 01:40:13,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:22,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2771.78955 ± 987.601
2026-01-23 01:40:22,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3192.9075, 3241.6414, 3176.1619, 3251.6243, 3194.9922, 1973.9319, 25.489016, 3211.7573, 3228.418, 3220.9722]
2026-01-23 01:40:22,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 637.0, 21.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:40:22,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 56 seconds)
2026-01-23 01:41:53,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:57,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1395.12610 ± 1539.253
2026-01-23 01:41:57,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3284.3608, 3258.5847, 3269.065, 3254.9314, 676.0543, 78.39058, 28.041498, 12.950752, 75.21129, 13.670927]
2026-01-23 01:41:57,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 235.0, 49.0, 25.0, 14.0, 53.0, 15.0]
2026-01-23 01:41:57,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 10 seconds)
2026-01-23 01:43:28,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:37,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3113.85400 ± 517.184
2026-01-23 01:43:37,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3293.466, 3262.9697, 1562.7599, 3287.4705, 3298.7097, 3264.0806, 3287.6448, 3284.9617, 3303.0715, 3293.405]
2026-01-23 01:43:37,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 498.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:43:37,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 22 seconds)
2026-01-23 01:45:10,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:19,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 3053.12842 ± 455.466
2026-01-23 01:45:19,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3339.783, 3282.0242, 3265.4563, 2045.5719, 3287.0193, 3259.0806, 3274.146, 3280.0366, 3247.7468, 2250.4204]
2026-01-23 01:45:19,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 614.0, 1000.0, 1000.0, 967.0, 1000.0, 1000.0, 708.0]
2026-01-23 01:45:19,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 56 seconds)
2026-01-23 01:46:53,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:57,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1466.62732 ± 1122.054
2026-01-23 01:46:57,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3300.1533, 2716.5706, 1627.4396, 2794.6038, 1209.9689, 62.873726, 1675.6743, 195.6976, 56.89985, 1026.3917]
2026-01-23 01:46:57,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 835.0, 480.0, 830.0, 394.0, 40.0, 511.0, 88.0, 60.0, 344.0]
2026-01-23 01:46:57,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 14 seconds)
2026-01-23 01:48:29,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:35,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2367.98853 ± 592.558
2026-01-23 01:48:35,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [2034.4573, 1764.1062, 2071.4937, 1788.1833, 1592.7034, 2658.275, 2833.0872, 3277.9382, 2352.843, 3306.7976]
2026-01-23 01:48:35,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [600.0, 530.0, 622.0, 536.0, 470.0, 792.0, 840.0, 1000.0, 701.0, 1000.0]
2026-01-23 01:48:35,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 31 seconds)
2026-01-23 01:50:08,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:15,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2433.64014 ± 1071.335
2026-01-23 01:50:15,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3310.388, 1490.2396, 3288.054, 461.99292, 1342.4368, 3267.989, 3301.621, 1346.2598, 3243.761, 3283.6584]
2026-01-23 01:50:15,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 448.0, 1000.0, 171.0, 399.0, 1000.0, 1000.0, 406.0, 1000.0, 1000.0]
2026-01-23 01:50:15,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 57 seconds)
2026-01-23 01:51:53,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:56,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 902.50568 ± 1302.246
2026-01-23 01:51:56,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3275.461, 2156.4941, 54.07834, 20.611134, 7.9688973, 48.624798, 98.72332, 198.02844, 55.944923, 3109.1216]
2026-01-23 01:51:56,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 672.0, 48.0, 22.0, 11.0, 37.0, 60.0, 90.0, 34.0, 1000.0]
2026-01-23 01:51:56,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 18 seconds)
2026-01-23 01:53:24,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:33,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2957.45239 ± 849.615
2026-01-23 01:53:33,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3243.994, 3245.9058, 3254.243, 3234.1765, 408.86554, 3257.8806, 3242.123, 3217.723, 3247.212, 3222.4028]
2026-01-23 01:53:33,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 159.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:53:33,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 35 seconds)
2026-01-23 01:55:07,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:15,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2780.83545 ± 930.038
2026-01-23 01:55:15,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3324.8228, 3247.4888, 3063.8457, 203.53122, 3284.0, 2699.8455, 3303.5574, 3267.1274, 2144.444, 3269.6924]
2026-01-23 01:55:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 918.0, 90.0, 1000.0, 825.0, 1000.0, 1000.0, 636.0, 1000.0]
2026-01-23 01:55:15,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 58 seconds)
2026-01-23 01:56:54,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:00,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 1859.03149 ± 1225.468
2026-01-23 01:57:00,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3196.4324, 2725.4668, 3211.0876, 2361.6995, 25.340052, 1941.9585, 1033.683, 442.07742, 353.6622, 3298.9058]
2026-01-23 01:57:00,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 853.0, 1000.0, 726.0, 22.0, 637.0, 386.0, 168.0, 149.0, 1000.0]
2026-01-23 01:57:00,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 21 seconds)
2026-01-23 01:58:28,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:37,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2935.50952 ± 801.153
2026-01-23 01:58:37,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3201.9536, 3202.7104, 3206.7256, 3197.372, 3215.3308, 3169.1658, 3217.2637, 3225.6934, 3186.395, 532.4872]
2026-01-23 01:58:37,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 194.0]
2026-01-23 01:58:37,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 40 seconds)
2026-01-23 02:00:14,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:22,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1269 [DEBUG]: Total Reward: 2806.92920 ± 411.111
2026-01-23 02:00:22,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1270 [DEBUG]: All rewards: [3271.3628, 2911.6792, 3247.4724, 3190.916, 2668.7705, 2150.638, 2210.832, 2588.7332, 3265.1997, 2563.6897]
2026-01-23 02:00:22,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1271 [DEBUG]: All trajectory lengths: [1000.0, 859.0, 1000.0, 954.0, 790.0, 675.0, 662.0, 773.0, 1000.0, 775.0]
2026-01-23 02:00:22,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-hopper):1299 [DEBUG]: Training session finished
