2026-01-22 23:14:32,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mem5
2026-01-22 23:14:32,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mem5
2026-01-22 23:14:32,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14b33c0fed90>}
2026-01-22 23:14:32,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:32,599 baseline-bpql-noisy-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-22 23:14:32,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-22 23:14:32,616 baseline-bpql-noisy-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=47, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:32,616 baseline-bpql-noisy-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:33,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:33,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:59,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:16:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 362.15784 ± 218.679
2026-01-22 23:16:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [274.38278, 299.64075, 294.5608, 280.7313, 338.33502, 326.11288, 293.2163, 1002.2505, 165.09674, 347.25125]
2026-01-22 23:16:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [158.0, 189.0, 180.0, 394.0, 222.0, 208.0, 170.0, 1000.0, 290.0, 228.0]
2026-01-22 23:16:02,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (362.16) for latency DatasetOffice
2026-01-22 23:16:02,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 26 minutes, 31 seconds)
2026-01-22 23:17:36,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:37,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 175.84660 ± 141.986
2026-01-22 23:17:37,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [285.25027, 15.103716, -1.4874605, 289.0924, 241.70013, 277.38898, 362.4112, 9.465463, -1.2638332, 280.8051]
2026-01-22 23:17:37,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [169.0, 60.0, 153.0, 223.0, 189.0, 287.0, 307.0, 151.0, 119.0, 216.0]
2026-01-22 23:17:37,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 30 minutes, 34 seconds)
2026-01-22 23:19:12,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:14,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 298.61453 ± 79.200
2026-01-22 23:19:14,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [184.46175, 328.1928, 308.2123, 371.4943, 425.88962, 405.00235, 211.7726, 231.81927, 280.39615, 238.90416]
2026-01-22 23:19:14,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [119.0, 412.0, 191.0, 218.0, 281.0, 277.0, 393.0, 167.0, 162.0, 131.0]
2026-01-22 23:19:14,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 31 minutes, 28 seconds)
2026-01-22 23:20:49,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:20:51,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 403.61612 ± 79.373
2026-01-22 23:20:51,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [385.36447, 618.2221, 347.95047, 388.37512, 395.0677, 364.16574, 383.13013, 432.09637, 302.71, 419.0791]
2026-01-22 23:20:51,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [210.0, 481.0, 192.0, 212.0, 218.0, 197.0, 199.0, 266.0, 147.0, 229.0]
2026-01-22 23:20:51,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (403.62) for latency DatasetOffice
2026-01-22 23:20:51,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 31 minutes, 10 seconds)
2026-01-22 23:22:24,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:26,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 338.24115 ± 158.852
2026-01-22 23:22:26,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [367.63364, 568.73083, 327.65976, 502.04044, 420.31693, 394.0236, 230.00688, 103.84814, 42.760403, 425.3909]
2026-01-22 23:22:26,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [165.0, 276.0, 164.0, 213.0, 198.0, 202.0, 143.0, 125.0, 59.0, 212.0]
2026-01-22 23:22:26,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 29 minutes, 42 seconds)
2026-01-22 23:24:00,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:24:02,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 423.31830 ± 124.753
2026-01-22 23:24:02,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [363.81094, 412.72617, 357.0766, 196.85976, 498.31277, 510.7163, 437.773, 384.84525, 367.94193, 703.12067]
2026-01-22 23:24:02,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [190.0, 177.0, 185.0, 131.0, 240.0, 182.0, 186.0, 178.0, 166.0, 269.0]
2026-01-22 23:24:02,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (423.32) for latency DatasetOffice
2026-01-22 23:24:02,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 30 minutes, 29 seconds)
2026-01-22 23:25:38,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:40,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 397.44504 ± 48.841
2026-01-22 23:25:40,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [485.80066, 359.19037, 304.413, 414.6278, 406.31992, 459.56653, 392.8233, 411.11725, 367.65836, 372.93304]
2026-01-22 23:25:40,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [224.0, 144.0, 136.0, 183.0, 166.0, 211.0, 198.0, 187.0, 167.0, 160.0]
2026-01-22 23:25:40,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 29 minutes, 36 seconds)
2026-01-22 23:27:12,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:27:13,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 290.66348 ± 99.080
2026-01-22 23:27:13,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [231.62576, 259.3339, 280.92972, 343.42148, 423.01895, 378.14526, 365.27426, 64.14669, 345.80716, 214.93176]
2026-01-22 23:27:13,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [122.0, 130.0, 160.0, 170.0, 232.0, 222.0, 196.0, 103.0, 156.0, 137.0]
2026-01-22 23:27:13,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 26 minutes, 56 seconds)
2026-01-22 23:28:49,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:50,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 356.53656 ± 40.266
2026-01-22 23:28:50,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [332.9983, 320.13913, 436.94016, 324.41467, 368.03122, 324.655, 394.0999, 320.95914, 407.35037, 335.7776]
2026-01-22 23:28:50,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [163.0, 160.0, 260.0, 154.0, 181.0, 166.0, 175.0, 150.0, 173.0, 175.0]
2026-01-22 23:28:50,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 25 minutes, 24 seconds)
2026-01-22 23:30:23,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:25,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 390.52106 ± 111.402
2026-01-22 23:30:25,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [383.96326, 527.7776, 511.83276, 337.91956, 219.59595, 319.83606, 442.28964, 570.5265, 311.17395, 280.29553]
2026-01-22 23:30:25,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [172.0, 228.0, 225.0, 183.0, 113.0, 168.0, 187.0, 279.0, 162.0, 179.0]
2026-01-22 23:30:25,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 23 minutes, 43 seconds)
2026-01-22 23:31:59,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:01,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 441.69336 ± 114.276
2026-01-22 23:32:01,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [511.13358, 650.21387, 229.67044, 488.55026, 526.7434, 361.90637, 525.42267, 386.46805, 349.21445, 387.61035]
2026-01-22 23:32:01,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [305.0, 246.0, 123.0, 213.0, 261.0, 200.0, 249.0, 202.0, 183.0, 198.0]
2026-01-22 23:32:01,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (441.69) for latency DatasetOffice
2026-01-22 23:32:01,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 22 minutes, 5 seconds)
2026-01-22 23:33:34,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:33:36,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 454.04697 ± 89.589
2026-01-22 23:33:36,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [384.6097, 405.72836, 563.9368, 362.3793, 421.15897, 523.50494, 335.59375, 456.43152, 634.82806, 452.29788]
2026-01-22 23:33:36,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [195.0, 191.0, 246.0, 197.0, 208.0, 234.0, 165.0, 193.0, 261.0, 247.0]
2026-01-22 23:33:36,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (454.05) for latency DatasetOffice
2026-01-22 23:33:36,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 19 minutes, 36 seconds)
2026-01-22 23:35:10,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:35:12,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 543.10681 ± 264.610
2026-01-22 23:35:12,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [400.527, 524.503, 317.84583, 331.7351, 293.732, 633.3731, 609.77405, 484.2674, 582.0421, 1253.2682]
2026-01-22 23:35:12,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [197.0, 235.0, 165.0, 181.0, 149.0, 233.0, 244.0, 223.0, 239.0, 502.0]
2026-01-22 23:35:12,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (543.11) for latency DatasetOffice
2026-01-22 23:35:12,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 18 minutes, 47 seconds)
2026-01-22 23:36:43,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 488.68390 ± 122.970
2026-01-22 23:36:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [400.01605, 501.03482, 475.58047, 509.7616, 378.0504, 608.1168, 282.19656, 512.1734, 462.5108, 757.3982]
2026-01-22 23:36:45,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [179.0, 222.0, 199.0, 213.0, 185.0, 231.0, 146.0, 226.0, 191.0, 281.0]
2026-01-22 23:36:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 16 minutes, 8 seconds)
2026-01-22 23:38:20,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:24,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 771.07098 ± 511.094
2026-01-22 23:38:24,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [632.476, 2134.108, 530.8618, 1071.7911, 616.0781, 830.6696, 483.62906, 366.11465, 214.92555, 830.05554]
2026-01-22 23:38:24,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [304.0, 1000.0, 236.0, 531.0, 289.0, 437.0, 270.0, 177.0, 110.0, 461.0]
2026-01-22 23:38:24,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (771.07) for latency DatasetOffice
2026-01-22 23:38:24,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 15 minutes, 40 seconds)
2026-01-22 23:39:56,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:39:59,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 697.04633 ± 341.684
2026-01-22 23:39:59,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1447.2109, 861.4764, 362.70837, 1122.4257, 751.77814, 451.03394, 491.60437, 462.33893, 345.41638, 674.47003]
2026-01-22 23:39:59,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [609.0, 359.0, 208.0, 509.0, 280.0, 211.0, 221.0, 190.0, 209.0, 247.0]
2026-01-22 23:39:59,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 13 minutes, 52 seconds)
2026-01-22 23:41:30,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:41:33,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 743.11743 ± 316.327
2026-01-22 23:41:33,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [693.71136, 1330.4636, 693.397, 1298.1423, 439.11237, 881.50195, 413.5302, 549.9694, 460.29962, 671.0466]
2026-01-22 23:41:33,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [249.0, 453.0, 303.0, 429.0, 198.0, 316.0, 198.0, 228.0, 231.0, 249.0]
2026-01-22 23:41:33,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 12 minutes, 1 second)
2026-01-22 23:43:08,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 427.68668 ± 131.629
2026-01-22 23:43:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [390.42545, 526.17816, 288.35486, 304.41534, 550.557, 361.35373, 599.2255, 310.76077, 647.295, 298.30063]
2026-01-22 23:43:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [167.0, 206.0, 141.0, 147.0, 224.0, 178.0, 238.0, 155.0, 243.0, 147.0]
2026-01-22 23:43:09,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 10 minutes, 29 seconds)
2026-01-22 23:44:41,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:43,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 845.06702 ± 197.107
2026-01-22 23:44:43,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [579.5899, 776.83844, 954.05176, 1124.7489, 731.8765, 871.1238, 789.8827, 1003.7999, 1108.5437, 510.21387]
2026-01-22 23:44:43,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [241.0, 315.0, 347.0, 412.0, 291.0, 322.0, 290.0, 354.0, 380.0, 225.0]
2026-01-22 23:44:43,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (845.07) for latency DatasetOffice
2026-01-22 23:44:43,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 9 minutes, 6 seconds)
2026-01-22 23:46:19,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:22,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 911.18390 ± 567.875
2026-01-22 23:46:22,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [560.888, 1834.0044, 501.54828, 326.76996, 1353.2968, 169.30658, 558.85126, 710.6199, 1450.3169, 1646.237]
2026-01-22 23:46:22,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [243.0, 564.0, 216.0, 173.0, 463.0, 169.0, 229.0, 272.0, 454.0, 554.0]
2026-01-22 23:46:22,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (911.18) for latency DatasetOffice
2026-01-22 23:46:22,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 7 minutes, 36 seconds)
2026-01-22 23:47:51,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:47:53,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 539.66986 ± 173.718
2026-01-22 23:47:53,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [459.01486, 508.57138, 564.28925, 635.8446, 376.1064, 966.1152, 343.5096, 644.94183, 378.81412, 519.4923]
2026-01-22 23:47:53,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [195.0, 198.0, 223.0, 234.0, 164.0, 366.0, 169.0, 225.0, 177.0, 215.0]
2026-01-22 23:47:53,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 4 minutes, 46 seconds)
2026-01-22 23:49:26,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1481.71875 ± 660.266
2026-01-22 23:49:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1007.34064, 2075.6216, 369.03207, 1534.5188, 1670.9828, 2948.4783, 1657.4452, 1346.9647, 1007.48615, 1199.3163]
2026-01-22 23:49:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [310.0, 596.0, 155.0, 416.0, 477.0, 985.0, 514.0, 387.0, 374.0, 365.0]
2026-01-22 23:49:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (1481.72) for latency DatasetOffice
2026-01-22 23:49:30,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 4 minutes)
2026-01-22 23:51:03,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:06,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1240.23413 ± 397.447
2026-01-22 23:51:06,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1372.4845, 1595.7664, 1429.0555, 1127.6571, 1828.7307, 364.9318, 812.17206, 1476.4271, 1070.5293, 1324.5872]
2026-01-22 23:51:06,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [403.0, 480.0, 431.0, 340.0, 580.0, 162.0, 259.0, 404.0, 350.0, 405.0]
2026-01-22 23:51:06,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 2 minutes, 21 seconds)
2026-01-22 23:52:37,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:52:39,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 908.46161 ± 231.978
2026-01-22 23:52:39,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1026.8976, 412.65503, 951.93115, 899.81305, 1150.1725, 1045.4534, 1166.9425, 556.4601, 878.9756, 995.31525]
2026-01-22 23:52:39,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [304.0, 210.0, 294.0, 292.0, 336.0, 309.0, 355.0, 234.0, 283.0, 309.0]
2026-01-22 23:52:39,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 37 seconds)
2026-01-22 23:54:11,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:54:14,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1082.59241 ± 349.732
2026-01-22 23:54:14,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1387.4749, 1041.4487, 1033.2787, 921.6847, 600.10443, 1207.3569, 389.36655, 1386.1431, 1568.009, 1291.0571]
2026-01-22 23:54:14,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [473.0, 367.0, 379.0, 295.0, 290.0, 377.0, 188.0, 467.0, 514.0, 421.0]
2026-01-22 23:54:14,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 58 minutes, 4 seconds)
2026-01-22 23:55:48,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:52,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1200.56152 ± 706.841
2026-01-22 23:55:52,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1412.9684, 1680.0881, 1198.2673, 1139.5532, 672.7188, 2068.6094, 25.19864, 36.19167, 1821.179, 1950.841]
2026-01-22 23:55:52,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [468.0, 517.0, 381.0, 387.0, 317.0, 644.0, 49.0, 50.0, 543.0, 812.0]
2026-01-22 23:55:52,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 58 minutes, 10 seconds)
2026-01-22 23:57:24,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:27,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1097.92090 ± 105.386
2026-01-22 23:57:27,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [927.8515, 1331.9153, 1063.4358, 1021.07904, 1189.4469, 1073.1799, 1009.35535, 1131.1785, 1145.0884, 1086.6772]
2026-01-22 23:57:27,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [313.0, 387.0, 338.0, 339.0, 363.0, 357.0, 307.0, 367.0, 338.0, 345.0]
2026-01-22 23:57:27,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 56 minutes, 9 seconds)
2026-01-22 23:59:00,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:03,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1197.33484 ± 358.279
2026-01-22 23:59:03,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1580.6206, 1558.3677, 956.60175, 1504.0297, 1219.8433, 1669.8798, 1240.1025, 815.0692, 804.3874, 624.44653]
2026-01-22 23:59:03,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [526.0, 536.0, 338.0, 512.0, 447.0, 582.0, 425.0, 295.0, 306.0, 228.0]
2026-01-22 23:59:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2026-01-23 00:00:37,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:41,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1441.14746 ± 415.083
2026-01-23 00:00:41,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2467.8186, 1093.064, 1786.222, 1267.9542, 954.56213, 1594.8011, 1081.783, 1369.3131, 1409.082, 1386.8749]
2026-01-23 00:00:41,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [754.0, 374.0, 555.0, 441.0, 321.0, 495.0, 387.0, 418.0, 464.0, 441.0]
2026-01-23 00:00:41,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2026-01-23 00:02:14,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:19,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1974.72693 ± 1187.877
2026-01-23 00:02:19,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2282.3645, 3083.525, 216.13394, 73.9198, 737.3188, 2759.6238, 3348.1157, 1805.971, 2046.5338, 3393.7634]
2026-01-23 00:02:19,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [635.0, 998.0, 153.0, 75.0, 216.0, 866.0, 1000.0, 543.0, 663.0, 1000.0]
2026-01-23 00:02:19,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (1974.73) for latency DatasetOffice
2026-01-23 00:02:19,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 53 minutes, 10 seconds)
2026-01-23 00:03:53,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:58,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1579.94104 ± 792.195
2026-01-23 00:03:58,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1768.1187, 2217.9639, 2282.4385, 297.6995, 2526.7063, 1929.4766, 2150.122, 705.52716, 1601.1372, 320.22055]
2026-01-23 00:03:58,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [521.0, 878.0, 678.0, 223.0, 1000.0, 589.0, 668.0, 444.0, 507.0, 245.0]
2026-01-23 00:03:58,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 51 minutes, 46 seconds)
2026-01-23 00:05:28,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:32,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1619.98169 ± 1074.369
2026-01-23 00:05:32,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [290.7808, 260.05698, 1159.4111, 1995.6849, 3023.272, 1866.8556, 1326.292, 2805.891, 3154.1663, 317.40753]
2026-01-23 00:05:32,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [191.0, 180.0, 400.0, 656.0, 1000.0, 537.0, 413.0, 775.0, 1000.0, 229.0]
2026-01-23 00:05:32,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 49 minutes, 58 seconds)
2026-01-23 00:07:11,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:14,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1028.68481 ± 659.572
2026-01-23 00:07:14,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [867.05176, 932.2398, 349.68027, 130.37614, 914.30493, 2170.4836, 1363.3735, 2074.0818, 317.42337, 1167.8322]
2026-01-23 00:07:14,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [313.0, 362.0, 150.0, 100.0, 312.0, 636.0, 418.0, 640.0, 225.0, 402.0]
2026-01-23 00:07:14,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 49 minutes, 32 seconds)
2026-01-23 00:08:40,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:46,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1971.93481 ± 802.701
2026-01-23 00:08:46,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1927.5826, 3416.279, 1497.6702, 1336.0608, 1769.1254, 1116.3468, 1001.9059, 3236.945, 1821.7875, 2595.6438]
2026-01-23 00:08:46,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [566.0, 1000.0, 438.0, 402.0, 482.0, 364.0, 337.0, 903.0, 484.0, 677.0]
2026-01-23 00:08:46,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 46 minutes, 35 seconds)
2026-01-23 00:10:18,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:24,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2109.12842 ± 805.852
2026-01-23 00:10:24,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3590.8481, 1694.4534, 2452.1814, 1190.9722, 1855.4384, 1430.3649, 1583.1111, 3518.2808, 2262.2407, 1513.3943]
2026-01-23 00:10:24,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 477.0, 761.0, 362.0, 494.0, 404.0, 421.0, 1000.0, 582.0, 421.0]
2026-01-23 00:10:24,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2109.13) for latency DatasetOffice
2026-01-23 00:10:24,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 44 minutes, 56 seconds)
2026-01-23 00:11:59,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:04,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2361.44678 ± 663.761
2026-01-23 00:12:04,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2483.0427, 1762.8195, 2428.584, 2008.7899, 2762.6785, 992.6071, 2419.7517, 3616.5613, 2888.8132, 2250.8203]
2026-01-23 00:12:04,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [637.0, 458.0, 622.0, 511.0, 780.0, 300.0, 642.0, 851.0, 740.0, 593.0]
2026-01-23 00:12:04,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2361.45) for latency DatasetOffice
2026-01-23 00:12:04,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 46 seconds)
2026-01-23 00:13:35,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:42,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2454.43799 ± 867.202
2026-01-23 00:13:42,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1542.3583, 1828.5609, 3123.4106, 2078.1218, 1645.1584, 1287.081, 3696.36, 3849.7231, 2624.659, 2868.9463]
2026-01-23 00:13:42,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [469.0, 477.0, 855.0, 596.0, 486.0, 390.0, 1000.0, 988.0, 722.0, 735.0]
2026-01-23 00:13:42,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2454.44) for latency DatasetOffice
2026-01-23 00:13:42,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 42 minutes, 44 seconds)
2026-01-23 00:15:19,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:25,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2651.89795 ± 580.044
2026-01-23 00:15:25,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2834.6245, 1718.8748, 3527.1953, 2859.2354, 2939.1677, 2570.4666, 2813.6643, 2274.738, 3309.5874, 1671.4261]
2026-01-23 00:15:25,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [758.0, 476.0, 1000.0, 784.0, 772.0, 683.0, 737.0, 597.0, 936.0, 478.0]
2026-01-23 00:15:25,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2651.90) for latency DatasetOffice
2026-01-23 00:15:25,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 41 minutes, 35 seconds)
2026-01-23 00:16:55,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:17:00,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2326.78345 ± 1023.975
2026-01-23 00:17:00,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1775.0887, 3700.8218, 2518.923, 2702.1199, 2001.7205, 3562.1455, 1580.8098, 3644.0942, 705.4318, 1076.6793]
2026-01-23 00:17:00,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [493.0, 1000.0, 651.0, 709.0, 581.0, 949.0, 461.0, 921.0, 280.0, 357.0]
2026-01-23 00:17:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 40 minutes, 37 seconds)
2026-01-23 00:18:32,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2181.91553 ± 614.999
2026-01-23 00:18:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2143.8713, 1840.7827, 1883.0311, 1995.8802, 1846.3851, 1794.5979, 2795.3857, 2862.4543, 3422.8013, 1233.9668]
2026-01-23 00:18:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [552.0, 488.0, 475.0, 517.0, 476.0, 470.0, 687.0, 667.0, 831.0, 346.0]
2026-01-23 00:18:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 38 minutes, 41 seconds)
2026-01-23 00:20:11,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:17,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2392.43896 ± 1065.898
2026-01-23 00:20:17,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1651.9961, 2257.1794, 111.66632, 1742.0317, 3849.239, 3181.214, 2554.4285, 2390.6533, 3934.9136, 2251.0676]
2026-01-23 00:20:17,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [448.0, 598.0, 84.0, 522.0, 1000.0, 781.0, 687.0, 618.0, 975.0, 585.0]
2026-01-23 00:20:17,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 36 minutes, 48 seconds)
2026-01-23 00:21:50,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:55,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1851.95215 ± 522.584
2026-01-23 00:21:55,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1645.909, 1366.5991, 2692.2415, 1430.1998, 2257.9954, 1478.5359, 2682.3284, 1507.7385, 1267.25, 2190.7236]
2026-01-23 00:21:55,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [470.0, 397.0, 708.0, 398.0, 617.0, 430.0, 685.0, 419.0, 372.0, 597.0]
2026-01-23 00:21:55,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 35 minutes, 16 seconds)
2026-01-23 00:23:27,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:34,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2710.14844 ± 970.293
2026-01-23 00:23:34,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3615.0083, 1778.462, 4008.7893, 1596.6321, 1511.4559, 3644.7588, 2682.274, 2319.7202, 1923.2441, 4021.138]
2026-01-23 00:23:34,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [944.0, 475.0, 1000.0, 451.0, 429.0, 870.0, 649.0, 628.0, 496.0, 1000.0]
2026-01-23 00:23:34,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2710.15) for latency DatasetOffice
2026-01-23 00:23:34,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 32 minutes, 47 seconds)
2026-01-23 00:25:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2222.50586 ± 421.146
2026-01-23 00:25:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2603.2332, 2283.3362, 2048.686, 1731.5056, 1801.7635, 2287.224, 2562.2102, 2542.7397, 1492.5143, 2871.8455]
2026-01-23 00:25:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [651.0, 577.0, 521.0, 474.0, 470.0, 587.0, 628.0, 644.0, 412.0, 734.0]
2026-01-23 00:25:13,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 31 minutes, 58 seconds)
2026-01-23 00:26:49,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:55,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2459.71533 ± 1129.907
2026-01-23 00:26:55,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3637.666, 1268.5082, 3916.4487, 1915.9567, 2142.053, 2595.0754, 1970.6313, 4004.3318, 303.0718, 2843.409]
2026-01-23 00:26:55,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 392.0, 1000.0, 549.0, 635.0, 679.0, 564.0, 1000.0, 214.0, 753.0]
2026-01-23 00:26:55,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 31 minutes, 13 seconds)
2026-01-23 00:28:23,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:28,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1836.50330 ± 750.222
2026-01-23 00:28:28,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [17.343527, 1206.6821, 2377.9312, 2128.6958, 2218.3809, 1999.3167, 1952.3605, 2052.822, 1485.1454, 2926.356]
2026-01-23 00:28:28,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [32.0, 411.0, 630.0, 590.0, 595.0, 594.0, 552.0, 585.0, 420.0, 782.0]
2026-01-23 00:28:28,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 28 minutes, 27 seconds)
2026-01-23 00:30:02,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:10,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2938.31421 ± 1175.217
2026-01-23 00:30:10,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4005.3682, 4075.0242, 3867.599, 3673.8638, 3944.0466, 158.35257, 2407.628, 2506.2708, 2119.7637, 2625.2258]
2026-01-23 00:30:10,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 990.0, 987.0, 1000.0, 1000.0, 103.0, 639.0, 662.0, 553.0, 654.0]
2026-01-23 00:30:10,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2938.31) for latency DatasetOffice
2026-01-23 00:30:10,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 27 minutes, 27 seconds)
2026-01-23 00:31:44,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2986.01001 ± 1051.903
2026-01-23 00:31:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4009.6238, 4114.687, 3322.06, 3954.726, 1784.3865, 974.93207, 2495.0105, 3299.397, 3917.548, 1987.7277]
2026-01-23 00:31:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 820.0, 1000.0, 507.0, 312.0, 674.0, 859.0, 1000.0, 526.0]
2026-01-23 00:31:51,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2986.01) for latency DatasetOffice
2026-01-23 00:31:51,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 26 minutes, 17 seconds)
2026-01-23 00:33:26,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:31,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2115.60693 ± 966.635
2026-01-23 00:33:31,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4236.734, 1023.76556, 1477.8939, 3196.473, 1874.777, 1031.4779, 2654.6184, 2397.326, 1578.4569, 1684.5469]
2026-01-23 00:33:31,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 337.0, 413.0, 766.0, 494.0, 334.0, 680.0, 616.0, 454.0, 469.0]
2026-01-23 00:33:31,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 24 minutes, 40 seconds)
2026-01-23 00:34:59,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:07,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3621.01904 ± 807.267
2026-01-23 00:35:07,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4150.286, 4171.5654, 4329.7017, 4155.396, 3112.8506, 4152.6904, 4227.565, 1725.2546, 3297.708, 2887.1768]
2026-01-23 00:35:07,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 731.0, 1000.0, 1000.0, 443.0, 771.0, 724.0]
2026-01-23 00:35:07,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3621.02) for latency DatasetOffice
2026-01-23 00:35:07,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 22 minutes, 4 seconds)
2026-01-23 00:36:46,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:53,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3161.60742 ± 1064.978
2026-01-23 00:36:53,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1796.3324, 2869.634, 3389.2058, 3969.3245, 4197.294, 1685.0465, 4081.206, 1463.4845, 4008.3157, 4156.2295]
2026-01-23 00:36:53,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [490.0, 728.0, 818.0, 1000.0, 1000.0, 452.0, 1000.0, 409.0, 1000.0, 1000.0]
2026-01-23 00:36:53,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 22 minutes, 31 seconds)
2026-01-23 00:38:24,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:32,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3645.49683 ± 1085.993
2026-01-23 00:38:32,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4345.559, 3858.6694, 4539.058, 1777.3984, 3954.2197, 4407.2705, 4602.357, 1349.5416, 3973.2651, 3647.6284]
2026-01-23 00:38:32,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 452.0, 1000.0, 1000.0, 1000.0, 378.0, 1000.0, 864.0]
2026-01-23 00:38:32,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3645.50) for latency DatasetOffice
2026-01-23 00:38:32,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 20 minutes, 26 seconds)
2026-01-23 00:40:04,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:12,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3273.48389 ± 985.519
2026-01-23 00:40:12,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2744.1133, 4046.6865, 1813.6913, 4190.096, 4118.5044, 3511.9863, 1354.471, 2846.0317, 3997.276, 4111.9814]
2026-01-23 00:40:12,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [682.0, 1000.0, 446.0, 1000.0, 1000.0, 871.0, 376.0, 706.0, 1000.0, 1000.0]
2026-01-23 00:40:12,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 18 minutes, 21 seconds)
2026-01-23 00:41:43,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:50,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3053.37256 ± 1006.724
2026-01-23 00:41:50,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1975.3293, 2017.7339, 2991.4546, 2886.4749, 4031.1853, 3971.3774, 3749.797, 3919.427, 3946.698, 1044.2457]
2026-01-23 00:41:50,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [471.0, 529.0, 708.0, 744.0, 1000.0, 980.0, 911.0, 1000.0, 1000.0, 318.0]
2026-01-23 00:41:50,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 31 seconds)
2026-01-23 00:43:24,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:32,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3463.91748 ± 602.597
2026-01-23 00:43:32,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3345.2012, 4117.617, 3277.2253, 4056.139, 4046.715, 3818.2793, 2614.9932, 3961.5808, 3021.4175, 2380.0068]
2026-01-23 00:43:32,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [794.0, 1000.0, 902.0, 977.0, 1000.0, 1000.0, 685.0, 1000.0, 738.0, 590.0]
2026-01-23 00:43:32,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 15 minutes, 42 seconds)
2026-01-23 00:45:10,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:17,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3260.50195 ± 883.914
2026-01-23 00:45:17,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3415.8567, 1383.522, 3716.47, 4259.476, 2699.2415, 2889.0168, 3010.2876, 4369.29, 2638.1924, 4223.667]
2026-01-23 00:45:17,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [794.0, 392.0, 856.0, 1000.0, 652.0, 681.0, 736.0, 1000.0, 645.0, 1000.0]
2026-01-23 00:45:17,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 13 minutes, 53 seconds)
2026-01-23 00:46:46,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:52,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2600.26025 ± 1200.940
2026-01-23 00:46:52,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [233.48523, 2147.982, 3785.9492, 2615.425, 2747.1863, 4347.528, 1018.62317, 3378.3079, 3531.0117, 2197.1047]
2026-01-23 00:46:52,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [117.0, 545.0, 1000.0, 609.0, 628.0, 1000.0, 301.0, 853.0, 924.0, 597.0]
2026-01-23 00:46:52,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 40 seconds)
2026-01-23 00:48:27,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:35,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3259.93115 ± 1070.255
2026-01-23 00:48:35,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3133.3477, 2624.5142, 4439.4473, 3464.834, 4312.2485, 1094.7554, 4096.6533, 1718.4711, 3945.9673, 3769.072]
2026-01-23 00:48:35,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [786.0, 617.0, 1000.0, 789.0, 1000.0, 312.0, 1000.0, 462.0, 1000.0, 889.0]
2026-01-23 00:48:35,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 10 minutes, 25 seconds)
2026-01-23 00:50:09,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2430.66504 ± 1367.930
2026-01-23 00:50:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1751.294, 2985.703, 4377.74, 1090.0248, 1388.1333, 4197.1836, 4530.902, 1542.7786, 1114.0378, 1328.8544]
2026-01-23 00:50:14,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [415.0, 729.0, 1000.0, 301.0, 359.0, 1000.0, 1000.0, 399.0, 307.0, 353.0]
2026-01-23 00:50:14,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 8 minutes, 53 seconds)
2026-01-23 00:51:44,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:51,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2712.81494 ± 1975.122
2026-01-23 00:51:51,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4209.168, 4418.3164, 4295.806, 797.9455, 278.52133, 114.14962, 32.439686, 4397.176, 4273.2104, 4311.4136]
2026-01-23 00:51:51,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 250.0, 152.0, 86.0, 50.0, 982.0, 1000.0, 1000.0]
2026-01-23 00:51:51,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 6 minutes, 29 seconds)
2026-01-23 00:53:24,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:31,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3244.18042 ± 1424.655
2026-01-23 00:53:31,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4271.278, 2671.168, 0.678809, 4416.937, 1205.7474, 4184.9585, 3994.331, 3669.3062, 3856.4587, 4170.9395]
2026-01-23 00:53:31,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 654.0, 13.0, 1000.0, 383.0, 985.0, 940.0, 901.0, 936.0, 1000.0]
2026-01-23 00:53:31,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 13 seconds)
2026-01-23 00:55:09,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:16,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2953.49902 ± 1499.954
2026-01-23 00:55:16,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4278.4854, 3858.232, 1062.7699, 3788.9893, 688.6645, 948.5891, 4309.0894, 4474.807, 4141.2466, 1984.1178]
2026-01-23 00:55:16,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 895.0, 312.0, 895.0, 232.0, 290.0, 971.0, 1000.0, 1000.0, 542.0]
2026-01-23 00:55:16,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 3 minutes, 48 seconds)
2026-01-23 00:56:51,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:55,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1809.21289 ± 1069.977
2026-01-23 00:56:55,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1845.1093, 2164.7686, 3228.4204, 4125.7905, 1345.9866, 1639.1323, 1201.3871, 750.95605, 1413.3312, 377.2464]
2026-01-23 00:56:55,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [485.0, 535.0, 750.0, 1000.0, 370.0, 456.0, 348.0, 241.0, 390.0, 161.0]
2026-01-23 00:56:55,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 43 seconds)
2026-01-23 00:58:25,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:33,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3744.62964 ± 1155.779
2026-01-23 00:58:33,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3915.9785, 4208.937, 4261.855, 4323.616, 4191.209, 355.5984, 4175.085, 4343.559, 3471.1084, 4199.348]
2026-01-23 00:58:33,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 161.0, 1000.0, 1000.0, 874.0, 1000.0]
2026-01-23 00:58:33,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3744.63) for latency DatasetOffice
2026-01-23 00:58:33,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes, 52 seconds)
2026-01-23 01:00:03,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:11,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3547.28979 ± 1112.739
2026-01-23 01:00:11,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3207.7793, 4328.814, 3240.943, 4373.6436, 4100.9185, 4266.2534, 3384.7522, 476.90094, 3818.762, 4274.131]
2026-01-23 01:00:11,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [768.0, 1000.0, 764.0, 1000.0, 1000.0, 1000.0, 789.0, 181.0, 838.0, 1000.0]
2026-01-23 01:00:11,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 22 seconds)
2026-01-23 01:01:44,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:52,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3561.84253 ± 838.023
2026-01-23 01:01:52,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3854.556, 3561.3416, 3787.7349, 4105.1245, 4176.5576, 3960.9653, 4195.867, 1861.5029, 4118.448, 1996.3289]
2026-01-23 01:01:52,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 838.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 481.0, 1000.0, 585.0]
2026-01-23 01:01:52,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 56 minutes, 47 seconds)
2026-01-23 01:03:32,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:03:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3040.31763 ± 1300.635
2026-01-23 01:03:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4594.1133, 1791.2639, 3566.755, 3522.9575, 4340.484, 2453.514, 4193.2974, 3926.0364, 913.30884, 1101.4457]
2026-01-23 01:03:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 428.0, 799.0, 764.0, 1000.0, 592.0, 926.0, 863.0, 269.0, 308.0]
2026-01-23 01:03:38,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 55 minutes, 12 seconds)
2026-01-23 01:05:07,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:15,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3743.62183 ± 1224.873
2026-01-23 01:05:15,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2906.2856, 3584.8145, 4375.8345, 4338.9233, 4414.4146, 4487.075, 4420.1436, 4593.203, 381.43668, 3934.0862]
2026-01-23 01:05:15,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [672.0, 838.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 158.0, 964.0]
2026-01-23 01:05:15,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 53 minutes, 19 seconds)
2026-01-23 01:06:49,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:56,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3213.08643 ± 912.771
2026-01-23 01:06:56,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4067.2783, 3804.0916, 1164.691, 1999.3446, 2947.415, 3195.0574, 4079.6653, 4029.7083, 3360.0686, 3483.5444]
2026-01-23 01:06:56,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 349.0, 488.0, 681.0, 836.0, 1000.0, 980.0, 785.0, 837.0]
2026-01-23 01:06:56,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 51 minutes, 58 seconds)
2026-01-23 01:08:25,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:32,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3271.71362 ± 1414.922
2026-01-23 01:08:32,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3853.0735, 584.1852, 4352.802, 2887.9136, 4188.9355, 4425.4062, 4462.336, 3168.7966, 663.0732, 4130.6147]
2026-01-23 01:08:32,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [862.0, 216.0, 1000.0, 702.0, 1000.0, 1000.0, 1000.0, 738.0, 252.0, 1000.0]
2026-01-23 01:08:32,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 50 minutes, 8 seconds)
2026-01-23 01:10:08,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3325.91992 ± 955.630
2026-01-23 01:10:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2457.7307, 3354.7637, 1650.5812, 4090.924, 4207.134, 4192.3164, 4120.2993, 3850.8008, 3585.0662, 1749.5829]
2026-01-23 01:10:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [625.0, 795.0, 439.0, 937.0, 1000.0, 1000.0, 1000.0, 1000.0, 852.0, 469.0]
2026-01-23 01:10:16,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 48 minutes, 39 seconds)
2026-01-23 01:11:49,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3142.96143 ± 949.986
2026-01-23 01:11:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4364.9595, 2018.9751, 4533.734, 2138.4531, 2688.8516, 2278.0435, 3326.9678, 2181.8809, 3710.1287, 4187.6206]
2026-01-23 01:11:56,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 508.0, 1000.0, 516.0, 613.0, 540.0, 785.0, 531.0, 934.0, 1000.0]
2026-01-23 01:11:56,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 46 minutes, 26 seconds)
2026-01-23 01:13:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:40,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3709.54565 ± 1231.534
2026-01-23 01:13:40,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4549.706, 1015.592, 2145.223, 3594.1328, 4710.335, 4480.5, 4520.9893, 4701.374, 4556.0137, 2821.589]
2026-01-23 01:13:40,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 292.0, 616.0, 854.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 640.0]
2026-01-23 01:13:40,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 45 minutes, 29 seconds)
2026-01-23 01:15:17,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:22,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2220.56567 ± 1255.149
2026-01-23 01:15:22,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2536.3555, 1771.1078, 2486.3623, 549.67255, 150.73238, 3308.8728, 1379.1759, 3410.7253, 4455.004, 2157.6475]
2026-01-23 01:15:22,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [636.0, 430.0, 595.0, 207.0, 105.0, 849.0, 375.0, 785.0, 1000.0, 544.0]
2026-01-23 01:15:22,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 43 minutes, 51 seconds)
2026-01-23 01:16:57,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:06,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 4196.45947 ± 718.762
2026-01-23 01:17:06,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4385.8745, 3443.0776, 4512.351, 4587.919, 2292.1836, 4649.183, 4375.675, 4618.4126, 4645.855, 4454.0576]
2026-01-23 01:17:06,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 772.0, 1000.0, 1000.0, 546.0, 1000.0, 939.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:17:06,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (4196.46) for latency DatasetOffice
2026-01-23 01:17:06,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 42 minutes, 47 seconds)
2026-01-23 01:18:34,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:41,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3156.16504 ± 927.789
2026-01-23 01:18:41,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2783.6365, 4119.707, 3801.4211, 990.696, 3791.1255, 2770.3333, 3386.0498, 3731.0598, 3957.5093, 2230.1113]
2026-01-23 01:18:41,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [724.0, 1000.0, 865.0, 293.0, 876.0, 649.0, 785.0, 874.0, 920.0, 580.0]
2026-01-23 01:18:41,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 40 minutes, 27 seconds)
2026-01-23 01:20:16,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:23,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2932.56152 ± 1543.768
2026-01-23 01:20:23,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4159.5957, 4316.0664, 2610.4526, 1597.8534, 499.9049, 128.56241, 4125.1787, 4238.1816, 3707.6296, 3942.1902]
2026-01-23 01:20:23,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 595.0, 409.0, 169.0, 96.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:20:23,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 38 minutes, 55 seconds)
2026-01-23 01:22:00,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:10,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 4298.37354 ± 392.513
2026-01-23 01:22:10,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4477.424, 4311.12, 4357.63, 4536.7695, 4435.9854, 4484.4995, 3139.7031, 4491.132, 4334.812, 4414.658]
2026-01-23 01:22:10,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 697.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:22:10,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (4298.37) for latency DatasetOffice
2026-01-23 01:22:10,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 37 minutes, 19 seconds)
2026-01-23 01:23:42,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:51,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3995.11133 ± 657.311
2026-01-23 01:23:51,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4218.9707, 4152.8047, 4303.4766, 3941.3364, 4239.0103, 4474.233, 3909.8877, 4318.912, 4307.562, 2084.9172]
2026-01-23 01:23:51,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 958.0, 1000.0, 1000.0, 594.0]
2026-01-23 01:23:51,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 35 minutes, 36 seconds)
2026-01-23 01:25:19,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:26,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3246.18481 ± 1798.971
2026-01-23 01:25:26,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [30.000723, 1383.6465, 385.51315, 4513.448, 4635.886, 4623.6504, 4530.3433, 3305.9934, 4526.431, 4526.931]
2026-01-23 01:25:26,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [56.0, 388.0, 154.0, 1000.0, 1000.0, 1000.0, 1000.0, 773.0, 1000.0, 1000.0]
2026-01-23 01:25:26,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 19 seconds)
2026-01-23 01:27:05,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:14,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 4419.26660 ± 58.812
2026-01-23 01:27:14,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4416.669, 4365.3184, 4402.2554, 4329.5176, 4479.274, 4444.8066, 4512.1294, 4493.0933, 4351.9297, 4397.674]
2026-01-23 01:27:14,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:27:14,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (4419.27) for latency DatasetOffice
2026-01-23 01:27:14,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 32 minutes, 29 seconds)
2026-01-23 01:28:48,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:56,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3902.55322 ± 976.957
2026-01-23 01:28:56,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4187.6704, 1506.491, 4232.2075, 4442.898, 4511.822, 4387.322, 4497.0957, 4219.291, 4512.651, 2528.0837]
2026-01-23 01:28:56,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 383.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 612.0]
2026-01-23 01:28:56,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 46 seconds)
2026-01-23 01:30:26,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:35,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 4061.98560 ± 811.375
2026-01-23 01:30:35,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4183.7974, 4162.0293, 4239.3447, 4465.849, 4364.9453, 1657.3961, 4193.518, 4553.815, 4437.7896, 4361.3716]
2026-01-23 01:30:35,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 435.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:30:35,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 28 minutes, 37 seconds)
2026-01-23 01:32:14,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:22,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3653.19092 ± 1023.651
2026-01-23 01:32:22,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1901.5171, 4505.2104, 1848.4369, 4642.8516, 4463.7407, 3692.0425, 4347.8867, 3333.5115, 3162.0732, 4634.6416]
2026-01-23 01:32:22,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [466.0, 1000.0, 442.0, 1000.0, 1000.0, 813.0, 1000.0, 913.0, 763.0, 1000.0]
2026-01-23 01:32:22,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 27 minutes, 14 seconds)
2026-01-23 01:33:52,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:58,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2783.15967 ± 1349.560
2026-01-23 01:33:58,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4368.9385, 4408.2124, 4434.2188, 2851.1714, 3859.1953, 2031.3909, 1777.3611, 1482.2494, 427.32138, 2191.5356]
2026-01-23 01:33:58,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 671.0, 893.0, 501.0, 475.0, 430.0, 163.0, 559.0]
2026-01-23 01:33:58,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 25 minutes, 37 seconds)
2026-01-23 01:35:35,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:43,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3627.34644 ± 920.220
2026-01-23 01:35:43,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4463.3984, 1972.4376, 2815.7444, 4339.462, 4309.439, 4474.436, 2564.6453, 4355.528, 4190.6055, 2787.7712]
2026-01-23 01:35:43,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 491.0, 653.0, 1000.0, 1000.0, 1000.0, 632.0, 1000.0, 1000.0, 655.0]
2026-01-23 01:35:43,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 23 minutes, 45 seconds)
2026-01-23 01:37:12,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:20,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3779.35107 ± 1188.092
2026-01-23 01:37:20,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1714.2537, 4331.3755, 4402.526, 4315.243, 4416.701, 4251.498, 4474.788, 1127.9264, 4319.085, 4440.1143]
2026-01-23 01:37:20,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [432.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 305.0, 1000.0, 1000.0]
2026-01-23 01:37:20,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 49 seconds)
2026-01-23 01:38:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:01,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3753.03589 ± 764.689
2026-01-23 01:39:01,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4360.3765, 2395.2747, 4426.4277, 4396.94, 2965.0005, 4394.874, 4227.91, 2882.9006, 3127.4958, 4353.158]
2026-01-23 01:39:01,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 562.0, 1000.0, 1000.0, 680.0, 1000.0, 1000.0, 673.0, 712.0, 1000.0]
2026-01-23 01:39:01,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 20 minutes, 15 seconds)
2026-01-23 01:40:38,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:47,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3957.88208 ± 787.701
2026-01-23 01:40:47,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4179.806, 3877.3318, 4431.762, 4272.3823, 4254.6816, 4252.2764, 4021.0737, 1644.6399, 4200.2153, 4444.652]
2026-01-23 01:40:47,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 426.0, 1000.0, 1000.0]
2026-01-23 01:40:47,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 31 seconds)
2026-01-23 01:42:20,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:27,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2968.91284 ± 1698.139
2026-01-23 01:42:27,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4466.188, 4345.3804, 4410.3486, 1026.1703, 4215.0674, 490.67206, 4276.688, 147.33919, 3995.9954, 2315.277]
2026-01-23 01:42:27,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 308.0, 1000.0, 186.0, 1000.0, 90.0, 1000.0, 574.0]
2026-01-23 01:42:27,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 56 seconds)
2026-01-23 01:43:59,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:09,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 4304.18848 ± 232.954
2026-01-23 01:44:09,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3703.3345, 4218.1553, 4454.299, 4483.0205, 4369.5005, 4495.985, 4493.913, 4363.4326, 4348.884, 4111.362]
2026-01-23 01:44:09,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [869.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 994.0, 1000.0, 1000.0]
2026-01-23 01:44:09,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 15 minutes, 9 seconds)
2026-01-23 01:45:41,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:50,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3831.90698 ± 486.371
2026-01-23 01:45:50,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4245.276, 4168.7837, 4039.7158, 4240.883, 4052.3105, 3761.374, 2594.6785, 3390.1414, 4110.6655, 3715.244]
2026-01-23 01:45:50,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 663.0, 790.0, 1000.0, 891.0]
2026-01-23 01:45:50,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 36 seconds)
2026-01-23 01:47:23,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:30,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2949.11230 ± 1694.584
2026-01-23 01:47:30,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4412.619, 992.86755, 3203.642, 4508.894, 3106.5525, 3726.0105, 0.6598415, 500.73444, 4538.5767, 4500.5664]
2026-01-23 01:47:30,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 295.0, 734.0, 1000.0, 693.0, 856.0, 13.0, 173.0, 1000.0, 1000.0]
2026-01-23 01:47:30,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 51 seconds)
2026-01-23 01:49:04,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 4520.45410 ± 61.354
2026-01-23 01:49:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4553.5312, 4489.3057, 4422.8945, 4550.949, 4560.0146, 4480.4775, 4412.2637, 4589.7544, 4574.4517, 4570.902]
2026-01-23 01:49:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:14,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (4520.45) for latency DatasetOffice
2026-01-23 01:49:14,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 8 seconds)
2026-01-23 01:50:49,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 4179.74268 ± 894.318
2026-01-23 01:50:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4626.5264, 4573.5483, 4463.854, 4401.6714, 4245.0547, 4558.9473, 4338.6396, 4603.6206, 4466.324, 1519.2406]
2026-01-23 01:50:58,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 412.0]
2026-01-23 01:50:58,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 31 seconds)
2026-01-23 01:52:29,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:37,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3355.32886 ± 1499.962
2026-01-23 01:52:37,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2996.2656, 2540.8792, 46.76869, 4265.1753, 4437.052, 1401.8894, 4543.538, 4453.0044, 4454.687, 4414.0273]
2026-01-23 01:52:37,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [684.0, 608.0, 69.0, 1000.0, 1000.0, 367.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:52:37,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 46 seconds)
2026-01-23 01:54:15,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:23,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3708.26636 ± 1354.371
2026-01-23 01:54:23,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4409.392, 4483.5654, 3460.4846, 2819.6965, -8.248806, 3895.3464, 4513.463, 4549.112, 4533.8633, 4425.9893]
2026-01-23 01:54:23,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 773.0, 649.0, 17.0, 885.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:54:23,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 7 seconds)
2026-01-23 01:55:50,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:58,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3527.83325 ± 914.069
2026-01-23 01:55:58,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3818.4749, 4337.745, 4271.02, 1738.583, 2350.6997, 2994.0042, 4318.35, 2890.7656, 4156.6284, 4402.062]
2026-01-23 01:55:58,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [900.0, 1000.0, 1000.0, 453.0, 590.0, 700.0, 1000.0, 700.0, 1000.0, 1000.0]
2026-01-23 01:55:58,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 23 seconds)
2026-01-23 01:57:31,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:57:40,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 4186.20459 ± 472.848
2026-01-23 01:57:40,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [4607.957, 2846.406, 4265.1074, 4238.629, 4036.823, 4243.0186, 4322.246, 4562.412, 4392.1133, 4347.334]
2026-01-23 01:57:40,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 662.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:57:40,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 41 seconds)
2026-01-23 01:59:16,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:21,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2453.99585 ± 1226.319
2026-01-23 01:59:21,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [723.7807, 756.951, 2803.575, 1891.5946, 2184.545, 2314.7937, 4325.1924, 4471.63, 1809.5548, 3258.3418]
2026-01-23 01:59:21,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [228.0, 239.0, 665.0, 482.0, 537.0, 542.0, 993.0, 1000.0, 466.0, 747.0]
2026-01-23 01:59:21,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1299 [DEBUG]: Training session finished
