2026-01-22 23:14:16,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mem1
2026-01-22 23:14:16,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mem1
2026-01-22 23:14:16,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x148d9960c3d0>}
2026-01-22 23:14:16,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:17,120 baseline-bpql-noisy-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-22 23:14:17,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-22 23:14:17,136 baseline-bpql-noisy-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=23, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:17,136 baseline-bpql-noisy-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:17,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:17,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:41,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:42,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 25.40261 ± 159.627
2026-01-22 23:15:42,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [-15.096734, 501.97577, -31.643244, 4.113334, -34.997528, -61.45721, -21.055798, -26.635427, -32.58836, -28.588663]
2026-01-22 23:15:42,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [72.0, 331.0, 94.0, 139.0, 108.0, 108.0, 82.0, 100.0, 98.0, 87.0]
2026-01-22 23:15:42,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (25.40) for latency DatasetOffice
2026-01-22 23:15:42,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 19 minutes, 15 seconds)
2026-01-22 23:17:14,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:15,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: -36.33228 ± 19.530
2026-01-22 23:17:15,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [-69.927246, -31.213758, -11.500514, -37.726772, -54.314835, -26.90709, -57.252308, -26.444721, -43.892536, -4.143052]
2026-01-22 23:17:15,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [113.0, 94.0, 121.0, 103.0, 103.0, 101.0, 125.0, 201.0, 123.0, 122.0]
2026-01-22 23:17:15,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 25 minutes, 14 seconds)
2026-01-22 23:18:47,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:18:48,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 79.08083 ± 95.894
2026-01-22 23:18:48,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [-55.60189, 78.64161, 279.0876, 20.81999, 13.124505, 13.334041, 177.88881, 49.044212, 38.329254, 176.14029]
2026-01-22 23:18:48,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [236.0, 158.0, 396.0, 126.0, 95.0, 23.0, 263.0, 148.0, 216.0, 149.0]
2026-01-22 23:18:48,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (79.08) for latency DatasetOffice
2026-01-22 23:18:48,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 25 minutes, 57 seconds)
2026-01-22 23:20:20,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:20:22,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 251.12296 ± 98.101
2026-01-22 23:20:22,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [285.73523, 105.276085, 232.0328, 301.6265, 106.72508, 346.17126, 350.2491, 183.49826, 404.36215, 195.55298]
2026-01-22 23:20:22,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [184.0, 121.0, 148.0, 165.0, 117.0, 186.0, 305.0, 118.0, 212.0, 116.0]
2026-01-22 23:20:22,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (251.12) for latency DatasetOffice
2026-01-22 23:20:22,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 25 minutes, 41 seconds)
2026-01-22 23:21:53,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:21:54,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 228.57280 ± 131.559
2026-01-22 23:21:54,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [0.51480925, 12.684176, 271.49683, 226.58891, 242.40663, 476.1656, 251.6511, 237.50777, 330.80948, 235.9026]
2026-01-22 23:21:54,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [16.0, 23.0, 187.0, 142.0, 142.0, 216.0, 149.0, 156.0, 162.0, 146.0]
2026-01-22 23:21:54,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 24 minutes, 36 seconds)
2026-01-22 23:23:26,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:28,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 301.24350 ± 122.672
2026-01-22 23:23:28,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [334.23206, 52.165012, 234.1286, 288.48477, 544.79425, 242.32983, 438.4106, 276.791, 316.1746, 284.92444]
2026-01-22 23:23:28,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [170.0, 260.0, 131.0, 140.0, 394.0, 139.0, 227.0, 139.0, 184.0, 181.0]
2026-01-22 23:23:28,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (301.24) for latency DatasetOffice
2026-01-22 23:23:28,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 25 minutes, 59 seconds)
2026-01-22 23:24:58,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:00,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 377.69687 ± 112.008
2026-01-22 23:25:00,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [349.04062, 421.1407, 372.69083, 430.14053, 500.1316, 383.7066, 303.38412, 445.04288, 483.34976, 88.34106]
2026-01-22 23:25:00,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [167.0, 217.0, 218.0, 216.0, 362.0, 289.0, 162.0, 221.0, 203.0, 172.0]
2026-01-22 23:25:00,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (377.70) for latency DatasetOffice
2026-01-22 23:25:00,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 24 minutes, 10 seconds)
2026-01-22 23:26:33,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:35,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 350.79550 ± 81.898
2026-01-22 23:26:35,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [227.91972, 328.84598, 328.59943, 456.20776, 401.61307, 196.3665, 325.09406, 418.13324, 394.8987, 430.2766]
2026-01-22 23:26:35,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [151.0, 232.0, 239.0, 252.0, 234.0, 148.0, 169.0, 222.0, 236.0, 191.0]
2026-01-22 23:26:35,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 23 minutes, 7 seconds)
2026-01-22 23:28:06,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:08,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 407.76501 ± 166.502
2026-01-22 23:28:08,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [7.3312435, 491.41953, 435.76373, 440.34598, 235.3452, 569.75305, 345.94235, 543.04193, 429.21912, 579.488]
2026-01-22 23:28:08,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [25.0, 196.0, 207.0, 200.0, 134.0, 224.0, 138.0, 233.0, 210.0, 314.0]
2026-01-22 23:28:08,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (407.77) for latency DatasetOffice
2026-01-22 23:28:08,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 21 minutes, 19 seconds)
2026-01-22 23:29:40,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:29:42,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 434.86337 ± 195.611
2026-01-22 23:29:42,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [557.39655, 464.72305, 414.49634, 633.1832, 641.064, 524.9645, 126.87866, 16.58685, 512.7423, 456.59802]
2026-01-22 23:29:42,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [224.0, 216.0, 190.0, 226.0, 258.0, 205.0, 90.0, 29.0, 229.0, 188.0]
2026-01-22 23:29:42,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (434.86) for latency DatasetOffice
2026-01-22 23:29:42,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 20 minutes, 23 seconds)
2026-01-22 23:31:11,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:12,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 299.86121 ± 211.888
2026-01-22 23:31:12,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [0.8387272, -3.6759198, 28.94115, 499.30554, 182.44514, 530.0347, 482.25446, 395.37195, 391.87503, 491.22137]
2026-01-22 23:31:12,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [14.0, 20.0, 46.0, 214.0, 128.0, 241.0, 201.0, 194.0, 195.0, 191.0]
2026-01-22 23:31:12,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 17 minutes, 52 seconds)
2026-01-22 23:32:44,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:46,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 276.23529 ± 54.431
2026-01-22 23:32:46,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [252.92128, 214.0314, 243.84776, 255.93599, 264.8428, 325.00385, 304.8278, 225.8654, 409.56055, 265.51608]
2026-01-22 23:32:46,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [138.0, 126.0, 134.0, 141.0, 170.0, 192.0, 165.0, 123.0, 179.0, 141.0]
2026-01-22 23:32:46,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 16 minutes, 28 seconds)
2026-01-22 23:34:17,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:18,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 256.49451 ± 101.230
2026-01-22 23:34:18,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [158.57191, 272.90747, 351.6352, 387.08728, 8.055975, 278.44632, 289.50125, 242.86732, 310.89862, 264.97354]
2026-01-22 23:34:18,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [144.0, 130.0, 136.0, 144.0, 22.0, 141.0, 150.0, 132.0, 135.0, 137.0]
2026-01-22 23:34:18,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 14 minutes, 23 seconds)
2026-01-22 23:35:50,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:35:52,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 375.18063 ± 192.407
2026-01-22 23:35:52,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [674.7323, 281.7339, 520.77576, 246.69238, 261.22638, 211.59833, 758.9728, 275.87256, 333.05386, 187.14784]
2026-01-22 23:35:52,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [251.0, 178.0, 232.0, 175.0, 181.0, 130.0, 329.0, 144.0, 161.0, 103.0]
2026-01-22 23:35:52,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 13 minutes, 8 seconds)
2026-01-22 23:37:22,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:37:24,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 368.75092 ± 78.751
2026-01-22 23:37:24,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [340.8161, 434.4524, 370.24036, 261.47763, 262.21362, 511.2998, 442.58426, 317.60553, 426.4646, 320.35458]
2026-01-22 23:37:24,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [158.0, 187.0, 163.0, 155.0, 172.0, 219.0, 210.0, 186.0, 230.0, 184.0]
2026-01-22 23:37:24,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 10 minutes, 51 seconds)
2026-01-22 23:38:56,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:57,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 402.10828 ± 100.371
2026-01-22 23:38:57,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [306.6765, 309.97983, 519.62445, 384.0771, 619.6071, 321.36777, 467.63385, 417.61194, 370.73315, 303.77094]
2026-01-22 23:38:57,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [151.0, 149.0, 257.0, 186.0, 232.0, 156.0, 230.0, 199.0, 171.0, 149.0]
2026-01-22 23:38:57,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 10 minutes, 11 seconds)
2026-01-22 23:40:30,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 442.55972 ± 163.030
2026-01-22 23:40:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [370.90027, 655.99347, 367.20544, 538.8074, 401.36893, 246.10226, 451.0567, 263.64056, 344.27704, 786.24506]
2026-01-22 23:40:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [177.0, 271.0, 176.0, 235.0, 265.0, 197.0, 190.0, 147.0, 221.0, 364.0]
2026-01-22 23:40:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (442.56) for latency DatasetOffice
2026-01-22 23:40:32,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 8 minutes, 56 seconds)
2026-01-22 23:42:03,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:04,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 388.27679 ± 52.883
2026-01-22 23:42:04,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [327.3988, 397.94434, 331.95352, 333.3229, 436.118, 355.2886, 488.92007, 364.24524, 397.62726, 449.9493]
2026-01-22 23:42:04,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [137.0, 195.0, 136.0, 129.0, 142.0, 142.0, 174.0, 145.0, 144.0, 162.0]
2026-01-22 23:42:04,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 7 minutes, 16 seconds)
2026-01-22 23:43:36,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:38,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 754.87122 ± 261.046
2026-01-22 23:43:38,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1165.8572, 770.92316, 576.4409, 1115.0385, 643.3605, 503.2109, 486.22342, 511.09396, 1120.402, 656.1623]
2026-01-22 23:43:38,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [412.0, 349.0, 210.0, 472.0, 219.0, 187.0, 190.0, 187.0, 366.0, 310.0]
2026-01-22 23:43:38,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (754.87) for latency DatasetOffice
2026-01-22 23:43:38,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 5 minutes, 54 seconds)
2026-01-22 23:45:10,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:45:13,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 625.03101 ± 173.426
2026-01-22 23:45:13,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [560.217, 516.36945, 563.38806, 908.2111, 810.54877, 548.3117, 597.84424, 884.4696, 521.8864, 339.06384]
2026-01-22 23:45:13,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [233.0, 210.0, 227.0, 305.0, 291.0, 215.0, 252.0, 433.0, 213.0, 171.0]
2026-01-22 23:45:13,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 5 minutes, 1 second)
2026-01-22 23:46:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:49,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 780.31213 ± 593.194
2026-01-22 23:46:49,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1958.676, 48.583797, 776.57135, 209.2669, 963.76227, 473.79068, 436.7781, 472.5455, 721.13043, 1742.0159]
2026-01-22 23:46:49,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [890.0, 70.0, 287.0, 114.0, 456.0, 201.0, 182.0, 177.0, 295.0, 612.0]
2026-01-22 23:46:49,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (780.31) for latency DatasetOffice
2026-01-22 23:46:49,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 4 minutes, 5 seconds)
2026-01-22 23:48:22,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:25,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 958.17059 ± 349.138
2026-01-22 23:48:25,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1600.116, 801.30304, 786.6191, 430.23297, 1078.1017, 527.6859, 832.12604, 936.65234, 1168.0948, 1420.7748]
2026-01-22 23:48:25,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [598.0, 275.0, 227.0, 136.0, 348.0, 165.0, 310.0, 389.0, 439.0, 466.0]
2026-01-22 23:48:25,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (958.17) for latency DatasetOffice
2026-01-22 23:48:25,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 2 minutes, 56 seconds)
2026-01-22 23:49:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:56,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 514.73260 ± 94.506
2026-01-22 23:49:56,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [487.83765, 419.19833, 537.3722, 782.11615, 467.75836, 482.81522, 529.185, 488.68185, 462.53046, 489.83124]
2026-01-22 23:49:56,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [187.0, 175.0, 209.0, 245.0, 184.0, 166.0, 205.0, 192.0, 182.0, 189.0]
2026-01-22 23:49:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 1 minute, 5 seconds)
2026-01-22 23:51:28,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:32,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1247.64319 ± 483.738
2026-01-22 23:51:32,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [674.2579, 1689.97, 1923.4438, 543.2846, 1581.3342, 898.6454, 664.803, 1244.9005, 1608.7574, 1647.0349]
2026-01-22 23:51:32,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [303.0, 773.0, 725.0, 207.0, 612.0, 314.0, 258.0, 435.0, 623.0, 613.0]
2026-01-22 23:51:32,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (1247.64) for latency DatasetOffice
2026-01-22 23:51:32,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 4 seconds)
2026-01-22 23:53:07,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 921.07581 ± 454.824
2026-01-22 23:53:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [868.3934, 615.4727, 843.1018, 830.71375, 955.2567, 1404.1692, 690.84875, 1993.5148, 799.3672, 209.91994]
2026-01-22 23:53:10,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [290.0, 188.0, 353.0, 259.0, 334.0, 478.0, 222.0, 721.0, 307.0, 85.0]
2026-01-22 23:53:10,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 59 minutes, 15 seconds)
2026-01-22 23:54:39,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:54:43,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1035.62329 ± 515.509
2026-01-22 23:54:43,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1481.2051, 843.5211, 1073.498, 1661.5769, 1347.7664, 1201.0223, 1574.3796, 287.01874, 24.062155, 862.18207]
2026-01-22 23:54:43,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [485.0, 319.0, 635.0, 633.0, 535.0, 713.0, 648.0, 110.0, 41.0, 320.0]
2026-01-22 23:54:43,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 56 minutes, 52 seconds)
2026-01-22 23:56:16,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:21,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1579.35742 ± 654.236
2026-01-22 23:56:21,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1142.1454, 2503.367, 2046.7231, 1464.3372, 1014.16144, 1031.4017, 2728.363, 1086.0211, 777.17426, 1999.881]
2026-01-22 23:56:21,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [401.0, 1000.0, 685.0, 459.0, 368.0, 344.0, 1000.0, 423.0, 316.0, 643.0]
2026-01-22 23:56:21,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (1579.36) for latency DatasetOffice
2026-01-22 23:56:21,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 55 minutes, 56 seconds)
2026-01-22 23:57:58,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:58:02,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1260.05981 ± 859.941
2026-01-22 23:58:02,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [878.76074, 394.2092, 2684.0593, 2609.7378, 640.05695, 1711.6345, 1571.3074, 25.140678, 626.96704, 1458.7253]
2026-01-22 23:58:02,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [351.0, 141.0, 1000.0, 1000.0, 222.0, 652.0, 555.0, 39.0, 223.0, 459.0]
2026-01-22 23:58:02,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 56 minutes, 37 seconds)
2026-01-22 23:59:35,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:39,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1575.16284 ± 834.489
2026-01-22 23:59:39,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2038.7589, 3042.601, 548.24524, 1545.3618, 2129.1443, 2065.634, 454.43356, 1243.0343, 2213.3105, 471.10474]
2026-01-22 23:59:39,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [632.0, 1000.0, 194.0, 423.0, 543.0, 695.0, 161.0, 333.0, 652.0, 165.0]
2026-01-22 23:59:39,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 55 minutes, 17 seconds)
2026-01-23 00:01:08,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:10,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 884.95526 ± 614.741
2026-01-23 00:01:10,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [576.5479, 730.6756, 312.8648, 1357.3132, 328.03442, 656.635, 450.62463, 675.9683, 2396.745, 1364.1439]
2026-01-23 00:01:10,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [177.0, 220.0, 116.0, 349.0, 123.0, 196.0, 150.0, 198.0, 765.0, 430.0]
2026-01-23 00:01:10,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 52 minutes, 6 seconds)
2026-01-23 00:02:39,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:45,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2285.32544 ± 715.440
2026-01-23 00:02:45,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3265.4075, 3172.8582, 1275.3955, 1300.1771, 2575.6057, 2982.9556, 1567.6201, 2473.1628, 2439.2656, 1800.8042]
2026-01-23 00:02:45,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 931.0, 354.0, 345.0, 897.0, 1000.0, 534.0, 816.0, 672.0, 553.0]
2026-01-23 00:02:45,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2285.33) for latency DatasetOffice
2026-01-23 00:02:45,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 51 minutes, 3 seconds)
2026-01-23 00:04:22,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:28,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2077.71411 ± 1383.411
2026-01-23 00:04:28,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2207.715, 807.4102, 2383.3958, 955.7392, 33.668842, 237.02798, 3340.2742, 3657.0105, 3613.0862, 3541.8115]
2026-01-23 00:04:28,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [602.0, 234.0, 693.0, 271.0, 46.0, 100.0, 951.0, 926.0, 1000.0, 1000.0]
2026-01-23 00:04:28,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 50 minutes, 17 seconds)
2026-01-23 00:05:58,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 882.77667 ± 778.740
2026-01-23 00:06:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [482.35257, 37.10898, 29.889048, 2844.4197, 429.91858, 1231.1323, 1291.0415, 687.1145, 1099.3677, 695.4211]
2026-01-23 00:06:00,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [163.0, 50.0, 50.0, 1000.0, 139.0, 396.0, 384.0, 211.0, 356.0, 225.0]
2026-01-23 00:06:00,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 46 minutes, 53 seconds)
2026-01-23 00:07:38,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:43,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1816.43518 ± 875.891
2026-01-23 00:07:43,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3314.5347, 1650.8499, 2354.801, 2439.1606, -7.4899316, 2512.6462, 916.4648, 1628.1016, 1561.1232, 1794.1587]
2026-01-23 00:07:43,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 554.0, 717.0, 751.0, 15.0, 815.0, 295.0, 509.0, 483.0, 596.0]
2026-01-23 00:07:43,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 46 minutes, 24 seconds)
2026-01-23 00:09:09,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:14,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1413.92688 ± 587.284
2026-01-23 00:09:14,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2432.0522, 1286.4562, 570.27936, 1350.9125, 1867.3848, 1792.9451, 912.3239, 576.78064, 2023.2, 1326.9333]
2026-01-23 00:09:14,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [887.0, 523.0, 209.0, 609.0, 675.0, 714.0, 349.0, 201.0, 709.0, 495.0]
2026-01-23 00:09:14,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 44 minutes, 45 seconds)
2026-01-23 00:10:49,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:51,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 868.24255 ± 796.734
2026-01-23 00:10:51,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [751.6223, 1969.5087, 1669.9396, 1102.4877, 675.31006, 2222.775, 79.607895, 186.59848, 12.669348, 11.906889]
2026-01-23 00:10:51,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [229.0, 524.0, 430.0, 307.0, 204.0, 600.0, 69.0, 131.0, 32.0, 34.0]
2026-01-23 00:10:51,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 43 minutes, 40 seconds)
2026-01-23 00:12:21,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:27,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2285.36865 ± 1322.591
2026-01-23 00:12:27,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1555.5088, 2836.2507, 3458.434, 116.41863, 3326.1255, 3305.5464, 1440.4307, 0.64588094, 3382.4573, 3431.8687]
2026-01-23 00:12:27,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [481.0, 1000.0, 1000.0, 66.0, 1000.0, 1000.0, 512.0, 29.0, 1000.0, 1000.0]
2026-01-23 00:12:27,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2285.37) for latency DatasetOffice
2026-01-23 00:12:27,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 40 minutes, 44 seconds)
2026-01-23 00:14:04,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:08,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1245.94983 ± 944.481
2026-01-23 00:14:08,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [596.2069, 2837.3857, 605.08746, 796.408, 541.7909, 953.00006, 2936.9243, 260.98105, 815.8998, 2115.8137]
2026-01-23 00:14:08,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [187.0, 1000.0, 182.0, 255.0, 166.0, 289.0, 1000.0, 95.0, 342.0, 765.0]
2026-01-23 00:14:08,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 40 minutes, 50 seconds)
2026-01-23 00:15:41,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:44,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1352.66504 ± 866.597
2026-01-23 00:15:44,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3244.8398, 1166.5157, 1408.7949, 1610.6849, -7.7307787, 1509.5068, 853.47687, 719.2504, 720.21265, 2301.0986]
2026-01-23 00:15:44,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 352.0, 459.0, 505.0, 17.0, 400.0, 279.0, 197.0, 203.0, 732.0]
2026-01-23 00:15:44,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 37 minutes, 51 seconds)
2026-01-23 00:17:14,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:17:15,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 662.26447 ± 557.231
2026-01-23 00:17:15,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1214.7905, 278.12064, 2011.8429, 317.49, 721.7159, 328.2459, 306.16217, 973.3124, 302.69897, 168.26483]
2026-01-23 00:17:15,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [406.0, 100.0, 523.0, 108.0, 217.0, 110.0, 106.0, 268.0, 106.0, 70.0]
2026-01-23 00:17:15,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 36 minutes, 20 seconds)
2026-01-23 00:18:47,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:51,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1393.55115 ± 732.382
2026-01-23 00:18:51,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1000.6702, 1940.315, 3185.73, 1496.4385, 1319.0814, 833.6566, 731.461, 1093.5393, 559.7455, 1774.8749]
2026-01-23 00:18:51,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [317.0, 703.0, 949.0, 451.0, 411.0, 300.0, 251.0, 371.0, 189.0, 467.0]
2026-01-23 00:18:51,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 34 minutes, 21 seconds)
2026-01-23 00:20:24,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:28,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1473.26294 ± 967.347
2026-01-23 00:20:28,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1883.8422, 977.7806, 653.5904, 266.76523, 1061.7789, 856.76135, 1315.6848, 1321.5061, 3566.9648, 2827.9553]
2026-01-23 00:20:28,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [505.0, 347.0, 200.0, 170.0, 307.0, 270.0, 375.0, 435.0, 962.0, 874.0]
2026-01-23 00:20:28,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 33 minutes)
2026-01-23 00:22:05,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:22:07,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 810.11920 ± 350.230
2026-01-23 00:22:07,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [712.5297, 1279.1177, 191.67996, 920.5422, 1251.7482, 884.44714, 858.4667, 838.74695, 965.7759, 198.13792]
2026-01-23 00:22:07,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [219.0, 327.0, 85.0, 264.0, 367.0, 257.0, 244.0, 252.0, 269.0, 85.0]
2026-01-23 00:22:07,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 30 minutes, 53 seconds)
2026-01-23 00:23:38,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2264.45728 ± 1303.926
2026-01-23 00:23:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1639.0138, 667.61053, 2450.5347, 3902.702, 3453.3616, 1032.1039, 3693.2244, 1275.3439, 3907.0974, 623.5799]
2026-01-23 00:23:44,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [453.0, 193.0, 612.0, 1000.0, 888.0, 302.0, 1000.0, 339.0, 1000.0, 182.0]
2026-01-23 00:23:44,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 29 minutes, 29 seconds)
2026-01-23 00:25:14,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:20,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2663.86475 ± 975.544
2026-01-23 00:25:20,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3295.437, 3531.0664, 1476.7886, 2455.2, 1853.7422, 3728.8472, 859.42175, 3198.63, 3883.005, 2356.5117]
2026-01-23 00:25:20,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [900.0, 1000.0, 424.0, 676.0, 466.0, 1000.0, 246.0, 806.0, 1000.0, 686.0]
2026-01-23 00:25:20,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2663.86) for latency DatasetOffice
2026-01-23 00:25:20,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 28 minutes, 53 seconds)
2026-01-23 00:26:55,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:57,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 814.58130 ± 440.991
2026-01-23 00:26:57,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [902.023, 1625.3795, 1399.2328, 672.7247, 804.7464, 871.5218, 178.40549, 101.3858, 807.33765, 783.05524]
2026-01-23 00:26:57,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [257.0, 454.0, 447.0, 193.0, 231.0, 261.0, 114.0, 52.0, 237.0, 230.0]
2026-01-23 00:26:57,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 30 seconds)
2026-01-23 00:28:26,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:30,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1841.80530 ± 1107.337
2026-01-23 00:28:30,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1749.5105, 3693.1035, 1171.0961, 712.00397, 1032.9366, 655.3434, 1927.8083, 4035.8684, 2025.9558, 1414.4253]
2026-01-23 00:28:30,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [497.0, 1000.0, 306.0, 207.0, 287.0, 205.0, 582.0, 1000.0, 527.0, 391.0]
2026-01-23 00:28:30,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 25 minutes, 8 seconds)
2026-01-23 00:30:10,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:15,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1676.68970 ± 1181.146
2026-01-23 00:30:15,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3321.458, 3612.296, 1036.0156, 636.71783, 1422.7968, 1188.5443, 1152.9247, 98.84528, 3265.5747, 1031.7225]
2026-01-23 00:30:15,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 312.0, 195.0, 447.0, 352.0, 335.0, 71.0, 1000.0, 323.0]
2026-01-23 00:30:15,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 33 seconds)
2026-01-23 00:31:40,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:45,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2183.74170 ± 1413.036
2026-01-23 00:31:45,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [364.60583, 3737.8687, 331.94934, 1683.4377, 3888.1577, 1612.8181, 3845.8606, 1563.6229, 3834.0637, 975.0324]
2026-01-23 00:31:45,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [121.0, 1000.0, 116.0, 490.0, 1000.0, 413.0, 1000.0, 432.0, 1000.0, 270.0]
2026-01-23 00:31:46,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 21 minutes, 53 seconds)
2026-01-23 00:33:19,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:21,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 764.13922 ± 1050.939
2026-01-23 00:33:21,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [15.740494, 38.656628, 22.03649, 1.8265756, 235.75708, 17.243753, 1177.2786, 2032.3759, 3254.859, 845.61774]
2026-01-23 00:33:21,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [34.0, 55.0, 46.0, 25.0, 107.0, 30.0, 496.0, 667.0, 1000.0, 232.0]
2026-01-23 00:33:21,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 20 minutes, 6 seconds)
2026-01-23 00:35:00,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:05,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2513.88623 ± 1015.183
2026-01-23 00:35:05,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1423.1605, 1795.4971, 3069.4336, 999.0332, 1287.2124, 3657.6848, 3872.034, 3005.0725, 3560.7524, 2468.9822]
2026-01-23 00:35:05,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [388.0, 464.0, 764.0, 357.0, 317.0, 1000.0, 1000.0, 721.0, 873.0, 644.0]
2026-01-23 00:35:05,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 19 minutes, 43 seconds)
2026-01-23 00:36:31,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:37,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2153.64404 ± 1228.155
2026-01-23 00:36:37,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1204.9026, 3066.5537, 3571.2366, 923.2547, 3367.604, 1216.9669, 117.33366, 3391.8909, 1408.9435, 3267.7537]
2026-01-23 00:36:37,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [365.0, 1000.0, 1000.0, 252.0, 1000.0, 352.0, 57.0, 1000.0, 457.0, 1000.0]
2026-01-23 00:36:37,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 17 minutes, 55 seconds)
2026-01-23 00:38:14,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:17,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1059.63965 ± 1363.948
2026-01-23 00:38:17,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2215.1406, 72.51636, 4.1029463, 39.821823, 112.60588, 20.503418, 18.533792, 3081.9304, 1343.0647, 3688.1763]
2026-01-23 00:38:17,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [570.0, 80.0, 26.0, 36.0, 134.0, 39.0, 49.0, 876.0, 395.0, 1000.0]
2026-01-23 00:38:17,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 15 minutes, 37 seconds)
2026-01-23 00:39:53,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2739.90967 ± 960.420
2026-01-23 00:40:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3369.918, 2626.9597, 3488.427, 883.60364, 2716.6797, 3626.3015, 3325.3027, 2877.328, 984.2762, 3500.3025]
2026-01-23 00:40:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 731.0, 1000.0, 280.0, 775.0, 1000.0, 977.0, 844.0, 293.0, 1000.0]
2026-01-23 00:40:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2739.91) for latency DatasetOffice
2026-01-23 00:40:00,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 15 minutes, 47 seconds)
2026-01-23 00:41:28,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2820.22241 ± 1143.840
2026-01-23 00:41:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2613.8562, 3695.233, 3590.9343, 3297.9436, 3473.6118, 3542.7004, 1012.82434, 247.97644, 3239.5579, 3487.5857]
2026-01-23 00:41:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [694.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 286.0, 95.0, 1000.0, 1000.0]
2026-01-23 00:41:35,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2820.22) for latency DatasetOffice
2026-01-23 00:41:35,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 7 seconds)
2026-01-23 00:43:06,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:43:11,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1915.14746 ± 1738.151
2026-01-23 00:43:11,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3724.4097, 3691.1511, 3700.1057, 3590.6694, 3465.2202, 894.0373, 1.2767031, 10.962353, 46.605503, 27.036148]
2026-01-23 00:43:11,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 266.0, 25.0, 29.0, 57.0, 49.0]
2026-01-23 00:43:11,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 11 minutes, 12 seconds)
2026-01-23 00:44:43,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:49,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2056.85791 ± 1242.487
2026-01-23 00:44:49,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [535.6565, 707.79663, 1647.1204, 1672.689, 496.922, 1547.0105, 3566.1765, 3353.3691, 3364.4329, 3677.407]
2026-01-23 00:44:49,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [166.0, 198.0, 449.0, 464.0, 160.0, 445.0, 1000.0, 1000.0, 959.0, 1000.0]
2026-01-23 00:44:49,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 10 minutes, 25 seconds)
2026-01-23 00:46:24,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:32,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3033.92236 ± 839.277
2026-01-23 00:46:32,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3556.4055, 2232.4932, 3880.0925, 3516.7278, 3138.1846, 3551.713, 3656.0083, 1458.9763, 3616.2266, 1732.3964]
2026-01-23 00:46:32,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 634.0, 1000.0, 1000.0, 802.0, 1000.0, 1000.0, 398.0, 1000.0, 502.0]
2026-01-23 00:46:32,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3033.92) for latency DatasetOffice
2026-01-23 00:46:32,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 9 minutes, 16 seconds)
2026-01-23 00:48:02,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:07,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1547.68530 ± 1266.193
2026-01-23 00:48:07,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2519.4045, 2329.743, 2923.25, -5.039417, 214.41727, 13.472382, 139.339, 3483.3083, 2030.582, 1828.3752]
2026-01-23 00:48:07,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [730.0, 770.0, 815.0, 18.0, 160.0, 27.0, 104.0, 1000.0, 545.0, 528.0]
2026-01-23 00:48:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 6 minutes, 32 seconds)
2026-01-23 00:49:43,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:49,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2588.16748 ± 1494.975
2026-01-23 00:49:49,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [467.6517, 487.61118, 1509.7189, 4028.488, 803.5633, 3903.0005, 3985.099, 3004.3506, 3845.8213, 3846.3716]
2026-01-23 00:49:49,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [155.0, 157.0, 411.0, 1000.0, 232.0, 1000.0, 1000.0, 815.0, 1000.0, 1000.0]
2026-01-23 00:49:49,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes, 48 seconds)
2026-01-23 00:51:26,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:34,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3164.39062 ± 715.556
2026-01-23 00:51:34,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3357.086, 3455.4958, 3514.8848, 3576.384, 3564.2583, 3722.6047, 1519.986, 2058.0837, 3157.3613, 3717.7588]
2026-01-23 00:51:34,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 430.0, 596.0, 1000.0, 1000.0]
2026-01-23 00:51:34,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3164.39) for latency DatasetOffice
2026-01-23 00:51:34,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 5 minutes, 24 seconds)
2026-01-23 00:53:00,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:05,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2003.99097 ± 1652.389
2026-01-23 00:53:05,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3540.301, 3781.671, 3920.573, 3039.5388, 2099.165, 3401.3762, 23.899145, -4.643912, 204.01564, 34.014]
2026-01-23 00:53:05,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 785.0, 518.0, 1000.0, 46.0, 20.0, 111.0, 46.0]
2026-01-23 00:53:05,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 2 minutes, 51 seconds)
2026-01-23 00:54:42,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:50,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3187.32153 ± 917.931
2026-01-23 00:54:50,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3184.7808, 3681.6875, 3515.109, 3378.257, 497.27173, 3427.0854, 3668.2947, 3821.4182, 3519.5059, 3179.8052]
2026-01-23 00:54:50,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 161.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:54:50,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3187.32) for latency DatasetOffice
2026-01-23 00:54:50,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 1 minute, 25 seconds)
2026-01-23 00:56:18,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:25,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2646.45117 ± 936.994
2026-01-23 00:56:25,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3042.0266, 3423.72, 3624.4473, 1956.2836, 3395.5676, 2341.8928, 1003.25134, 1122.755, 3037.943, 3516.6223]
2026-01-23 00:56:25,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 567.0, 1000.0, 716.0, 279.0, 292.0, 1000.0, 1000.0]
2026-01-23 00:56:25,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes, 49 seconds)
2026-01-23 00:58:02,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:07,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2083.37183 ± 1224.675
2026-01-23 00:58:07,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2274.6785, 1903.0026, 3747.601, 3108.377, 3701.8308, 2347.032, 24.185366, 2354.1013, 1005.5443, 367.36456]
2026-01-23 00:58:07,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [627.0, 490.0, 1000.0, 838.0, 1000.0, 622.0, 44.0, 640.0, 330.0, 261.0]
2026-01-23 00:58:07,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 58 minutes, 7 seconds)
2026-01-23 00:59:34,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:43,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3278.47461 ± 854.379
2026-01-23 00:59:43,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3360.866, 3691.5544, 3904.7578, 3901.1453, 1612.1167, 3792.8296, 3611.3115, 3654.413, 3677.2295, 1578.5206]
2026-01-23 00:59:43,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 450.0, 1000.0, 1000.0, 1000.0, 1000.0, 456.0]
2026-01-23 00:59:43,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3278.47) for latency DatasetOffice
2026-01-23 00:59:43,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 21 seconds)
2026-01-23 01:01:17,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:22,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2266.42017 ± 1536.137
2026-01-23 01:01:22,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1503.5867, -3.7726467, 3626.512, 3899.58, 1020.05884, 293.07877, 3593.7354, 3689.88, 3931.3599, 1110.1816]
2026-01-23 01:01:22,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [436.0, 21.0, 1000.0, 1000.0, 304.0, 108.0, 1000.0, 1000.0, 1000.0, 320.0]
2026-01-23 01:01:22,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 54 minutes, 43 seconds)
2026-01-23 01:02:53,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:54,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 520.25574 ± 1171.327
2026-01-23 01:02:54,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3891.8965, 1127.0957, 1.2967718, -1.6319996, 3.2841852, 32.15172, 39.600136, 80.30959, 9.407462, 19.147657]
2026-01-23 01:02:54,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 328.0, 22.0, 23.0, 27.0, 55.0, 55.0, 66.0, 29.0, 30.0]
2026-01-23 01:02:54,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 51 minutes, 39 seconds)
2026-01-23 01:04:28,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:35,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2949.34229 ± 1239.779
2026-01-23 01:04:35,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3795.8618, 945.88214, 955.80524, 1296.6494, 3748.3086, 3840.8657, 3530.9158, 3657.5564, 3855.112, 3866.4668]
2026-01-23 01:04:35,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 271.0, 247.0, 396.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:04:35,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 37 seconds)
2026-01-23 01:06:07,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:16,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3566.74683 ± 264.499
2026-01-23 01:06:16,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3007.1526, 3687.5862, 3418.732, 3738.1646, 4009.3127, 3548.397, 3466.874, 3333.1758, 3669.2463, 3788.8289]
2026-01-23 01:06:16,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [918.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:16,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3566.75) for latency DatasetOffice
2026-01-23 01:06:16,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 48 minutes, 55 seconds)
2026-01-23 01:07:39,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:45,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2070.46631 ± 1503.032
2026-01-23 01:07:45,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3526.8252, 2256.2, 3436.7407, 3350.208, 3726.326, 3227.5361, 44.256397, 484.54242, 58.10875, 593.91785]
2026-01-23 01:07:45,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 615.0, 1000.0, 1000.0, 1000.0, 922.0, 58.0, 311.0, 41.0, 239.0]
2026-01-23 01:07:45,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 46 minutes, 35 seconds)
2026-01-23 01:09:18,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:27,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3572.49927 ± 637.250
2026-01-23 01:09:27,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3732.2092, 3945.7341, 3949.5823, 3412.0046, 3806.0068, 3701.6348, 3674.917, 3783.2856, 3996.9414, 1722.6776]
2026-01-23 01:09:27,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 464.0]
2026-01-23 01:09:27,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3572.50) for latency DatasetOffice
2026-01-23 01:09:27,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 45 minutes, 14 seconds)
2026-01-23 01:11:02,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:11:09,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3084.14966 ± 861.060
2026-01-23 01:11:09,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2043.4061, 3703.738, 2217.5864, 3676.6658, 3904.5645, 2671.1973, 1440.8976, 3779.102, 3537.1658, 3867.1716]
2026-01-23 01:11:09,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [521.0, 1000.0, 599.0, 1000.0, 1000.0, 725.0, 388.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:11:09,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 44 minutes, 30 seconds)
2026-01-23 01:12:34,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:42,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3542.05591 ± 684.670
2026-01-23 01:12:42,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3794.6296, 3664.9797, 3820.963, 3716.3752, 3750.9019, 1494.4548, 3723.8557, 3790.5244, 3804.478, 3859.3982]
2026-01-23 01:12:42,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 952.0, 431.0, 1000.0, 947.0, 1000.0, 1000.0]
2026-01-23 01:12:42,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 13 seconds)
2026-01-23 01:14:13,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:21,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3222.93701 ± 981.602
2026-01-23 01:14:21,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3728.3755, 3666.0364, 1559.6024, 3269.1938, 3817.701, 3764.709, 3942.9573, 1042.23, 3740.5603, 3698.004]
2026-01-23 01:14:21,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 379.0, 1000.0, 1000.0, 1000.0, 1000.0, 279.0, 1000.0, 1000.0]
2026-01-23 01:14:21,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 40 minutes, 25 seconds)
2026-01-23 01:15:55,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:01,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2449.60303 ± 1618.349
2026-01-23 01:16:01,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3631.431, 3651.8403, 3830.1846, 3722.9548, 3329.367, 3539.2986, 2744.0679, 26.310394, 8.715492, 11.86088]
2026-01-23 01:16:01,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 779.0, 40.0, 27.0, 30.0]
2026-01-23 01:16:01,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 44 seconds)
2026-01-23 01:17:28,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:35,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2576.71167 ± 1444.159
2026-01-23 01:17:35,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3800.479, 344.21448, 106.1688, 3369.7437, 3650.33, 3624.258, 2353.6084, 3768.2993, 989.2275, 3760.789]
2026-01-23 01:17:35,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 116.0, 55.0, 1000.0, 1000.0, 1000.0, 712.0, 1000.0, 298.0, 1000.0]
2026-01-23 01:17:35,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 22 seconds)
2026-01-23 01:19:05,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:14,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3240.94165 ± 899.196
2026-01-23 01:19:14,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2652.2407, 3853.6558, 3864.0, 3745.0557, 3318.699, 3562.6982, 761.7062, 3700.9534, 3743.2275, 3207.182]
2026-01-23 01:19:14,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 990.0, 224.0, 1000.0, 967.0, 830.0]
2026-01-23 01:19:14,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 35 minutes, 31 seconds)
2026-01-23 01:20:43,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2764.23462 ± 1435.422
2026-01-23 01:20:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3454.586, 3730.1606, 3405.7024, 3412.9756, 3729.7124, 3799.9927, 3760.729, 2329.0369, -5.1727757, 24.622585]
2026-01-23 01:20:50,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 653.0, 18.0, 36.0]
2026-01-23 01:20:50,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 6 seconds)
2026-01-23 01:22:20,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:27,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2920.06470 ± 1125.562
2026-01-23 01:22:27,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [689.7879, 3639.873, 3626.9365, 3006.1992, 2269.1572, 3788.017, 3818.416, 1032.7295, 3749.3604, 3580.1694]
2026-01-23 01:22:27,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [206.0, 1000.0, 1000.0, 840.0, 542.0, 1000.0, 1000.0, 299.0, 1000.0, 1000.0]
2026-01-23 01:22:27,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 32 minutes, 24 seconds)
2026-01-23 01:24:04,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:11,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3080.37280 ± 1204.877
2026-01-23 01:24:11,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3567.504, 3919.28, 3916.1875, 3777.3638, 2149.873, -3.9802332, 3987.5876, 3740.0671, 3453.7686, 2296.076]
2026-01-23 01:24:11,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 597.0, 19.0, 1000.0, 1000.0, 1000.0, 598.0]
2026-01-23 01:24:11,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes)
2026-01-23 01:25:40,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:41,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 393.23135 ± 1112.607
2026-01-23 01:25:41,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [13.979967, 24.947283, 20.825348, 14.176361, -2.7978525, 74.95769, 31.45422, 30.774067, -6.444671, 3730.4412]
2026-01-23 01:25:41,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [31.0, 36.0, 38.0, 32.0, 20.0, 62.0, 43.0, 52.0, 17.0, 1000.0]
2026-01-23 01:25:41,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 29 minutes, 11 seconds)
2026-01-23 01:27:08,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:15,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2897.03271 ± 1321.240
2026-01-23 01:27:15,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3761.9248, 3823.6362, 3971.1938, 3220.1025, 3928.8223, 785.83527, 2837.5864, -4.8466897, 3699.0579, 2947.014]
2026-01-23 01:27:15,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 815.0, 1000.0, 225.0, 726.0, 19.0, 1000.0, 731.0]
2026-01-23 01:27:15,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 16 seconds)
2026-01-23 01:28:52,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3135.98926 ± 1335.685
2026-01-23 01:29:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3938.7422, 3844.7085, 3684.1108, 3741.8035, 3764.4043, 3858.3872, 3779.5842, 646.3746, 298.41202, 3803.3652]
2026-01-23 01:29:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 979.0, 1000.0, 1000.0, 1000.0, 203.0, 113.0, 1000.0]
2026-01-23 01:29:00,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 26 minutes, 9 seconds)
2026-01-23 01:30:24,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:30,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2426.73291 ± 1639.014
2026-01-23 01:30:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3557.3423, 3643.114, 3892.5105, 3392.8408, 3752.2114, 3793.514, 2165.79, 11.353477, 0.19247639, 58.459663]
2026-01-23 01:30:30,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 631.0, 32.0, 25.0, 65.0]
2026-01-23 01:30:30,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 24 minutes, 9 seconds)
2026-01-23 01:32:04,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:11,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2776.09497 ± 1300.891
2026-01-23 01:32:11,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3667.889, 139.33742, 3332.909, 3550.4868, 1295.1503, 3611.86, 1103.0874, 3568.6768, 3927.795, 3563.7585]
2026-01-23 01:32:11,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 61.0, 892.0, 1000.0, 345.0, 1000.0, 292.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:32:11,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 25 seconds)
2026-01-23 01:33:37,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:46,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3740.27417 ± 120.182
2026-01-23 01:33:46,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3728.2793, 3487.1863, 3902.7083, 3951.2651, 3657.8123, 3699.7498, 3747.8525, 3730.0037, 3731.8774, 3766.0088]
2026-01-23 01:33:46,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:33:46,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3740.27) for latency DatasetOffice
2026-01-23 01:33:46,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 21 minutes, 1 second)
2026-01-23 01:35:23,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:28,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2031.65002 ± 1449.604
2026-01-23 01:35:28,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [353.95847, 3872.216, 3936.4548, 2158.7205, 3893.296, 2325.742, 2145.3655, 1342.1339, 112.23755, 176.37573]
2026-01-23 01:35:28,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [119.0, 1000.0, 1000.0, 558.0, 1000.0, 643.0, 683.0, 382.0, 53.0, 91.0]
2026-01-23 01:35:28,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 42 seconds)
2026-01-23 01:36:54,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:00,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2377.10742 ± 1555.936
2026-01-23 01:37:00,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [46.710854, 3898.254, 3822.2493, 2224.0427, 3809.0867, 343.49002, 3692.9712, 1738.7704, 384.32703, 3811.1748]
2026-01-23 01:37:00,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [59.0, 1000.0, 1000.0, 608.0, 1000.0, 116.0, 1000.0, 422.0, 128.0, 1000.0]
2026-01-23 01:37:00,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 17 minutes, 36 seconds)
2026-01-23 01:38:33,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:41,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3184.14429 ± 1180.327
2026-01-23 01:38:41,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3568.6848, 3402.526, 3946.965, 273.68222, 3590.495, 3923.0515, 1605.5378, 3509.6423, 4077.2244, 3943.6326]
2026-01-23 01:38:41,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 99.0, 959.0, 1000.0, 479.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:38:41,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 20 seconds)
2026-01-23 01:40:10,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:13,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1064.17847 ± 1586.865
2026-01-23 01:40:13,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3852.12, 3036.8794, 14.871188, 33.086514, 13.518809, 2.0231047, 103.73285, 37.217297, 22.63999, 3525.696]
2026-01-23 01:40:13,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 824.0, 29.0, 55.0, 34.0, 25.0, 75.0, 48.0, 34.0, 1000.0]
2026-01-23 01:40:13,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 26 seconds)
2026-01-23 01:41:40,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:48,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3404.25317 ± 950.193
2026-01-23 01:41:48,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [717.6318, 3928.2427, 3665.2944, 3759.9377, 3734.256, 3864.2039, 2783.3613, 3848.1924, 3830.0225, 3911.3862]
2026-01-23 01:41:48,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [207.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 690.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:41:48,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 12 minutes, 50 seconds)
2026-01-23 01:43:25,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:30,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2224.93970 ± 1389.914
2026-01-23 01:43:30,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1421.2953, 333.11182, 3920.857, 1848.6906, 389.1646, 3900.323, 3823.4443, 3671.3682, 1232.2803, 1708.8616]
2026-01-23 01:43:30,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [373.0, 115.0, 1000.0, 505.0, 136.0, 1000.0, 1000.0, 1000.0, 320.0, 448.0]
2026-01-23 01:43:30,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 15 seconds)
2026-01-23 01:44:57,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:01,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1572.52002 ± 1341.647
2026-01-23 01:45:01,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2695.3438, 8.363698, 6.2911696, 125.0301, 162.46269, 1706.3544, 2054.4495, 2144.076, 3867.6868, 2955.1423]
2026-01-23 01:45:01,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [706.0, 28.0, 27.0, 66.0, 77.0, 599.0, 554.0, 566.0, 1000.0, 751.0]
2026-01-23 01:45:01,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 36 seconds)
2026-01-23 01:46:31,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:38,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3055.45703 ± 1081.875
2026-01-23 01:46:38,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2056.9873, 870.97504, 3751.2368, 1642.0979, 2847.7917, 3739.4563, 3860.1187, 3883.9065, 4001.7852, 3900.2144]
2026-01-23 01:46:38,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [519.0, 311.0, 1000.0, 437.0, 734.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:46:38,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 7 minutes, 57 seconds)
2026-01-23 01:48:12,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:18,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2683.49268 ± 1261.911
2026-01-23 01:48:18,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3780.9958, 3728.327, 3669.1953, 3286.6575, 199.23547, 1016.4905, 3365.2441, 3916.85, 2281.3738, 1590.5581]
2026-01-23 01:48:18,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 79.0, 280.0, 913.0, 1000.0, 628.0, 447.0]
2026-01-23 01:48:18,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 28 seconds)
2026-01-23 01:49:50,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:56,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2696.22314 ± 1623.654
2026-01-23 01:49:56,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3665.9739, 3604.46, 3806.1226, 3790.5867, 22.27475, 296.02548, 348.12338, 3883.959, 3681.8933, 3862.8127]
2026-01-23 01:49:56,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 35.0, 160.0, 234.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:56,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 53 seconds)
2026-01-23 01:51:28,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:36,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3632.40747 ± 628.701
2026-01-23 01:51:36,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3902.6484, 1763.2324, 3775.3171, 3868.456, 3895.1223, 3706.8542, 4020.1506, 3835.6694, 3799.0703, 3757.5525]
2026-01-23 01:51:36,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 483.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:51:36,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 14 seconds)
2026-01-23 01:53:04,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:12,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3033.72388 ± 1431.914
2026-01-23 01:53:12,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3689.8794, 3734.076, 3920.112, 3784.582, 4023.0461, 3747.8386, 224.3994, 144.80933, 3482.3662, 3586.1282]
2026-01-23 01:53:12,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 106.0, 114.0, 1000.0, 1000.0]
2026-01-23 01:53:12,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 38 seconds)
2026-01-23 01:54:43,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:46,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 977.77979 ± 1175.828
2026-01-23 01:54:46,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3857.6335, 2089.746, 495.21475, 1490.8807, 23.490728, 27.298372, 1233.3643, 209.98273, 320.0427, 30.143885]
2026-01-23 01:54:46,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 563.0, 191.0, 420.0, 38.0, 40.0, 339.0, 92.0, 112.0, 41.0]
2026-01-23 01:54:46,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1299 [DEBUG]: Training session finished
