2026-01-22 23:14:21,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mem2
2026-01-22 23:14:21,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-walker2d/DatasetOffice-bpql-mem2
2026-01-22 23:14:21,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x145da2a7bcd0>}
2026-01-22 23:14:21,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:22,120 baseline-bpql-noisy-walker2d:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-22 23:14:22,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1181 [INFO]: Creating new trainer
2026-01-22 23:14:22,125 baseline-bpql-noisy-walker2d:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=29, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2026-01-22 23:14:22,125 baseline-bpql-noisy-walker2d:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:22,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:22,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:46,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 177.49641 ± 122.570
2026-01-22 23:15:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [366.11545, 368.9027, 84.0069, 151.83504, 350.1925, 102.1522, 114.867615, 84.05883, 80.559044, 72.27376]
2026-01-22 23:15:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [251.0, 253.0, 197.0, 266.0, 232.0, 214.0, 228.0, 196.0, 195.0, 185.0]
2026-01-22 23:15:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (177.50) for latency DatasetOffice
2026-01-22 23:15:48,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 21 minutes, 26 seconds)
2026-01-22 23:17:20,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:22,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 70.25076 ± 103.309
2026-01-22 23:17:22,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [-5.5222864, 35.26382, 24.890663, 301.42004, 50.887737, 7.8783536, 63.50965, -11.376288, -2.1663594, 237.72223]
2026-01-22 23:17:22,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [161.0, 141.0, 161.0, 304.0, 215.0, 17.0, 121.0, 135.0, 168.0, 168.0]
2026-01-22 23:17:22,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 26 minutes, 14 seconds)
2026-01-22 23:18:54,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:18:55,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: -31.38421 ± 24.779
2026-01-22 23:18:55,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [-18.196383, -16.563871, -27.567999, -90.506325, -11.776204, -31.003523, -30.734241, -27.156158, -60.422516, 0.085142076]
2026-01-22 23:18:55,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [136.0, 174.0, 141.0, 185.0, 103.0, 112.0, 117.0, 123.0, 166.0, 94.0]
2026-01-22 23:18:55,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 26 minutes, 39 seconds)
2026-01-22 23:20:27,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:20:28,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 280.25412 ± 59.428
2026-01-22 23:20:28,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [266.05914, 282.66324, 344.91043, 317.60034, 268.70288, 240.02565, 340.38177, 229.32115, 357.53067, 155.34595]
2026-01-22 23:20:28,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [170.0, 159.0, 217.0, 156.0, 188.0, 162.0, 241.0, 140.0, 255.0, 146.0]
2026-01-22 23:20:28,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (280.25) for latency DatasetOffice
2026-01-22 23:20:28,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 26 minutes, 22 seconds)
2026-01-22 23:22:01,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:22:03,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 226.67232 ± 200.010
2026-01-22 23:22:03,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [-21.615925, 45.12259, 257.86826, 428.35245, 308.49023, 346.40686, 254.76915, 617.0597, 9.341587, 20.928286]
2026-01-22 23:22:03,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [192.0, 203.0, 225.0, 266.0, 285.0, 276.0, 262.0, 484.0, 178.0, 104.0]
2026-01-22 23:22:03,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 25 minutes, 48 seconds)
2026-01-22 23:23:35,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:37,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 262.14273 ± 123.172
2026-01-22 23:23:37,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [271.50833, 314.49033, 431.71564, 145.61195, 356.63852, 335.78088, 98.494644, 390.41156, 234.13744, 42.63795]
2026-01-22 23:23:37,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [197.0, 206.0, 367.0, 113.0, 216.0, 249.0, 229.0, 269.0, 140.0, 92.0]
2026-01-22 23:23:37,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 26 minutes, 54 seconds)
2026-01-22 23:25:09,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:13,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 483.21884 ± 239.678
2026-01-22 23:25:13,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [531.4551, 427.37555, 446.45486, 77.12576, 454.0377, 928.5829, 875.62256, 304.32965, 421.02264, 366.18204]
2026-01-22 23:25:13,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [450.0, 338.0, 566.0, 73.0, 316.0, 780.0, 1000.0, 173.0, 312.0, 282.0]
2026-01-22 23:25:13,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (483.22) for latency DatasetOffice
2026-01-22 23:25:13,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 26 minutes, 1 second)
2026-01-22 23:26:44,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:26:45,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 127.53646 ± 150.524
2026-01-22 23:26:45,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [49.27664, 230.73691, 515.02747, 25.91629, 23.983568, 34.133575, 6.0631104, 230.4156, 79.533905, 80.27755]
2026-01-22 23:26:45,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [80.0, 154.0, 502.0, 46.0, 37.0, 42.0, 30.0, 142.0, 93.0, 137.0]
2026-01-22 23:26:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 24 minutes, 14 seconds)
2026-01-22 23:28:17,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:28:18,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 178.33904 ± 107.543
2026-01-22 23:28:18,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [255.99551, 36.53562, 118.32202, 41.75766, 225.73291, 16.321985, 322.8664, 263.72662, 249.19203, 252.93956]
2026-01-22 23:28:18,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [176.0, 56.0, 133.0, 59.0, 116.0, 139.0, 190.0, 141.0, 144.0, 206.0]
2026-01-22 23:28:18,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 22 minutes, 26 seconds)
2026-01-22 23:29:50,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:29:51,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 276.13260 ± 88.360
2026-01-22 23:29:51,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [204.60614, 331.79382, 349.0076, 398.42752, 138.34444, 317.90634, 365.19394, 132.97786, 259.19104, 263.87714]
2026-01-22 23:29:51,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [135.0, 244.0, 187.0, 345.0, 116.0, 233.0, 250.0, 106.0, 171.0, 192.0]
2026-01-22 23:29:51,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 20 minutes, 29 seconds)
2026-01-22 23:31:24,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:25,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 287.01108 ± 66.939
2026-01-22 23:31:25,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [252.8318, 259.42865, 345.39676, 215.63306, 290.5971, 348.60318, 247.75839, 233.75333, 440.47763, 235.63098]
2026-01-22 23:31:25,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [126.0, 162.0, 185.0, 122.0, 165.0, 187.0, 140.0, 142.0, 235.0, 135.0]
2026-01-22 23:31:25,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 18 minutes, 54 seconds)
2026-01-22 23:32:56,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:58,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 303.16080 ± 95.264
2026-01-22 23:32:58,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [218.80957, 181.08621, 286.96463, 346.92212, 293.45392, 230.13963, 263.86984, 493.4008, 265.54065, 451.42047]
2026-01-22 23:32:58,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [132.0, 128.0, 151.0, 194.0, 160.0, 146.0, 178.0, 258.0, 160.0, 267.0]
2026-01-22 23:32:58,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 16 minutes, 24 seconds)
2026-01-22 23:34:30,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:32,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 233.62578 ± 99.751
2026-01-22 23:34:32,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [342.65463, 165.62697, 187.49625, 194.82652, 214.0963, 66.903015, 245.1985, 248.7233, 212.65497, 458.0775]
2026-01-22 23:34:32,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [213.0, 149.0, 138.0, 180.0, 184.0, 111.0, 188.0, 196.0, 152.0, 298.0]
2026-01-22 23:34:32,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 15 minutes, 21 seconds)
2026-01-22 23:36:03,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:04,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 277.95578 ± 84.804
2026-01-22 23:36:04,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [261.6298, 224.05278, 456.21167, 373.16248, 192.26653, 276.53778, 197.39993, 332.4361, 292.29602, 173.5647]
2026-01-22 23:36:04,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [152.0, 144.0, 283.0, 257.0, 119.0, 176.0, 131.0, 202.0, 178.0, 122.0]
2026-01-22 23:36:04,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 13 minutes, 42 seconds)
2026-01-22 23:37:36,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:37:38,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 507.72662 ± 274.918
2026-01-22 23:37:38,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1018.32153, 12.148041, 518.23584, 801.4722, 497.33957, 413.7344, 713.0674, 549.6906, 317.9385, 235.31775]
2026-01-22 23:37:38,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [462.0, 32.0, 242.0, 313.0, 226.0, 191.0, 399.0, 249.0, 160.0, 136.0]
2026-01-22 23:37:38,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (507.73) for latency DatasetOffice
2026-01-22 23:37:38,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 12 minutes, 15 seconds)
2026-01-22 23:39:10,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:39:12,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 305.75427 ± 85.037
2026-01-22 23:39:12,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [453.93613, 195.58376, 342.8789, 272.8774, 337.05014, 394.9499, 242.67726, 185.0333, 379.7201, 252.8357]
2026-01-22 23:39:12,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [272.0, 114.0, 189.0, 135.0, 208.0, 224.0, 130.0, 122.0, 219.0, 138.0]
2026-01-22 23:39:12,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 10 minutes, 39 seconds)
2026-01-22 23:40:43,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:45,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 334.17862 ± 128.705
2026-01-22 23:40:45,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [370.38422, 299.47165, 237.8092, 242.48036, 232.92049, 290.17685, 270.87646, 259.8111, 651.07404, 486.78192]
2026-01-22 23:40:45,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [195.0, 160.0, 135.0, 134.0, 127.0, 147.0, 140.0, 147.0, 269.0, 247.0]
2026-01-22 23:40:45,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2026-01-22 23:42:17,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 371.47675 ± 139.415
2026-01-22 23:42:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [353.7489, 439.05176, 493.42572, 309.2474, 138.03024, 578.815, 199.41078, 495.81488, 236.24654, 470.97635]
2026-01-22 23:42:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [230.0, 233.0, 284.0, 170.0, 140.0, 314.0, 161.0, 249.0, 146.0, 269.0]
2026-01-22 23:42:19,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 7 minutes, 38 seconds)
2026-01-22 23:43:50,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:43:51,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 380.63116 ± 108.194
2026-01-22 23:43:51,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [325.77548, 285.0829, 311.96902, 586.4228, 503.22552, 281.33194, 355.14172, 435.376, 480.4655, 241.5207]
2026-01-22 23:43:51,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [146.0, 139.0, 140.0, 323.0, 188.0, 131.0, 158.0, 210.0, 217.0, 110.0]
2026-01-22 23:43:51,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 6 minutes, 1 second)
2026-01-22 23:45:23,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:45:24,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 304.37466 ± 71.931
2026-01-22 23:45:24,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [261.3513, 329.654, 257.46246, 285.58972, 273.64258, 294.82666, 266.3697, 273.3308, 289.6901, 511.82925]
2026-01-22 23:45:24,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [122.0, 173.0, 141.0, 136.0, 133.0, 142.0, 119.0, 126.0, 164.0, 254.0]
2026-01-22 23:45:24,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 4 minutes, 14 seconds)
2026-01-22 23:46:56,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:57,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 325.08246 ± 20.181
2026-01-22 23:46:57,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [326.95648, 310.55676, 313.0441, 335.12872, 329.08505, 340.57526, 326.6639, 350.25415, 275.95892, 342.60138]
2026-01-22 23:46:57,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [142.0, 156.0, 137.0, 155.0, 153.0, 152.0, 171.0, 170.0, 115.0, 168.0]
2026-01-22 23:46:57,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 2 minutes, 29 seconds)
2026-01-22 23:48:28,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:31,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 737.12781 ± 228.630
2026-01-22 23:48:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1381.3689, 705.6423, 648.1901, 742.5008, 597.6381, 520.7016, 782.9598, 615.6532, 614.75836, 761.8646]
2026-01-22 23:48:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [552.0, 396.0, 269.0, 270.0, 242.0, 280.0, 285.0, 248.0, 242.0, 296.0]
2026-01-22 23:48:31,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (737.13) for latency DatasetOffice
2026-01-22 23:48:31,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 1 minute, 17 seconds)
2026-01-22 23:50:05,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:50:07,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 499.87558 ± 630.029
2026-01-22 23:50:07,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [368.2244, 356.24066, 231.9732, 2330.5295, 372.32208, 353.5174, 317.17328, 32.12571, 42.139576, 594.5099]
2026-01-22 23:50:07,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [167.0, 172.0, 129.0, 1000.0, 178.0, 172.0, 142.0, 95.0, 84.0, 258.0]
2026-01-22 23:50:07,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 9 seconds)
2026-01-22 23:51:40,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:43,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 543.80145 ± 315.370
2026-01-22 23:51:43,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [745.12103, 171.99977, 525.87036, 803.8805, 488.00635, 544.6477, 223.62413, 215.28903, 450.75717, 1268.8186]
2026-01-22 23:51:43,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [279.0, 119.0, 227.0, 368.0, 224.0, 246.0, 128.0, 132.0, 201.0, 677.0]
2026-01-22 23:51:43,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 25/100 (estimated time remaining: 1 hour, 59 minutes, 24 seconds)
2026-01-22 23:53:12,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:14,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 397.99377 ± 191.936
2026-01-22 23:53:14,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [864.05316, 192.75092, 271.27533, 621.1147, 384.96622, 449.58575, 277.66797, 298.77527, 293.54886, 326.19952]
2026-01-22 23:53:14,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [339.0, 93.0, 123.0, 210.0, 152.0, 169.0, 121.0, 134.0, 130.0, 147.0]
2026-01-22 23:53:14,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 26/100 (estimated time remaining: 1 hour, 57 minutes, 27 seconds)
2026-01-22 23:54:47,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:54:48,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 317.91602 ± 88.014
2026-01-22 23:54:48,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [287.34735, 531.8927, 234.87628, 404.051, 305.24106, 366.6698, 274.58008, 282.90964, 231.90079, 259.6913]
2026-01-22 23:54:48,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [124.0, 202.0, 102.0, 209.0, 127.0, 150.0, 153.0, 129.0, 100.0, 111.0]
2026-01-22 23:54:48,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 27/100 (estimated time remaining: 1 hour, 56 minutes, 8 seconds)
2026-01-22 23:56:19,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:56:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 442.96118 ± 174.504
2026-01-22 23:56:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [381.91788, 243.47046, 744.80096, 471.54337, 403.6009, 272.15573, 790.57245, 377.67252, 332.18433, 411.69315]
2026-01-22 23:56:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [172.0, 103.0, 277.0, 200.0, 177.0, 112.0, 266.0, 170.0, 158.0, 181.0]
2026-01-22 23:56:21,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 28/100 (estimated time remaining: 1 hour, 54 minutes, 18 seconds)
2026-01-22 23:57:53,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:55,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 577.27277 ± 267.766
2026-01-22 23:57:55,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [585.2908, 1036.4203, 284.44427, 1002.03894, 720.5423, 389.46286, 712.8781, 410.15775, 293.12027, 338.37222]
2026-01-22 23:57:55,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [326.0, 322.0, 130.0, 387.0, 235.0, 142.0, 259.0, 147.0, 112.0, 129.0]
2026-01-22 23:57:55,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 29/100 (estimated time remaining: 1 hour, 52 minutes, 21 seconds)
2026-01-22 23:59:26,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:28,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 659.82550 ± 224.468
2026-01-22 23:59:28,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [423.67398, 511.3096, 945.84784, 311.87613, 748.3968, 596.7996, 426.27704, 909.7832, 941.03784, 783.2529]
2026-01-22 23:59:28,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [144.0, 291.0, 311.0, 122.0, 290.0, 205.0, 151.0, 287.0, 304.0, 257.0]
2026-01-22 23:59:28,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 30/100 (estimated time remaining: 1 hour, 50 minutes, 15 seconds)
2026-01-23 00:00:59,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:01,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 477.38809 ± 131.489
2026-01-23 00:01:01,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [643.8465, 526.45215, 466.78897, 562.7456, 299.52008, 496.70056, 215.04793, 409.91006, 504.41226, 648.45703]
2026-01-23 00:01:01,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [249.0, 207.0, 166.0, 228.0, 123.0, 186.0, 99.0, 160.0, 190.0, 254.0]
2026-01-23 00:01:01,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 31/100 (estimated time remaining: 1 hour, 49 minutes)
2026-01-23 00:02:32,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:34,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 604.75677 ± 102.977
2026-01-23 00:02:34,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [614.8421, 553.52435, 710.0798, 468.07492, 513.969, 712.8277, 512.8443, 513.67694, 789.0685, 658.6606]
2026-01-23 00:02:34,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [229.0, 196.0, 251.0, 168.0, 179.0, 252.0, 177.0, 174.0, 282.0, 243.0]
2026-01-23 00:02:34,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 32/100 (estimated time remaining: 1 hour, 47 minutes, 6 seconds)
2026-01-23 00:04:04,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:09,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1800.64124 ± 1244.779
2026-01-23 00:04:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3026.128, 555.0601, 492.39682, 465.39944, 3038.5576, 3108.5315, 3051.125, 675.0947, 598.4161, 2995.7026]
2026-01-23 00:04:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 195.0, 206.0, 173.0, 1000.0, 1000.0, 1000.0, 261.0, 246.0, 1000.0]
2026-01-23 00:04:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (1800.64) for latency DatasetOffice
2026-01-23 00:04:09,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 33/100 (estimated time remaining: 1 hour, 46 minutes, 10 seconds)
2026-01-23 00:05:40,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:41,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 260.06854 ± 277.501
2026-01-23 00:05:41,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [327.61856, -0.24785762, 134.333, 6.3997154, 112.141205, 24.044146, 19.305609, 712.85706, 641.32495, 622.9093]
2026-01-23 00:05:41,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [138.0, 26.0, 89.0, 41.0, 88.0, 37.0, 38.0, 243.0, 255.0, 234.0]
2026-01-23 00:05:41,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 34/100 (estimated time remaining: 1 hour, 44 minutes, 3 seconds)
2026-01-23 00:07:13,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:19,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1895.60742 ± 1027.468
2026-01-23 00:07:19,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2963.7366, 1824.9191, 565.56604, 831.2657, 2868.133, 2768.5266, 2785.4797, 614.5868, 779.07935, 2954.7812]
2026-01-23 00:07:19,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 565.0, 226.0, 290.0, 1000.0, 1000.0, 1000.0, 220.0, 290.0, 1000.0]
2026-01-23 00:07:19,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (1895.61) for latency DatasetOffice
2026-01-23 00:07:19,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 35/100 (estimated time remaining: 1 hour, 43 minutes, 28 seconds)
2026-01-23 00:08:50,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:54,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1149.09204 ± 757.616
2026-01-23 00:08:54,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1268.5264, 731.75885, 839.36456, 967.75305, 1184.0402, 737.991, 485.30383, 1189.8074, 776.0511, 3310.3232]
2026-01-23 00:08:54,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [410.0, 278.0, 274.0, 332.0, 345.0, 276.0, 174.0, 374.0, 285.0, 1000.0]
2026-01-23 00:08:54,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 42 minutes, 24 seconds)
2026-01-23 00:10:31,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:36,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1603.59570 ± 845.707
2026-01-23 00:10:36,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1996.6929, 1814.9951, 704.96924, 2480.23, 2757.354, 413.9666, 1033.5685, 653.5391, 2771.9548, 1408.6869]
2026-01-23 00:10:36,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [655.0, 641.0, 247.0, 923.0, 1000.0, 182.0, 331.0, 239.0, 1000.0, 486.0]
2026-01-23 00:10:36,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 42 minutes, 59 seconds)
2026-01-23 00:12:02,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:09,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2039.78833 ± 1043.582
2026-01-23 00:12:09,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2774.6956, 2846.9863, 468.35388, 750.2498, 2975.793, 800.7693, 2890.4778, 2839.176, 1079.0371, 2972.3435]
2026-01-23 00:12:09,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 182.0, 257.0, 1000.0, 265.0, 1000.0, 1000.0, 354.0, 1000.0]
2026-01-23 00:12:09,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2039.79) for latency DatasetOffice
2026-01-23 00:12:09,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 40 minutes, 45 seconds)
2026-01-23 00:13:39,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:44,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1790.86755 ± 913.027
2026-01-23 00:13:44,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3190.8162, 2810.199, 874.52356, 1201.109, 2246.1333, 1511.3962, 1030.9713, 3148.484, 1050.9315, 844.1139]
2026-01-23 00:13:44,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 272.0, 368.0, 621.0, 431.0, 335.0, 1000.0, 324.0, 290.0]
2026-01-23 00:13:44,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 39 minutes, 52 seconds)
2026-01-23 00:15:16,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:23,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2490.34814 ± 980.072
2026-01-23 00:15:23,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [990.04865, 3238.7708, 3253.9182, 1866.6699, 3268.9775, 3254.28, 3217.4949, 3287.8267, 1739.078, 786.417]
2026-01-23 00:15:23,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [301.0, 1000.0, 1000.0, 584.0, 1000.0, 1000.0, 1000.0, 1000.0, 544.0, 267.0]
2026-01-23 00:15:23,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2490.35) for latency DatasetOffice
2026-01-23 00:15:23,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 38 minutes, 34 seconds)
2026-01-23 00:16:54,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:58,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1372.90747 ± 1116.809
2026-01-23 00:16:58,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [935.3232, 3495.804, 1375.7006, 675.9136, 1006.52563, 3514.1487, 5.5622764, 799.04553, 837.1596, 1083.892]
2026-01-23 00:16:58,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [290.0, 1000.0, 390.0, 227.0, 312.0, 957.0, 26.0, 263.0, 274.0, 334.0]
2026-01-23 00:16:58,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 36 minutes, 50 seconds)
2026-01-23 00:18:29,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:37,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2828.34692 ± 757.658
2026-01-23 00:18:37,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3141.4385, 3017.2322, 3093.4795, 3033.43, 3142.7964, 559.4589, 3129.9375, 3085.7039, 3019.02, 3060.973]
2026-01-23 00:18:37,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 212.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:18:37,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2828.35) for latency DatasetOffice
2026-01-23 00:18:37,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 34 minutes, 30 seconds)
2026-01-23 00:20:05,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:09,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1278.57837 ± 641.334
2026-01-23 00:20:09,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1090.2181, 1036.1992, 1838.9879, 2229.8916, 166.01031, 1940.078, 563.8075, 1944.6561, 1009.8138, 966.1213]
2026-01-23 00:20:09,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [329.0, 309.0, 521.0, 652.0, 103.0, 552.0, 205.0, 620.0, 328.0, 301.0]
2026-01-23 00:20:09,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 32 minutes, 46 seconds)
2026-01-23 00:21:40,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:47,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2461.23145 ± 828.986
2026-01-23 00:21:47,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [970.844, 3080.6362, 1779.4395, 2953.8677, 2979.2302, 2990.3425, 2979.3997, 2963.6577, 2953.167, 961.7282]
2026-01-23 00:21:47,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [289.0, 1000.0, 570.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 296.0]
2026-01-23 00:21:47,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 31 minutes, 47 seconds)
2026-01-23 00:23:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:26,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1994.37329 ± 932.358
2026-01-23 00:23:26,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1501.9102, 1126.4441, 1227.5862, 3323.491, 1462.2821, 3092.3538, 900.4744, 1382.6829, 3413.6194, 2512.8884]
2026-01-23 00:23:26,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [447.0, 361.0, 372.0, 1000.0, 398.0, 886.0, 280.0, 402.0, 1000.0, 713.0]
2026-01-23 00:23:26,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 30 minutes, 3 seconds)
2026-01-23 00:24:58,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:07,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2956.44775 ± 667.216
2026-01-23 00:25:07,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1056.1587, 3213.8208, 3476.5295, 2599.621, 3158.1345, 3201.449, 3155.279, 3272.8098, 3248.877, 3181.7979]
2026-01-23 00:25:07,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [312.0, 1000.0, 1000.0, 823.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:25:07,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (2956.45) for latency DatasetOffice
2026-01-23 00:25:07,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 29 minutes, 38 seconds)
2026-01-23 00:26:39,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:42,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1029.61169 ± 725.121
2026-01-23 00:26:42,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [712.93176, 498.50375, 656.71875, 806.4966, 3120.4917, 751.84296, 769.32367, 1322.4863, 786.28046, 871.0421]
2026-01-23 00:26:42,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [240.0, 188.0, 212.0, 263.0, 865.0, 249.0, 252.0, 379.0, 254.0, 274.0]
2026-01-23 00:26:42,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 27 minutes, 15 seconds)
2026-01-23 00:28:06,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:11,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2106.78857 ± 1225.397
2026-01-23 00:28:11,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [795.4344, 3471.0176, 2927.3423, 3440.661, 1326.426, 3473.4568, 818.8555, 523.78253, 1064.1926, 3226.7173]
2026-01-23 00:28:11,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [261.0, 1000.0, 844.0, 1000.0, 377.0, 1000.0, 269.0, 173.0, 329.0, 918.0]
2026-01-23 00:28:12,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 25 minutes, 16 seconds)
2026-01-23 00:29:48,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:53,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1723.87036 ± 1111.837
2026-01-23 00:29:53,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [605.88983, 621.4552, 3318.3645, 843.98346, 756.7471, 3385.728, 863.35516, 1813.3988, 1850.4429, 3179.3381]
2026-01-23 00:29:53,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [212.0, 238.0, 1000.0, 272.0, 217.0, 1000.0, 273.0, 520.0, 536.0, 1000.0]
2026-01-23 00:29:53,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 24 minutes, 9 seconds)
2026-01-23 00:31:18,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:25,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2328.05933 ± 1310.479
2026-01-23 00:31:25,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3271.3882, 879.894, 176.99202, 3.569467, 3030.4883, 3195.5005, 3213.5166, 3155.4785, 3194.7107, 3159.0542]
2026-01-23 00:31:25,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 305.0, 109.0, 29.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:31:25,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 21 minutes, 23 seconds)
2026-01-23 00:33:01,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:08,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2530.92212 ± 910.941
2026-01-23 00:33:08,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3270.3933, 1003.1045, 2059.678, 1630.6356, 3250.6824, 3248.659, 3231.3484, 1157.3063, 3222.1165, 3235.2944]
2026-01-23 00:33:08,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 304.0, 606.0, 479.0, 1000.0, 1000.0, 1000.0, 358.0, 1000.0, 1000.0]
2026-01-23 00:33:08,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 20 minutes, 15 seconds)
2026-01-23 00:34:38,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:45,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2812.84375 ± 972.159
2026-01-23 00:34:45,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3314.4592, 3218.8574, 771.06244, 3318.8748, 3362.0547, 978.1182, 3163.252, 3342.934, 3283.543, 3375.282]
2026-01-23 00:34:45,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 252.0, 1000.0, 1000.0, 348.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:34:45,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 18 minutes, 59 seconds)
2026-01-23 00:36:16,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:21,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1902.57874 ± 1462.717
2026-01-23 00:36:21,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [956.18475, 1010.3348, 198.81383, 205.54404, -13.068676, 3287.355, 3412.0027, 3370.6536, 3306.206, 3291.761]
2026-01-23 00:36:21,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [345.0, 461.0, 118.0, 101.0, 15.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:36:21,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 18 minutes, 22 seconds)
2026-01-23 00:37:51,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:00,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3306.92822 ± 694.157
2026-01-23 00:38:00,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3514.8435, 3508.9343, 3535.378, 3614.8423, 3325.9001, 1239.1163, 3571.5574, 3642.5923, 3522.9263, 3593.1895]
2026-01-23 00:38:00,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 921.0, 381.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:38:00,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3306.93) for latency DatasetOffice
2026-01-23 00:38:00,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 16 minutes, 19 seconds)
2026-01-23 00:39:34,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3129.32227 ± 911.933
2026-01-23 00:39:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3422.865, 3572.69, 3553.098, 3545.1746, 3596.7495, 627.6712, 2316.2185, 3500.011, 3596.2395, 3562.5063]
2026-01-23 00:39:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [967.0, 1000.0, 1000.0, 1000.0, 1000.0, 214.0, 644.0, 915.0, 1000.0, 1000.0]
2026-01-23 00:39:42,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 16 minutes, 17 seconds)
2026-01-23 00:41:16,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:24,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2896.35059 ± 794.519
2026-01-23 00:41:24,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3305.2588, 3330.031, 3361.4873, 3386.775, 1048.4031, 2855.854, 3328.4583, 3344.4014, 1669.4061, 3333.4321]
2026-01-23 00:41:24,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 336.0, 838.0, 1000.0, 1000.0, 507.0, 1000.0]
2026-01-23 00:41:24,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 14 minutes, 23 seconds)
2026-01-23 00:42:50,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:57,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2523.20337 ± 1207.232
2026-01-23 00:42:57,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3354.354, 3311.3374, 594.7319, 3288.5496, 3314.2476, 3296.4734, 747.06647, 3314.188, 3312.7185, 698.367]
2026-01-23 00:42:57,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 190.0, 1000.0, 1000.0, 1000.0, 255.0, 1000.0, 1000.0, 246.0]
2026-01-23 00:42:57,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 12 minutes, 9 seconds)
2026-01-23 00:44:32,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:38,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2099.87231 ± 1317.053
2026-01-23 00:44:38,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [977.34235, 694.6642, 3263.9358, 3430.9324, 2127.974, 404.07935, 171.96574, 3262.0396, 3342.4192, 3323.3691]
2026-01-23 00:44:38,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [297.0, 242.0, 1000.0, 1000.0, 651.0, 169.0, 107.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:44:38,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 11 minutes, 11 seconds)
2026-01-23 00:46:08,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:17,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3052.92993 ± 923.435
2026-01-23 00:46:17,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3468.9983, 3513.786, 751.242, 1779.9099, 3466.292, 3492.9805, 3532.1272, 3564.054, 3438.8743, 3521.0352]
2026-01-23 00:46:17,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 236.0, 518.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:46:17,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 9 minutes, 31 seconds)
2026-01-23 00:47:47,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:53,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2183.58350 ± 1082.073
2026-01-23 00:47:53,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1389.0254, 3472.9937, 563.5618, 1190.646, 3437.3328, 3440.871, 1651.1522, 3483.688, 1555.2644, 1651.3003]
2026-01-23 00:47:53,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [433.0, 1000.0, 195.0, 347.0, 1000.0, 1000.0, 496.0, 1000.0, 440.0, 489.0]
2026-01-23 00:47:53,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 7 minutes, 5 seconds)
2026-01-23 00:49:27,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:32,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1781.63501 ± 1726.700
2026-01-23 00:49:32,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3669.6636, 3665.1511, 3617.1616, 3588.7, 2932.9487, 6.6183825, 21.385918, 303.23904, 11.199954, 0.2838999]
2026-01-23 00:49:32,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 821.0, 31.0, 32.0, 114.0, 30.0, 27.0]
2026-01-23 00:49:32,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 5 minutes)
2026-01-23 00:51:03,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:12,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3338.67920 ± 280.779
2026-01-23 00:51:12,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3426.7903, 3417.786, 3465.9233, 3476.6833, 2500.5574, 3402.9426, 3390.2712, 3463.8647, 3438.9866, 3402.9885]
2026-01-23 00:51:12,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 736.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:51:12,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3338.68) for latency DatasetOffice
2026-01-23 00:51:12,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 4 minutes, 22 seconds)
2026-01-23 00:52:39,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3347.81689 ± 451.774
2026-01-23 00:52:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3591.9797, 3541.154, 3606.931, 2385.853, 3518.754, 3554.1536, 3570.1162, 3607.1848, 3592.6987, 2509.3435]
2026-01-23 00:52:47,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 674.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 713.0]
2026-01-23 00:52:47,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3347.82) for latency DatasetOffice
2026-01-23 00:52:47,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 1 minute, 58 seconds)
2026-01-23 00:54:19,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:25,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2016.74023 ± 1400.118
2026-01-23 00:54:25,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3622.3943, 3140.2812, 3551.4177, 2514.1096, 2146.278, 1153.7183, 492.73544, 36.73385, 20.154411, 3489.5796]
2026-01-23 00:54:25,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 889.0, 1000.0, 720.0, 626.0, 379.0, 180.0, 50.0, 46.0, 1000.0]
2026-01-23 00:54:25,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 seconds)
2026-01-23 00:56:01,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:09,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2936.06372 ± 1302.306
2026-01-23 00:56:09,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [531.3582, 3547.306, 3567.4873, 3564.7744, 3604.9846, 3637.836, 3601.0632, 3581.281, 3580.5974, 143.9487]
2026-01-23 00:56:09,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [179.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 137.0]
2026-01-23 00:56:09,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 65/100 (estimated time remaining: 59 minutes, 26 seconds)
2026-01-23 00:57:34,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:42,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2916.26392 ± 983.779
2026-01-23 00:57:42,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3559.3245, 3610.5867, 1487.0422, 3576.768, 3519.3289, 3550.712, 1263.9486, 1499.0194, 3529.946, 3565.9636]
2026-01-23 00:57:42,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 449.0, 1000.0, 1000.0, 1000.0, 372.0, 447.0, 1000.0, 1000.0]
2026-01-23 00:57:42,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 66/100 (estimated time remaining: 57 minutes, 8 seconds)
2026-01-23 00:59:14,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:59:23,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3357.66992 ± 856.337
2026-01-23 00:59:23,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3655.3582, 3636.6235, 3724.5364, 3583.2166, 3597.419, 3605.3062, 791.5843, 3689.5566, 3629.3372, 3663.7583]
2026-01-23 00:59:23,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 236.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:59:23,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3357.67) for latency DatasetOffice
2026-01-23 00:59:23,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 67/100 (estimated time remaining: 55 minutes, 36 seconds)
2026-01-23 01:00:57,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:01:05,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3372.10156 ± 779.738
2026-01-23 01:01:05,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3652.5945, 3589.0686, 3656.8792, 3700.58, 3589.416, 3674.337, 3612.2273, 1035.6804, 3632.2944, 3577.9404]
2026-01-23 01:01:05,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 300.0, 1000.0, 1000.0]
2026-01-23 01:01:05,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3372.10) for latency DatasetOffice
2026-01-23 01:01:05,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 68/100 (estimated time remaining: 54 minutes, 47 seconds)
2026-01-23 01:02:33,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:40,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2401.79272 ± 1454.394
2026-01-23 01:02:40,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3582.7136, 3544.8052, 3578.5476, 3593.1516, 578.0135, 454.63852, 3360.8137, 3593.1118, 1724.5583, 7.5746436]
2026-01-23 01:02:40,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 196.0, 160.0, 921.0, 1000.0, 506.0, 32.0]
2026-01-23 01:02:40,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 69/100 (estimated time remaining: 52 minutes, 47 seconds)
2026-01-23 01:04:09,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:18,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3183.63281 ± 967.759
2026-01-23 01:04:18,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3645.3481, 3635.5667, 3583.7773, 3594.8079, 3606.0625, 3583.413, 462.5554, 2481.7327, 3632.773, 3610.2915]
2026-01-23 01:04:18,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 156.0, 691.0, 1000.0, 1000.0]
2026-01-23 01:04:18,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 70/100 (estimated time remaining: 50 minutes, 31 seconds)
2026-01-23 01:05:51,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:05:55,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1360.04077 ± 933.760
2026-01-23 01:05:55,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [844.36005, 1870.803, 629.0083, 1725.3959, 772.7205, 678.72174, 3888.5273, 902.40186, 1045.7036, 1242.7653]
2026-01-23 01:05:55,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [268.0, 503.0, 200.0, 465.0, 237.0, 218.0, 1000.0, 269.0, 311.0, 362.0]
2026-01-23 01:05:55,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 71/100 (estimated time remaining: 49 minutes, 18 seconds)
2026-01-23 01:07:24,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:07:32,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3200.10864 ± 1097.769
2026-01-23 01:07:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [854.7803, 3781.098, 3777.2773, 3581.681, 3753.3186, 3717.4004, 3775.2166, 1170.2183, 3821.4617, 3768.6377]
2026-01-23 01:07:32,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [262.0, 1000.0, 1000.0, 947.0, 1000.0, 1000.0, 1000.0, 329.0, 1000.0, 1000.0]
2026-01-23 01:07:32,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 72/100 (estimated time remaining: 47 minutes, 15 seconds)
2026-01-23 01:09:08,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:09:15,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2837.47803 ± 1284.513
2026-01-23 01:09:15,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3702.585, 3618.352, 3668.0718, 3747.4956, 742.4358, 3706.3372, 1217.6913, 3633.5422, 697.97235, 3640.2966]
2026-01-23 01:09:15,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 237.0, 1000.0, 339.0, 1000.0, 223.0, 1000.0]
2026-01-23 01:09:15,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 73/100 (estimated time remaining: 45 minutes, 43 seconds)
2026-01-23 01:10:47,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:56,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3539.43213 ± 38.652
2026-01-23 01:10:56,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3504.414, 3523.1655, 3584.529, 3536.7112, 3517.7422, 3590.0352, 3576.4585, 3478.512, 3500.834, 3581.917]
2026-01-23 01:10:56,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:10:56,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3539.43) for latency DatasetOffice
2026-01-23 01:10:56,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 74/100 (estimated time remaining: 44 minutes, 39 seconds)
2026-01-23 01:12:21,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:26,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2206.51978 ± 1533.828
2026-01-23 01:12:26,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3557.5542, 953.44366, 184.08595, 32.55467, 298.42328, 3548.5833, 3582.1257, 3554.6084, 2803.0588, 3550.7583]
2026-01-23 01:12:26,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 316.0, 104.0, 81.0, 150.0, 1000.0, 1000.0, 1000.0, 789.0, 1000.0]
2026-01-23 01:12:26,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 75/100 (estimated time remaining: 42 minutes, 22 seconds)
2026-01-23 01:13:58,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:07,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3461.72021 ± 537.304
2026-01-23 01:14:07,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3664.8926, 1850.3365, 3646.483, 3636.3318, 3665.0254, 3633.061, 3631.3186, 3638.063, 3620.2004, 3631.4915]
2026-01-23 01:14:07,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 521.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:14:07,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 76/100 (estimated time remaining: 41 minutes, 1 second)
2026-01-23 01:15:40,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:48,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3179.15161 ± 1030.862
2026-01-23 01:15:48,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3704.1938, 1406.2257, 3648.4478, 3726.4033, 3666.1216, 3718.893, 3706.8853, 3702.0242, 859.03516, 3653.2844]
2026-01-23 01:15:48,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 425.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 276.0, 1000.0]
2026-01-23 01:15:48,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 77/100 (estimated time remaining: 39 minutes, 42 seconds)
2026-01-23 01:17:18,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:23,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1849.01599 ± 1795.335
2026-01-23 01:17:23,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [30.219479, -2.926156, 169.65193, 107.93113, -31.587452, 3644.3013, 3682.033, 3632.2803, 3682.137, 3576.1206]
2026-01-23 01:17:23,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [38.0, 27.0, 94.0, 73.0, 80.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:17:23,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 78/100 (estimated time remaining: 37 minutes, 22 seconds)
2026-01-23 01:19:00,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:10,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3645.16333 ± 20.927
2026-01-23 01:19:10,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3646.7322, 3685.5947, 3635.2163, 3629.0823, 3660.8147, 3610.1099, 3621.9211, 3646.5244, 3652.8809, 3662.7568]
2026-01-23 01:19:10,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:10,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3645.16) for latency DatasetOffice
2026-01-23 01:19:10,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 79/100 (estimated time remaining: 36 minutes, 11 seconds)
2026-01-23 01:20:34,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:42,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3171.31641 ± 1020.440
2026-01-23 01:20:42,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3652.226, 3640.6575, 3655.4646, 1915.8262, 3597.9407, 3650.8154, 3692.9636, 3682.9607, 3687.1606, 537.14905]
2026-01-23 01:20:42,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 565.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 195.0]
2026-01-23 01:20:42,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 80/100 (estimated time remaining: 34 minutes, 39 seconds)
2026-01-23 01:22:19,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:28,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3682.16162 ± 64.611
2026-01-23 01:22:28,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3665.841, 3728.1653, 3704.531, 3732.8447, 3725.4443, 3688.743, 3683.5688, 3497.9985, 3695.7024, 3698.7756]
2026-01-23 01:22:28,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 930.0, 1000.0, 1000.0]
2026-01-23 01:22:28,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3682.16) for latency DatasetOffice
2026-01-23 01:22:28,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 81/100 (estimated time remaining: 33 minutes, 25 seconds)
2026-01-23 01:24:00,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:08,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3411.60278 ± 702.662
2026-01-23 01:24:08,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3675.5767, 3629.283, 3629.813, 3648.926, 3668.7063, 3639.9956, 3621.5269, 3648.5913, 1304.1693, 3649.441]
2026-01-23 01:24:08,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 398.0, 1000.0]
2026-01-23 01:24:08,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 82/100 (estimated time remaining: 31 minutes, 38 seconds)
2026-01-23 01:25:35,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:44,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1151.96765 ± 652.112
2026-01-23 01:25:44,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [1010.56836, 1023.0934, 1012.663, 1006.8668, 1014.32355, 1002.7109, 405.82864, 3032.0571, 1002.6015, 1008.9637]
2026-01-23 01:25:44,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 535.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:25:44,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 83/100 (estimated time remaining: 30 minutes, 5 seconds)
2026-01-23 01:27:13,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:14,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: -5.77648 ± 4.414
2026-01-23 01:27:14,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [-8.046714, -2.5381994, -5.6626263, -4.3319793, -5.73561, -2.8152936, -1.6919184, -17.941021, -4.0257773, -4.975696]
2026-01-23 01:27:14,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [139.0, 121.0, 127.0, 135.0, 122.0, 119.0, 122.0, 115.0, 140.0, 124.0]
2026-01-23 01:27:14,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 84/100 (estimated time remaining: 27 minutes, 25 seconds)
2026-01-23 01:28:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:28:44,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: -22.58173 ± 1.923
2026-01-23 01:28:44,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [-25.006336, -21.626898, -20.029099, -21.51745, -25.190058, -24.617865, -20.004774, -24.205729, -22.554438, -21.06462]
2026-01-23 01:28:44,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [84.0, 82.0, 75.0, 82.0, 86.0, 85.0, 82.0, 88.0, 81.0, 84.0]
2026-01-23 01:28:44,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 85/100 (estimated time remaining: 25 minutes, 43 seconds)
2026-01-23 01:30:19,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:30:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 887.27374 ± 266.264
2026-01-23 01:30:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [978.8808, 953.675, 984.4613, 954.7457, 960.28284, 1000.9171, 987.85284, 999.76416, 90.05731, 962.10034]
2026-01-23 01:30:27,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 213.0, 1000.0]
2026-01-23 01:30:27,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 86/100 (estimated time remaining: 23 minutes, 56 seconds)
2026-01-23 01:31:53,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:32:01,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3437.29614 ± 618.097
2026-01-23 01:32:01,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3670.9314, 3641.9482, 3667.5962, 3654.2986, 3672.1523, 3642.4165, 3638.3467, 3567.281, 3632.99, 1585.0015]
2026-01-23 01:32:01,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 463.0]
2026-01-23 01:32:01,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 87/100 (estimated time remaining: 22 minutes, 5 seconds)
2026-01-23 01:33:33,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:42,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3348.60815 ± 862.671
2026-01-23 01:33:42,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [763.4962, 3676.8071, 3699.938, 3568.4353, 3583.6763, 3670.5398, 3620.6116, 3665.231, 3599.4414, 3637.9067]
2026-01-23 01:33:42,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [260.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:33:42,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 88/100 (estimated time remaining: 20 minutes, 40 seconds)
2026-01-23 01:35:20,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:26,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2492.33398 ± 1686.679
2026-01-23 01:35:26,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3793.4956, 3808.3933, 3785.9004, 3727.305, 3771.9873, 3825.1865, 2141.039, 21.373886, 14.43852, 34.22039]
2026-01-23 01:35:26,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 607.0, 29.0, 38.0, 63.0]
2026-01-23 01:35:26,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 89/100 (estimated time remaining: 19 minutes, 42 seconds)
2026-01-23 01:36:57,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:06,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3667.29346 ± 22.096
2026-01-23 01:37:06,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3686.1711, 3699.934, 3657.9802, 3686.8677, 3626.5918, 3633.3994, 3667.6677, 3663.897, 3670.3, 3680.1233]
2026-01-23 01:37:06,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:37:06,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 90/100 (estimated time remaining: 18 minutes, 24 seconds)
2026-01-23 01:38:38,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:38:47,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3575.70898 ± 469.378
2026-01-23 01:38:47,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3798.4814, 3782.7146, 3822.336, 3772.168, 3786.373, 3819.377, 3842.926, 3855.491, 2666.0964, 2611.1243]
2026-01-23 01:38:47,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 733.0, 709.0]
2026-01-23 01:38:47,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 91/100 (estimated time remaining: 16 minutes, 38 seconds)
2026-01-23 01:40:12,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:40:17,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1750.07007 ± 1519.891
2026-01-23 01:40:17,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3763.8755, 3643.2354, 3705.826, 2594.275, 2079.7017, 25.316397, 1172.3401, 233.03969, 264.50424, 18.587357]
2026-01-23 01:40:17,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 958.0, 1000.0, 700.0, 584.0, 41.0, 367.0, 100.0, 109.0, 29.0]
2026-01-23 01:40:17,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 92/100 (estimated time remaining: 14 minutes, 51 seconds)
2026-01-23 01:41:53,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:42:02,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3491.54492 ± 827.807
2026-01-23 01:42:02,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3806.7598, 3841.0967, 3808.3496, 3766.7122, 3802.5732, 3848.8647, 3833.4854, 3317.8513, 3839.9536, 1049.8002]
2026-01-23 01:42:02,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 859.0, 1000.0, 337.0]
2026-01-23 01:42:02,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 93/100 (estimated time remaining: 13 minutes, 20 seconds)
2026-01-23 01:43:28,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:37,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3538.93115 ± 887.285
2026-01-23 01:43:37,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3786.0117, 3845.6882, 3853.9739, 3790.8225, 3834.0288, 3824.9868, 878.31226, 3870.849, 3868.145, 3836.4941]
2026-01-23 01:43:37,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 275.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:43:37,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 94/100 (estimated time remaining: 11 minutes, 26 seconds)
2026-01-23 01:45:09,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:45:18,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3805.10547 ± 44.026
2026-01-23 01:45:18,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3812.5793, 3854.1003, 3745.4268, 3795.4092, 3840.9387, 3853.9558, 3840.4836, 3774.4717, 3814.1892, 3719.4995]
2026-01-23 01:45:18,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:45:18,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3805.11) for latency DatasetOffice
2026-01-23 01:45:18,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 95/100 (estimated time remaining: 9 minutes, 50 seconds)
2026-01-23 01:46:52,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:47:01,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3809.63037 ± 98.779
2026-01-23 01:47:01,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3834.2314, 3827.6182, 3834.8618, 3861.031, 3839.4092, 3857.796, 3825.0933, 3852.7188, 3848.1577, 3515.387]
2026-01-23 01:47:01,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 909.0]
2026-01-23 01:47:01,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1274 [INFO]: New best (3809.63) for latency DatasetOffice
2026-01-23 01:47:01,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 96/100 (estimated time remaining: 8 minutes, 14 seconds)
2026-01-23 01:48:33,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:37,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 1528.14795 ± 1656.645
2026-01-23 01:48:37,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3868.791, 1608.2526, 3836.7214, -2.8929074, 3862.6182, 1884.9768, -4.1115627, 4.04552, 222.81003, 0.26958406]
2026-01-23 01:48:37,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 447.0, 1000.0, 59.0, 1000.0, 519.0, 19.0, 29.0, 105.0, 36.0]
2026-01-23 01:48:37,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 97/100 (estimated time remaining: 6 minutes, 40 seconds)
2026-01-23 01:50:08,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:17,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3724.80933 ± 293.194
2026-01-23 01:50:17,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [2848.4944, 3809.9836, 3772.687, 3841.1777, 3822.1177, 3817.1838, 3792.2595, 3862.5945, 3849.9182, 3831.6777]
2026-01-23 01:50:17,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [749.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:50:17,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 98/100 (estimated time remaining: 4 minutes, 57 seconds)
2026-01-23 01:51:46,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:51:56,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3715.81567 ± 303.719
2026-01-23 01:51:56,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3835.9292, 3842.7722, 3830.07, 3855.4866, 3788.0637, 3815.7402, 3699.6907, 3843.8542, 3832.841, 2813.7114]
2026-01-23 01:51:56,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 748.0]
2026-01-23 01:51:56,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 19 seconds)
2026-01-23 01:53:30,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:53:37,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 2718.41431 ± 1564.275
2026-01-23 01:53:37,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3926.7903, 3899.7292, 3932.215, 2482.3696, 7.7915645, 480.13873, 3807.0647, 3935.021, 807.274, 3905.7488]
2026-01-23 01:53:37,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 656.0, 89.0, 200.0, 1000.0, 1000.0, 247.0, 1000.0]
2026-01-23 01:53:37,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1247 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 39 seconds)
2026-01-23 01:55:08,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:55:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1269 [DEBUG]: Total Reward: 3551.30713 ± 681.572
2026-01-23 01:55:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1270 [DEBUG]: All rewards: [3754.6558, 1506.8999, 3782.0793, 3798.4285, 3769.0002, 3791.3125, 3784.0247, 3777.9004, 3767.3643, 3781.4067]
2026-01-23 01:55:17,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1271 [DEBUG]: All trajectory lengths: [1000.0, 424.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:55:17,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-walker2d):1299 [DEBUG]: Training session finished
