2026-01-22 23:14:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mem1  
2026-01-22 23:14:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mem1  
2026-01-22 23:14:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x1492a9328dd0>}
2026-01-22 23:14:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:11,949 baseline-bpql-noisy-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 1 != 32
2026-01-22 23:14:11,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-22 23:14:11,956 baseline-bpql-noisy-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=393, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-22 23:14:11,956 baseline-bpql-noisy-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:13,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:13,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:56,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:56,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 251.15005 ± 5.731
2026-01-22 23:15:56,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [248.30525, 249.58084, 264.14023, 255.19269, 245.47238, 254.88547, 249.82108, 254.12286, 245.17322, 244.80663]
2026-01-22 23:15:56,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [46.0, 46.0, 49.0, 47.0, 45.0, 47.0, 46.0, 47.0, 45.0, 45.0]
2026-01-22 23:15:56,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (251.15) for latency DatasetOffice
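Each `Total Reward: X ± Y` line appears to summarize the `All rewards` list logged right after it. The check below uses only Python's standard `statistics` module on the iteration 1 values. It suggests that the ± term is the population standard deviation (divide by N, matching NumPy's default `ddof=0`) and not the sample standard deviation. This is inferred from the logged numbers only, not from the `training_loop` source.

```python
import statistics

# Per-episode returns copied verbatim from the iteration 1 "All rewards" line.
rewards = [248.30525, 249.58084, 264.14023, 255.19269, 245.47238,
           254.88547, 249.82108, 254.12286, 245.17322, 244.80663]

mean = statistics.fmean(rewards)
pop_std = statistics.pstdev(rewards)    # divides by N (ddof=0)
sample_std = statistics.stdev(rewards)  # divides by N - 1 (ddof=1)

print(f"Total Reward: {mean:.5f} ± {pop_std:.3f}")
print(f"(sample std would be {sample_std:.3f})")
```

With these values the population estimate rounds to the logged 5.731, and the sample estimate would be about 6.04. The mean matches 251.15005 up to the last printed digits, which is consistent with the rewards being stored as float32.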
2026-01-22 23:15:56,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 49 minutes, 52 seconds)
2026-01-22 23:17:48,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:49,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 337.02997 ± 49.542
2026-01-22 23:17:49,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [287.10416, 349.62787, 403.38187, 329.62097, 288.08664, 265.10782, 315.47858, 347.49155, 352.16534, 432.23505]
2026-01-22 23:17:49,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [54.0, 64.0, 74.0, 62.0, 56.0, 51.0, 59.0, 65.0, 65.0, 81.0]
2026-01-22 23:17:49,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (337.03) for latency DatasetOffice
2026-01-22 23:17:49,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 56 minutes, 9 seconds)
2026-01-22 23:19:40,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:41,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 422.01971 ± 107.319
2026-01-22 23:19:41,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [428.1086, 372.91882, 394.53256, 303.38147, 686.95337, 277.4567, 439.7152, 455.15097, 482.15533, 379.8243]
2026-01-22 23:19:41,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [81.0, 72.0, 87.0, 61.0, 148.0, 58.0, 96.0, 86.0, 95.0, 78.0]
2026-01-22 23:19:41,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (422.02) for latency DatasetOffice
2026-01-22 23:19:41,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 56 minutes, 54 seconds)
2026-01-22 23:21:33,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:21:35,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 463.74933 ± 80.252
2026-01-22 23:21:35,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [559.52496, 402.42642, 346.79318, 506.1782, 425.14697, 617.0719, 524.00745, 419.4082, 443.91708, 393.01862]
2026-01-22 23:21:35,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [107.0, 77.0, 72.0, 99.0, 80.0, 118.0, 101.0, 78.0, 86.0, 75.0]
2026-01-22 23:21:35,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (463.75) for latency DatasetOffice
2026-01-22 23:21:35,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 56 minutes, 33 seconds)
2026-01-22 23:23:26,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:27,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 454.12387 ± 107.779
2026-01-22 23:23:27,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [496.84372, 523.8102, 598.50146, 579.70044, 349.25165, 439.07794, 291.9216, 344.2802, 566.679, 351.17215]
2026-01-22 23:23:27,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [107.0, 107.0, 117.0, 121.0, 76.0, 92.0, 58.0, 76.0, 107.0, 76.0]
2026-01-22 23:23:27,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 55 minutes, 31 seconds)
2026-01-22 23:25:19,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:20,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 394.41617 ± 54.436
2026-01-22 23:25:20,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [430.0233, 407.83707, 383.95917, 349.61026, 467.0529, 341.06683, 340.07343, 387.53998, 335.47906, 501.51953]
2026-01-22 23:25:20,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [83.0, 76.0, 74.0, 74.0, 89.0, 70.0, 71.0, 84.0, 64.0, 94.0]
2026-01-22 23:25:20,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 56 minutes, 44 seconds)
2026-01-22 23:27:13,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:27:15,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 547.63641 ± 109.869
2026-01-22 23:27:15,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [485.11783, 606.4779, 556.94586, 337.9155, 576.879, 450.31, 697.99884, 495.8933, 537.98035, 730.8456]
2026-01-22 23:27:15,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [91.0, 116.0, 102.0, 68.0, 114.0, 91.0, 144.0, 94.0, 103.0, 141.0]
2026-01-22 23:27:15,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (547.64) for latency DatasetOffice
2026-01-22 23:27:15,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 55 minutes, 24 seconds)
2026-01-22 23:29:07,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:29:08,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 528.34143 ± 178.364
2026-01-22 23:29:08,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [551.8039, 529.2157, 715.9509, 334.9325, 509.03775, 308.59903, 437.2193, 954.1502, 461.5253, 480.98016]
2026-01-22 23:29:08,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [116.0, 112.0, 149.0, 75.0, 111.0, 59.0, 97.0, 188.0, 89.0, 105.0]
2026-01-22 23:29:08,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 53 minutes, 48 seconds)
2026-01-22 23:31:00,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:31:02,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 568.65417 ± 130.884
2026-01-22 23:31:02,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [553.01843, 385.6628, 586.87103, 786.1684, 409.42664, 732.68744, 594.00323, 700.03796, 473.71497, 464.95053]
2026-01-22 23:31:02,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [106.0, 84.0, 114.0, 154.0, 84.0, 154.0, 129.0, 151.0, 105.0, 92.0]
2026-01-22 23:31:02,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (568.65) for latency DatasetOffice
2026-01-22 23:31:02,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 52 minutes, 5 seconds)
2026-01-22 23:32:53,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:55,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 525.97131 ± 107.326
2026-01-22 23:32:55,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [703.2904, 472.02405, 480.3296, 487.22452, 563.0407, 327.48596, 496.77628, 712.6036, 518.0517, 498.88593]
2026-01-22 23:32:55,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [147.0, 87.0, 90.0, 91.0, 119.0, 64.0, 94.0, 143.0, 97.0, 89.0]
2026-01-22 23:32:55,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 50 minutes, 9 seconds)
2026-01-22 23:34:47,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:48,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 550.25726 ± 92.459
2026-01-22 23:34:48,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [715.04865, 644.9106, 465.86554, 499.947, 657.6508, 464.14825, 549.50336, 429.87274, 483.19342, 592.4321]
2026-01-22 23:34:48,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [143.0, 130.0, 93.0, 111.0, 136.0, 85.0, 122.0, 80.0, 88.0, 121.0]
2026-01-22 23:34:48,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 48 minutes, 32 seconds)
2026-01-22 23:36:41,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:43,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 559.85870 ± 70.184
2026-01-22 23:36:43,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [545.21735, 654.0503, 450.02222, 595.4511, 437.6789, 561.09863, 619.20325, 541.79266, 545.5633, 648.50885]
2026-01-22 23:36:43,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [112.0, 138.0, 84.0, 129.0, 83.0, 119.0, 118.0, 116.0, 106.0, 123.0]
2026-01-22 23:36:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 46 minutes, 38 seconds)
2026-01-22 23:38:35,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:37,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 647.99261 ± 169.267
2026-01-22 23:38:37,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [996.4451, 524.63586, 920.24786, 615.7745, 610.54224, 480.3203, 513.13257, 501.8414, 710.6078, 606.37823]
2026-01-22 23:38:37,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [192.0, 116.0, 200.0, 138.0, 112.0, 86.0, 98.0, 113.0, 134.0, 112.0]
2026-01-22 23:38:37,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (647.99) for latency DatasetOffice
2026-01-22 23:38:37,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 44 minutes, 52 seconds)
2026-01-22 23:40:29,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:30,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 320.62207 ± 172.543
2026-01-22 23:40:30,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [571.7799, 662.0573, 342.4967, 254.62798, 172.22694, 433.98718, 125.8936, 280.2067, 187.31421, 175.63013]
2026-01-22 23:40:30,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [107.0, 129.0, 69.0, 51.0, 33.0, 79.0, 26.0, 56.0, 36.0, 35.0]
2026-01-22 23:40:30,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 42 minutes, 45 seconds)
2026-01-22 23:42:22,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:23,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 697.23541 ± 136.986
2026-01-22 23:42:23,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [760.6624, 631.9755, 581.6832, 783.6006, 689.7987, 539.4483, 739.43225, 516.6386, 717.5582, 1011.556]
2026-01-22 23:42:23,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [143.0, 117.0, 106.0, 168.0, 132.0, 114.0, 133.0, 100.0, 145.0, 197.0]
2026-01-22 23:42:23,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (697.24) for latency DatasetOffice
2026-01-22 23:42:23,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 41 minutes, 7 seconds)
2026-01-22 23:44:17,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:18,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 648.88055 ± 84.914
2026-01-22 23:44:18,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [710.96045, 591.25134, 563.103, 606.2106, 569.8507, 536.0361, 664.94275, 814.8459, 718.1878, 713.4169]
2026-01-22 23:44:18,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [135.0, 125.0, 116.0, 117.0, 127.0, 114.0, 130.0, 175.0, 144.0, 142.0]
2026-01-22 23:44:18,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 39 minutes, 36 seconds)
2026-01-22 23:46:10,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:11,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 646.59650 ± 288.604
2026-01-22 23:46:11,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [729.8184, 802.8915, 82.433495, 410.00256, 781.85364, 1198.5587, 523.0169, 724.05505, 406.7978, 806.5368]
2026-01-22 23:46:11,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [141.0, 153.0, 17.0, 79.0, 157.0, 227.0, 114.0, 138.0, 77.0, 162.0]
2026-01-22 23:46:11,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 37 minutes, 14 seconds)
2026-01-22 23:48:04,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:48:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 724.72919 ± 182.762
2026-01-22 23:48:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [757.8891, 468.47946, 558.9568, 455.7079, 637.9172, 830.5335, 797.8719, 764.7226, 1029.5304, 945.6835]
2026-01-22 23:48:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [147.0, 89.0, 106.0, 95.0, 122.0, 173.0, 161.0, 148.0, 199.0, 182.0]
2026-01-22 23:48:05,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (724.73) for latency DatasetOffice
2026-01-22 23:48:05,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 35 minutes, 27 seconds)
2026-01-22 23:49:59,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:50:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 835.13947 ± 396.181
2026-01-22 23:50:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [734.38367, 1887.7036, 565.7809, 656.8382, 515.9031, 917.322, 398.51297, 807.5554, 815.99994, 1051.3943]
2026-01-22 23:50:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [149.0, 366.0, 119.0, 139.0, 116.0, 192.0, 82.0, 148.0, 172.0, 209.0]
2026-01-22 23:50:01,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (835.14) for latency DatasetOffice
2026-01-22 23:50:01,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 34 minutes, 17 seconds)
2026-01-22 23:51:54,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:56,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 745.08429 ± 317.436
2026-01-22 23:51:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [667.32086, 423.2704, 467.01144, 797.99725, 404.71634, 1530.7471, 587.7449, 936.94244, 737.7365, 897.3556]
2026-01-22 23:51:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [128.0, 90.0, 101.0, 158.0, 86.0, 297.0, 132.0, 181.0, 145.0, 166.0]
2026-01-22 23:51:56,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 32 minutes, 39 seconds)
2026-01-22 23:53:49,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:51,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 739.28351 ± 175.369
2026-01-22 23:53:51,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [924.2923, 553.6058, 721.1248, 434.8815, 889.64557, 598.75745, 611.3305, 879.9596, 784.43085, 994.80664]
2026-01-22 23:53:51,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [179.0, 122.0, 159.0, 93.0, 171.0, 117.0, 118.0, 171.0, 152.0, 192.0]
2026-01-22 23:53:51,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 30 minutes, 49 seconds)
2026-01-22 23:55:46,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:48,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1043.55054 ± 299.282
2026-01-22 23:55:48,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1431.1229, 952.93256, 980.99927, 1032.7219, 1690.2023, 997.6693, 935.3209, 632.53485, 683.8893, 1098.1113]
2026-01-22 23:55:48,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [281.0, 179.0, 194.0, 205.0, 326.0, 195.0, 174.0, 133.0, 138.0, 210.0]
2026-01-22 23:55:48,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1043.55) for latency DatasetOffice
2026-01-22 23:55:48,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 1 second)
2026-01-22 23:57:41,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:44,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1142.90649 ± 286.872
2026-01-22 23:57:44,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1632.796, 743.96735, 828.6204, 1172.729, 1265.917, 1363.1936, 999.6221, 948.5603, 1540.5935, 933.0657]
2026-01-22 23:57:44,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [337.0, 145.0, 161.0, 228.0, 237.0, 289.0, 205.0, 181.0, 306.0, 187.0]
2026-01-22 23:57:44,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1142.91) for latency DatasetOffice
2026-01-22 23:57:44,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 24 seconds)
2026-01-22 23:59:39,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:41,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 833.37482 ± 175.938
2026-01-22 23:59:41,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [637.5875, 804.58655, 921.8536, 510.30005, 689.1844, 974.24146, 1007.9718, 759.1969, 924.96436, 1103.8623]
2026-01-22 23:59:41,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [124.0, 167.0, 185.0, 102.0, 138.0, 185.0, 191.0, 145.0, 179.0, 210.0]
2026-01-22 23:59:41,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 50 seconds)
2026-01-23 00:01:33,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:35,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1060.51147 ± 305.731
2026-01-23 00:01:35,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1264.2708, 688.18494, 1165.5632, 949.86597, 832.6564, 950.7109, 774.4683, 967.23553, 1805.8578, 1206.3005]
2026-01-23 00:01:35,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [254.0, 141.0, 221.0, 195.0, 158.0, 187.0, 153.0, 189.0, 380.0, 240.0]
2026-01-23 00:01:35,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 54 seconds)
2026-01-23 00:03:28,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 976.46552 ± 342.079
2026-01-23 00:03:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [384.33804, 967.7139, 1345.2406, 1440.4684, 902.85614, 810.477, 1530.3502, 725.5964, 724.1797, 933.4347]
2026-01-23 00:03:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [85.0, 180.0, 275.0, 282.0, 173.0, 152.0, 308.0, 148.0, 146.0, 178.0]
2026-01-23 00:03:30,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 22 minutes, 52 seconds)
2026-01-23 00:05:24,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:27,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1074.68262 ± 409.473
2026-01-23 00:05:27,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1050.575, 1003.54645, 1059.1388, 1616.5238, 855.121, 540.67096, 747.231, 799.22406, 2002.4358, 1072.3595]
2026-01-23 00:05:27,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [207.0, 193.0, 204.0, 321.0, 167.0, 116.0, 142.0, 156.0, 376.0, 214.0]
2026-01-23 00:05:27,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 20 minutes, 46 seconds)
2026-01-23 00:07:22,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:07:25,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1282.17700 ± 656.675
2026-01-23 00:07:25,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [568.45526, 631.99194, 1887.1912, 1144.2886, 1297.633, 1574.1552, 675.1136, 2816.665, 858.9702, 1367.3064]
2026-01-23 00:07:25,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [125.0, 126.0, 384.0, 219.0, 267.0, 298.0, 130.0, 572.0, 167.0, 258.0]
2026-01-23 00:07:25,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1282.18) for latency DatasetOffice
2026-01-23 00:07:25,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 37 seconds)
2026-01-23 00:09:20,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:09:22,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 940.38342 ± 258.312
2026-01-23 00:09:22,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [815.9171, 890.04456, 895.8242, 1684.1726, 751.4398, 868.2656, 1017.065, 837.233, 887.27295, 756.59985]
2026-01-23 00:09:22,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [153.0, 168.0, 171.0, 322.0, 144.0, 166.0, 194.0, 160.0, 171.0, 144.0]
2026-01-23 00:09:22,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 36 seconds)
2026-01-23 00:11:17,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:11:19,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 859.55139 ± 186.575
2026-01-23 00:11:19,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [754.6306, 837.46185, 895.9159, 742.18005, 472.1101, 808.0463, 848.43176, 958.18054, 1104.6578, 1173.8987]
2026-01-23 00:11:19,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [139.0, 164.0, 170.0, 139.0, 91.0, 158.0, 160.0, 198.0, 207.0, 226.0]
2026-01-23 00:11:19,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 16 minutes, 11 seconds)
2026-01-23 00:13:10,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:13:12,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 905.08075 ± 500.703
2026-01-23 00:13:12,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [529.2117, 337.1719, 1170.7737, 800.0134, 2201.3323, 1048.1853, 799.83685, 992.2352, 447.11697, 724.93005]
2026-01-23 00:13:12,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [98.0, 63.0, 219.0, 149.0, 427.0, 205.0, 177.0, 197.0, 92.0, 137.0]
2026-01-23 00:13:12,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 51 seconds)
2026-01-23 00:15:06,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:15:12,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2277.65063 ± 1069.032
2026-01-23 00:15:12,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1403.9661, 1153.1315, 4289.4243, 2182.3254, 1907.978, 1575.8087, 2085.8962, 4279.058, 1471.4133, 2427.5037]
2026-01-23 00:15:12,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [274.0, 228.0, 863.0, 433.0, 379.0, 300.0, 415.0, 828.0, 288.0, 480.0]
2026-01-23 00:15:12,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (2277.65) for latency DatasetOffice
2026-01-23 00:15:12,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 44 seconds)
2026-01-23 00:17:08,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:17:12,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1799.70996 ± 771.631
2026-01-23 00:17:12,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2039.7065, 2472.2056, 3383.6746, 1822.4121, 1599.5286, 1808.677, 1900.5063, 1357.367, 208.1965, 1404.8241]
2026-01-23 00:17:12,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [391.0, 490.0, 658.0, 359.0, 301.0, 355.0, 373.0, 270.0, 42.0, 276.0]
2026-01-23 00:17:12,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 5 seconds)
2026-01-23 00:19:07,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:10,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1186.41956 ± 495.505
2026-01-23 00:19:10,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1526.1959, 531.6404, 1944.8812, 1903.8016, 1141.1183, 938.3556, 974.9431, 351.6297, 1275.8074, 1275.8209]
2026-01-23 00:19:10,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [302.0, 102.0, 390.0, 370.0, 223.0, 176.0, 189.0, 71.0, 247.0, 254.0]
2026-01-23 00:19:10,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 9 minutes, 22 seconds)
2026-01-23 00:21:02,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:04,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1030.01111 ± 433.531
2026-01-23 00:21:04,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1249.4867, 552.61896, 536.5099, 1639.9374, 524.2275, 1363.6271, 825.1843, 741.8335, 1141.4606, 1725.2258]
2026-01-23 00:21:04,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [251.0, 121.0, 116.0, 319.0, 117.0, 293.0, 167.0, 154.0, 235.0, 362.0]
2026-01-23 00:21:04,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 49 seconds)
2026-01-23 00:22:59,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:02,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1203.29089 ± 562.927
2026-01-23 00:23:02,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1546.3168, 1527.4496, 1235.8933, 1165.6691, 785.34357, 981.7254, 2505.118, 717.90344, 1251.4406, 316.04898]
2026-01-23 00:23:02,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [304.0, 293.0, 247.0, 224.0, 160.0, 189.0, 496.0, 152.0, 254.0, 63.0]
2026-01-23 00:23:02,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 52 seconds)
2026-01-23 00:24:57,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:01,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1386.26440 ± 509.610
2026-01-23 00:25:01,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1364.598, 1043.7251, 2275.2114, 1382.0411, 383.72992, 1524.5365, 1068.6123, 2123.7515, 1440.6694, 1255.7687]
2026-01-23 00:25:01,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [265.0, 204.0, 444.0, 273.0, 73.0, 316.0, 205.0, 412.0, 282.0, 239.0]
2026-01-23 00:25:01,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 33 seconds)
2026-01-23 00:26:55,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:00,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2051.02490 ± 995.065
2026-01-23 00:27:00,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1294.0, 1430.5106, 1929.066, 1032.293, 2141.1897, 1213.4507, 3976.9133, 1551.8029, 2095.3281, 3845.695]
2026-01-23 00:27:00,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [254.0, 279.0, 366.0, 237.0, 406.0, 239.0, 805.0, 323.0, 406.0, 757.0]
2026-01-23 00:27:00,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 32 seconds)
2026-01-23 00:28:55,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:03,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3003.10645 ± 966.083
2026-01-23 00:29:03,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1410.9255, 3285.0132, 2969.697, 3557.007, 4547.325, 2406.8276, 3701.0728, 3799.4272, 1407.4692, 2946.3008]
2026-01-23 00:29:03,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [279.0, 644.0, 574.0, 700.0, 891.0, 474.0, 723.0, 753.0, 278.0, 586.0]
2026-01-23 00:29:03,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (3003.11) for latency DatasetOffice
2026-01-23 00:29:03,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 25 seconds)
2026-01-23 00:30:56,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1462.17725 ± 1032.164
2026-01-23 00:31:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [874.8416, 4303.8394, 1094.3287, 710.53064, 1021.41144, 965.5742, 1605.3749, 2167.0212, 1059.0094, 819.8409]
2026-01-23 00:31:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [174.0, 841.0, 216.0, 141.0, 201.0, 196.0, 313.0, 429.0, 211.0, 160.0]
2026-01-23 00:31:00,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 3 seconds)
2026-01-23 00:32:53,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1946.17322 ± 1161.516
2026-01-23 00:32:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1144.625, 1698.032, 2756.7073, 1275.8665, 949.81415, 4988.2974, 1223.9619, 1909.3282, 2405.4302, 1109.6694]
2026-01-23 00:32:59,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [226.0, 336.0, 540.0, 248.0, 190.0, 966.0, 258.0, 371.0, 475.0, 211.0]
2026-01-23 00:32:59,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 15 seconds)
2026-01-23 00:34:48,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:54,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2191.16895 ± 1192.852
2026-01-23 00:34:54,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4932.3345, 1628.4706, 1438.3221, 2087.681, 3935.997, 2204.4895, 1780.0067, 907.963, 1488.3467, 1508.0757]
2026-01-23 00:34:54,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 310.0, 281.0, 407.0, 800.0, 419.0, 349.0, 171.0, 282.0, 295.0]
2026-01-23 00:34:54,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 44 seconds)
2026-01-23 00:36:52,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:57,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1625.84033 ± 1158.022
2026-01-23 00:36:57,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4521.297, 1572.0896, 2177.17, 707.63965, 1315.1083, 412.42365, 744.935, 2612.468, 1106.9156, 1088.3567]
2026-01-23 00:36:57,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [917.0, 305.0, 450.0, 153.0, 251.0, 81.0, 143.0, 505.0, 209.0, 229.0]
2026-01-23 00:36:57,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 17 seconds)
2026-01-23 00:38:46,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:49,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1312.46118 ± 545.503
2026-01-23 00:38:49,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1877.5814, 851.05774, 988.62854, 1093.4338, 1035.2423, 2306.206, 811.6281, 1098.4424, 2176.9258, 885.4645]
2026-01-23 00:38:49,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [363.0, 163.0, 189.0, 205.0, 203.0, 441.0, 155.0, 211.0, 420.0, 170.0]
2026-01-23 00:38:49,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 26 seconds)
2026-01-23 00:40:39,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:43,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1583.75000 ± 741.260
2026-01-23 00:40:43,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2710.1482, 719.2831, 1136.2007, 2097.9233, 1569.4951, 2373.3806, 947.7361, 2073.7131, 1906.9463, 302.67316]
2026-01-23 00:40:43,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [519.0, 135.0, 216.0, 405.0, 306.0, 460.0, 189.0, 398.0, 360.0, 63.0]
2026-01-23 00:40:43,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 58 seconds)
2026-01-23 00:42:39,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1760.15991 ± 1088.433
2026-01-23 00:42:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2585.1365, 2608.9927, 809.0628, 323.73236, 3807.184, 1003.0734, 2976.0403, 872.5406, 1168.1274, 1447.7102]
2026-01-23 00:42:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [483.0, 499.0, 158.0, 66.0, 738.0, 195.0, 576.0, 168.0, 225.0, 280.0]
2026-01-23 00:42:44,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 17 seconds)
2026-01-23 00:44:34,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:41,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2542.09253 ± 1397.188
2026-01-23 00:44:41,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1136.5237, 5108.154, 1247.7416, 2128.9448, 4973.145, 2982.6455, 2522.3184, 1284.025, 2682.0093, 1355.4185]
2026-01-23 00:44:41,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [220.0, 1000.0, 247.0, 415.0, 974.0, 588.0, 502.0, 249.0, 526.0, 264.0]
2026-01-23 00:44:41,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 43 minutes, 41 seconds)
2026-01-23 00:46:32,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:46:37,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1705.22559 ± 880.481
2026-01-23 00:46:37,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2057.1377, 2617.3828, 2777.6133, 1286.6687, 920.57574, 786.6811, 1651.4398, 495.96198, 3228.713, 1230.0815]
2026-01-23 00:46:37,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [402.0, 546.0, 592.0, 264.0, 178.0, 154.0, 369.0, 106.0, 664.0, 234.0]
2026-01-23 00:46:37,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes, 35 seconds)
2026-01-23 00:48:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:37,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2501.44067 ± 967.769
2026-01-23 00:48:37,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4113.825, 3905.277, 1329.9352, 1571.2965, 1523.3192, 1975.9921, 2467.5579, 1818.8354, 3206.9395, 3101.428]
2026-01-23 00:48:37,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [803.0, 747.0, 255.0, 308.0, 307.0, 390.0, 495.0, 359.0, 623.0, 617.0]
2026-01-23 00:48:37,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 57 seconds)
2026-01-23 00:50:33,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:37,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1542.27271 ± 1616.280
2026-01-23 00:50:37,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1914.997, 185.4723, 142.70995, 184.6007, 807.7741, 187.58502, 3001.3225, 3546.357, 4849.593, 602.31683]
2026-01-23 00:50:37,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [380.0, 36.0, 28.0, 36.0, 155.0, 36.0, 588.0, 719.0, 1000.0, 135.0]
2026-01-23 00:50:37,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 1 second)
2026-01-23 00:52:33,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:38,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1891.05042 ± 1168.050
2026-01-23 00:52:38,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1502.4336, 1805.6703, 938.8816, 1659.767, 1314.6295, 1981.9984, 1267.8405, 958.4051, 5169.7314, 2311.1465]
2026-01-23 00:52:38,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [292.0, 351.0, 182.0, 312.0, 254.0, 380.0, 244.0, 188.0, 1000.0, 451.0]
2026-01-23 00:52:38,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 5 seconds)
2026-01-23 00:54:29,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:34,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2115.44556 ± 1208.306
2026-01-23 00:54:34,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [464.45053, 696.0583, 2462.7324, 1726.0422, 1069.6671, 3711.1304, 4262.253, 2890.331, 2516.3206, 1355.4698]
2026-01-23 00:54:34,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [95.0, 134.0, 470.0, 337.0, 234.0, 733.0, 828.0, 560.0, 488.0, 275.0]
2026-01-23 00:54:34,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 55 seconds)
2026-01-23 00:56:32,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:37,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2015.05408 ± 1541.473
2026-01-23 00:56:37,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1597.7069, 5100.931, 1077.8397, 1121.3461, 871.4106, 1570.2831, 1274.3893, 2220.8977, 482.51022, 4833.226]
2026-01-23 00:56:37,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [348.0, 1000.0, 223.0, 222.0, 197.0, 323.0, 252.0, 427.0, 102.0, 998.0]
2026-01-23 00:56:37,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 3 seconds)
2026-01-23 00:58:24,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3496.16357 ± 1745.880
2026-01-23 00:58:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2626.3657, 5005.9385, 1268.7557, 616.3118, 1330.6543, 4077.0203, 5069.9297, 4958.8896, 4987.44, 5020.3286]
2026-01-23 00:58:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [568.0, 1000.0, 253.0, 145.0, 273.0, 838.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:58:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (3496.16) for latency DatasetOffice
2026-01-23 00:58:34,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 33 seconds)
2026-01-23 01:00:29,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:37,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2714.96997 ± 1674.457
2026-01-23 01:00:37,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2457.901, 1015.08325, 5009.739, 652.14526, 5060.07, 2705.0, 729.496, 2875.5662, 1685.9117, 4958.7866]
2026-01-23 01:00:37,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [492.0, 201.0, 1000.0, 140.0, 1000.0, 567.0, 152.0, 573.0, 357.0, 1000.0]
2026-01-23 01:00:37,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 58 seconds)
2026-01-23 01:02:31,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:42,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3576.00928 ± 1745.136
2026-01-23 01:02:42,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4945.9375, 2936.9692, 2182.5532, 5076.633, 4981.7944, 717.97815, 4497.551, 535.1911, 4944.923, 4940.567]
2026-01-23 01:02:42,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 568.0, 435.0, 1000.0, 1000.0, 149.0, 889.0, 104.0, 1000.0, 1000.0]
2026-01-23 01:02:42,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (3576.01) for latency DatasetOffice
2026-01-23 01:02:42,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 31 seconds)
2026-01-23 01:04:35,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4204.52637 ± 1448.559
2026-01-23 01:04:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5037.5195, 5070.8135, 3143.7302, 5045.143, 4955.0312, 485.39578, 5054.051, 5052.1445, 5062.5103, 3138.9272]
2026-01-23 01:04:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 617.0, 1000.0, 998.0, 102.0, 1000.0, 1000.0, 1000.0, 601.0]
2026-01-23 01:04:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (4204.53) for latency DatasetOffice
2026-01-23 01:04:46,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 43 seconds)
2026-01-23 01:06:41,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4671.40527 ± 523.009
2026-01-23 01:06:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4958.453, 4925.0566, 4962.2676, 5027.88, 4121.165, 4949.7783, 4660.766, 4904.2866, 4906.6855, 3297.7195]
2026-01-23 01:06:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 833.0, 1000.0, 940.0, 1000.0, 1000.0, 679.0]
2026-01-23 01:06:55,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (4671.41) for latency DatasetOffice
2026-01-23 01:06:55,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 26 minutes, 28 seconds)
2026-01-23 01:08:47,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:54,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2475.23389 ± 1909.434
2026-01-23 01:08:54,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5103.193, 5081.3506, 1393.41, 2854.1287, 1648.3408, 5112.2905, 2555.9678, 155.35728, 363.18738, 485.11383]
2026-01-23 01:08:54,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 266.0, 580.0, 317.0, 1000.0, 496.0, 30.0, 68.0, 94.0]
2026-01-23 01:08:54,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 24 minutes, 41 seconds)
2026-01-23 01:10:48,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:59,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3730.81958 ± 1420.758
2026-01-23 01:10:59,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3436.5105, 948.48334, 5063.2134, 4889.588, 4987.6304, 2649.2034, 2413.4272, 5121.6343, 2723.7903, 5074.7114]
2026-01-23 01:10:59,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [681.0, 192.0, 1000.0, 1000.0, 1000.0, 548.0, 527.0, 1000.0, 538.0, 1000.0]
2026-01-23 01:10:59,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 22 minutes, 51 seconds)
2026-01-23 01:12:55,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:13:09,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5136.37012 ± 104.721
2026-01-23 01:13:09,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5149.6313, 5173.606, 5178.5586, 5125.815, 5146.2783, 5265.87, 5171.448, 5180.4966, 4842.9204, 5129.0767]
2026-01-23 01:13:09,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:13:09,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (5136.37) for latency DatasetOffice
2026-01-23 01:13:09,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 21 minutes, 35 seconds)
2026-01-23 01:15:01,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:15:10,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3038.71875 ± 2275.860
2026-01-23 01:15:10,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5144.6646, 5139.738, 5144.84, 5140.996, 5120.9004, 3383.3936, 178.53824, 179.63678, 759.3499, 195.13179]
2026-01-23 01:15:10,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 655.0, 34.0, 34.0, 145.0, 38.0]
2026-01-23 01:15:10,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 18 minutes, 55 seconds)
2026-01-23 01:16:58,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:17:10,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4165.37695 ± 1555.030
2026-01-23 01:17:10,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5120.9727, 5094.55, 901.9928, 5151.093, 2997.5312, 5110.82, 1804.6213, 5162.7944, 5147.014, 5162.3774]
2026-01-23 01:17:10,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 176.0, 1000.0, 581.0, 1000.0, 351.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:17:10,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes, 48 seconds)
2026-01-23 01:19:10,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:24,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4938.69434 ± 376.102
2026-01-23 01:19:24,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4198.691, 5001.515, 5179.9985, 4203.324, 4969.2363, 5178.554, 5162.0093, 5123.0977, 5190.1978, 5180.3223]
2026-01-23 01:19:24,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [804.0, 1000.0, 1000.0, 820.0, 960.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:24,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 15 minutes, 38 seconds)
2026-01-23 01:21:17,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2839.21338 ± 1836.859
2026-01-23 01:21:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5056.4062, 5078.16, 1557.9639, 1005.1816, 4945.648, 4493.1826, 3353.5505, 1723.2247, 605.4707, 573.34515]
2026-01-23 01:21:25,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 329.0, 199.0, 1000.0, 887.0, 655.0, 347.0, 121.0, 122.0]
2026-01-23 01:21:25,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 13 minutes, 8 seconds)
2026-01-23 01:23:12,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:24,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3893.98193 ± 1617.822
2026-01-23 01:23:24,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [697.49475, 3012.821, 5057.886, 4917.193, 5024.328, 1109.1199, 5069.6465, 4863.1104, 5064.7314, 4123.488]
2026-01-23 01:23:24,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [140.0, 610.0, 1000.0, 1000.0, 1000.0, 242.0, 1000.0, 1000.0, 1000.0, 819.0]
2026-01-23 01:23:24,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 9 minutes, 37 seconds)
2026-01-23 01:25:26,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:37,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3801.91333 ± 1375.262
2026-01-23 01:25:37,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2082.7874, 3142.987, 5086.792, 2412.1082, 5023.8315, 3423.3694, 5071.8203, 5135.212, 5115.9946, 1524.2268]
2026-01-23 01:25:37,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [420.0, 628.0, 1000.0, 480.0, 1000.0, 677.0, 1000.0, 1000.0, 1000.0, 307.0]
2026-01-23 01:25:37,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 9 minutes, 1 second)
2026-01-23 01:27:23,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:28,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1953.01135 ± 2181.562
2026-01-23 01:27:28,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5149.7153, 2752.2515, 180.019, 365.3801, 307.30182, 260.76517, 216.47308, 176.31664, 4999.399, 5122.493]
2026-01-23 01:27:28,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 533.0, 35.0, 68.0, 65.0, 57.0, 42.0, 34.0, 1000.0, 1000.0]
2026-01-23 01:27:28,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 5 minutes, 58 seconds)
2026-01-23 01:29:24,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:35,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3710.19922 ± 1876.665
2026-01-23 01:29:35,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4902.014, 693.51263, 699.6637, 4949.1777, 1155.5537, 4940.054, 4969.6562, 4873.666, 4966.967, 4951.7275]
2026-01-23 01:29:35,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 147.0, 151.0, 1000.0, 246.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:35,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 3 minutes, 6 seconds)
2026-01-23 01:31:34,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:46,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4267.76807 ± 1665.595
2026-01-23 01:31:46,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5105.189, 5079.8057, 5045.5723, 5112.222, 5113.0347, 1050.2848, 5107.8066, 5113.0635, 5124.2783, 826.4248]
2026-01-23 01:31:46,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 218.0, 1000.0, 1000.0, 1000.0, 155.0]
2026-01-23 01:31:46,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 2 minutes, 4 seconds)
2026-01-23 01:33:34,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:42,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2643.11987 ± 1806.987
2026-01-23 01:33:42,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5145.287, 5053.9805, 2160.5146, 1547.5558, 5115.073, 2194.3225, 2897.4897, 159.06635, 155.54944, 2002.362]
2026-01-23 01:33:42,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 422.0, 326.0, 1000.0, 452.0, 565.0, 31.0, 30.0, 393.0]
2026-01-23 01:33:42,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 59 minutes, 44 seconds)
2026-01-23 01:35:33,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:47,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4845.51123 ± 389.702
2026-01-23 01:35:47,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5030.5347, 5138.872, 4990.383, 5130.9004, 5102.551, 5131.9404, 4060.8545, 4424.5054, 4319.463, 5125.114]
2026-01-23 01:35:47,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 792.0, 861.0, 842.0, 1000.0]
2026-01-23 01:35:47,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 54 seconds)
2026-01-23 01:37:42,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:51,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3207.43066 ± 1564.135
2026-01-23 01:37:51,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5120.4136, 5144.268, 2468.1426, 1554.7168, 1041.9479, 4791.6606, 1814.6124, 5089.6763, 2808.5789, 2240.2905]
2026-01-23 01:37:51,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 480.0, 299.0, 214.0, 949.0, 341.0, 1000.0, 535.0, 441.0]
2026-01-23 01:37:51,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 56 minutes, 2 seconds)
2026-01-23 01:39:48,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:52,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1536.96851 ± 2020.905
2026-01-23 01:39:52,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3084.704, 160.80293, 159.8462, 185.81303, 269.74905, 533.69135, 166.83783, 384.34213, 5210.9023, 5212.9956]
2026-01-23 01:39:52,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [590.0, 31.0, 31.0, 36.0, 55.0, 100.0, 32.0, 71.0, 1000.0, 1000.0]
2026-01-23 01:39:52,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 53 minutes, 28 seconds)
2026-01-23 01:41:42,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:47,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1989.55603 ± 1033.512
2026-01-23 01:41:47,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1753.5675, 618.83124, 3509.5222, 3776.0933, 1402.3251, 1885.6947, 1728.0927, 601.8869, 1722.3295, 2897.2188]
2026-01-23 01:41:47,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [330.0, 115.0, 667.0, 724.0, 266.0, 361.0, 334.0, 113.0, 325.0, 554.0]
2026-01-23 01:41:47,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 1 second)
2026-01-23 01:43:45,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:00,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5042.96777 ± 307.793
2026-01-23 01:44:00,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5195.1846, 5069.5645, 5185.667, 5068.7554, 5163.401, 4128.4463, 5161.699, 5119.2793, 5168.27, 5169.406]
2026-01-23 01:44:00,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 800.0, 995.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:00,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 49 minutes, 27 seconds)
2026-01-23 01:45:53,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:01,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2951.58862 ± 2158.398
2026-01-23 01:46:01,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3184.3677, 145.24214, 602.3752, 385.54816, 504.70126, 5144.2744, 5139.099, 5152.7793, 4127.557, 5129.943]
2026-01-23 01:46:01,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [614.0, 28.0, 110.0, 79.0, 112.0, 1000.0, 1000.0, 1000.0, 801.0, 1000.0]
2026-01-23 01:46:01,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 47 minutes, 5 seconds)
2026-01-23 01:47:53,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:05,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4679.95996 ± 862.191
2026-01-23 01:48:05,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5174.8296, 3834.5828, 5131.4434, 5216.258, 5193.024, 5222.3037, 5194.361, 5199.0596, 4083.4358, 2550.3052]
2026-01-23 01:48:05,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 736.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 798.0, 505.0]
2026-01-23 01:48:05,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 45 minutes, 2 seconds)
2026-01-23 01:50:01,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4159.92285 ± 1603.260
2026-01-23 01:50:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5128.858, 5123.067, 790.66895, 5123.585, 5141.8438, 5080.293, 1362.7521, 3691.842, 5169.295, 4987.023]
2026-01-23 01:50:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 162.0, 1000.0, 1000.0, 982.0, 272.0, 718.0, 1000.0, 1000.0]
2026-01-23 01:50:13,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 43 minutes, 28 seconds)
2026-01-23 01:52:01,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:07,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2210.26294 ± 2135.740
2026-01-23 01:52:07,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5141.155, 3209.973, 705.3513, 316.9423, 183.53098, 171.40665, 166.60677, 5182.3184, 5172.2124, 1853.1301]
2026-01-23 01:52:07,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 625.0, 134.0, 57.0, 36.0, 33.0, 32.0, 1000.0, 1000.0, 350.0]
2026-01-23 01:52:07,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 41 minutes, 21 seconds)
2026-01-23 01:54:11,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:24,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4617.44385 ± 1299.016
2026-01-23 01:54:24,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5130.945, 5125.6167, 5163.2046, 5159.859, 5136.2153, 4780.571, 4798.18, 4997.213, 742.5647, 5140.0645]
2026-01-23 01:54:24,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 150.0, 1000.0]
2026-01-23 01:54:24,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 33 seconds)
2026-01-23 01:56:18,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:29,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4073.15674 ± 1569.931
2026-01-23 01:56:29,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2296.4915, 5203.2236, 5173.8794, 5155.087, 5199.136, 5144.143, 4039.0742, 491.61673, 2878.5083, 5150.4053]
2026-01-23 01:56:29,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [447.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 772.0, 102.0, 559.0, 1000.0]
2026-01-23 01:56:29,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 37 minutes, 42 seconds)
2026-01-23 01:58:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:23,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2548.01050 ± 2000.791
2026-01-23 01:58:23,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4691.879, 5244.1396, 3553.2778, 3207.6965, 167.52861, 1968.6342, 1101.2935, 160.13461, 154.90741, 5230.6143]
2026-01-23 01:58:23,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [888.0, 1000.0, 681.0, 627.0, 32.0, 378.0, 217.0, 31.0, 30.0, 1000.0]
2026-01-23 01:58:23,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 35 minutes, 1 second)
2026-01-23 02:00:23,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:35,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4174.91895 ± 1248.533
2026-01-23 02:00:35,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4142.3276, 5058.2046, 5057.002, 1906.859, 5073.6606, 2577.0552, 5112.6035, 5160.7783, 5140.958, 2519.7422]
2026-01-23 02:00:35,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [830.0, 1000.0, 1000.0, 382.0, 1000.0, 519.0, 1000.0, 1000.0, 1000.0, 496.0]
2026-01-23 02:00:35,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 10 seconds)
2026-01-23 02:02:22,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:35,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4799.60986 ± 923.070
2026-01-23 02:02:35,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5218.0986, 5193.4956, 4546.648, 5156.854, 5195.736, 5143.8613, 2089.087, 5177.5986, 5085.8945, 5188.824]
2026-01-23 02:02:35,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 883.0, 1000.0, 1000.0, 1000.0, 417.0, 1000.0, 989.0, 1000.0]
2026-01-23 02:02:35,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 31 minutes, 25 seconds)
2026-01-23 02:04:31,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4653.96777 ± 874.862
2026-01-23 02:04:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4904.9717, 5218.7466, 5198.257, 2513.0, 3744.4321, 4101.356, 5240.628, 5224.926, 5197.039, 5196.317]
2026-01-23 02:04:44,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 492.0, 733.0, 839.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:04:44,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 53 seconds)
2026-01-23 02:06:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5217.49170 ± 28.285
2026-01-23 02:06:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5256.7114, 5207.1973, 5216.132, 5212.0854, 5225.3486, 5171.1978, 5216.212, 5258.9365, 5238.075, 5173.023]
2026-01-23 02:06:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:06:59,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (5217.49) for latency DatasetOffice
2026-01-23 02:06:59,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 27 minutes, 16 seconds)
2026-01-23 02:08:46,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:53,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2797.29541 ± 2109.711
2026-01-23 02:08:53,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5236.732, 5211.8066, 2703.008, 832.8318, 4848.5977, 5250.1094, 2961.4834, 161.70963, 447.7413, 318.93307]
2026-01-23 02:08:53,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 518.0, 162.0, 928.0, 1000.0, 562.0, 31.0, 91.0, 59.0]
2026-01-23 02:08:53,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 11 seconds)
2026-01-23 02:10:45,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:56,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4083.08936 ± 1229.147
2026-01-23 02:10:56,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3779.5327, 5177.007, 5188.099, 3147.0967, 5185.2197, 5177.597, 2130.021, 3804.7595, 2041.2141, 5200.3467]
2026-01-23 02:10:56,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [723.0, 1000.0, 1000.0, 597.0, 1000.0, 1000.0, 420.0, 735.0, 401.0, 1000.0]
2026-01-23 02:10:56,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 46 seconds)
2026-01-23 02:12:46,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:59,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4893.35547 ± 787.532
2026-01-23 02:12:59,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5176.0737, 5189.3784, 5179.35, 5165.312, 5017.0513, 5174.4326, 5121.823, 5188.363, 2535.5112, 5186.26]
2026-01-23 02:12:59,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 964.0, 1000.0, 1000.0, 1000.0, 485.0, 1000.0]
2026-01-23 02:12:59,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 46 seconds)
2026-01-23 02:14:48,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:57,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3303.70654 ± 2094.336
2026-01-23 02:14:57,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2957.8567, 5187.9805, 5184.6523, 5112.035, 5182.0796, 5199.733, 3167.8489, 195.10323, 379.69757, 470.07928]
2026-01-23 02:14:57,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [583.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 613.0, 40.0, 76.0, 103.0]
2026-01-23 02:14:57,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 23 seconds)
2026-01-23 02:16:49,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:04,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4959.96191 ± 15.031
2026-01-23 02:17:04,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4963.6606, 4977.807, 4975.316, 4961.967, 4977.116, 4958.707, 4926.8037, 4958.115, 4956.871, 4943.2524]
2026-01-23 02:17:04,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:17:04,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 7 seconds)
2026-01-23 02:18:58,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:10,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4312.07568 ± 1813.143
2026-01-23 02:19:10,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5208.4033, 5236.7173, 5168.826, 5222.6245, 5201.2695, 5240.342, 5224.195, 5224.5967, 1010.9453, 382.83743]
2026-01-23 02:19:10,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 195.0, 79.0]
2026-01-23 02:19:10,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 23 seconds)
2026-01-23 02:21:01,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:08,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2554.38354 ± 1996.118
2026-01-23 02:21:08,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5251.267, 4419.07, 5256.2065, 3848.696, 122.93989, 3260.882, 186.29663, 1884.1428, 1132.8575, 181.47841]
2026-01-23 02:21:08,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 842.0, 1000.0, 731.0, 25.0, 620.0, 36.0, 351.0, 220.0, 35.0]
2026-01-23 02:21:08,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 14 seconds)
2026-01-23 02:22:54,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:07,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4680.48340 ± 1251.543
2026-01-23 02:23:07,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5272.67, 5258.941, 5239.589, 5255.0234, 5216.1953, 5246.97, 3973.529, 5018.272, 5223.521, 1100.1302]
2026-01-23 02:23:07,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 769.0, 959.0, 1000.0, 216.0]
2026-01-23 02:23:07,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 7 seconds)
2026-01-23 02:25:02,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3014.81201 ± 1790.418
2026-01-23 02:25:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2251.0732, 5091.9355, 1540.1522, 5181.711, 634.3877, 2215.2375, 370.2319, 4518.6147, 5184.1265, 3160.6514]
2026-01-23 02:25:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [435.0, 1000.0, 308.0, 1000.0, 138.0, 437.0, 73.0, 866.0, 1000.0, 622.0]
2026-01-23 02:25:10,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 10 seconds)
2026-01-23 02:27:03,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:27:10,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2878.07178 ± 2087.883
2026-01-23 02:27:10,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5160.0137, 2555.5737, 429.14893, 461.40646, 550.0628, 725.2107, 5214.138, 5197.0796, 3289.4207, 5198.6646]
2026-01-23 02:27:10,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 500.0, 86.0, 91.0, 112.0, 136.0, 1000.0, 1000.0, 634.0, 1000.0]
2026-01-23 02:27:10,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 4 seconds)
2026-01-23 02:29:03,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:29:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4959.86572 ± 7.107
2026-01-23 02:29:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4970.0566, 4961.494, 4966.948, 4951.2847, 4966.5015, 4952.6035, 4965.823, 4951.2944, 4951.634, 4961.0166]
2026-01-23 02:29:18,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:29:18,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 3 seconds)
2026-01-23 02:31:10,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:31:23,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4850.87891 ± 705.243
2026-01-23 02:31:23,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5240.595, 5242.4272, 5256.0894, 5204.56, 3394.438, 3504.8894, 5236.5117, 4976.95, 5227.998, 5224.3296]
2026-01-23 02:31:23,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 647.0, 671.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:31:23,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 2 seconds)
2026-01-23 02:33:10,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:17,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2789.66333 ± 2194.061
2026-01-23 02:33:17,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [165.59865, 2191.827, 594.2496, 186.99446, 195.48427, 4978.983, 4924.787, 4736.5107, 4954.6733, 4967.5244]
2026-01-23 02:33:17,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [32.0, 437.0, 120.0, 36.0, 38.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:17,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1299 [DEBUG]: Training session finished
