2026-01-22 23:14:12,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mem5 
2026-01-22 23:14:12,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mem5 
2026-01-22 23:14:12,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14766066a710>}
2026-01-22 23:14:12,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:12,701 baseline-bpql-noisy-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 5 != 32
2026-01-22 23:14:12,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-22 23:14:12,719 baseline-bpql-noisy-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=461, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-22 23:14:12,719 baseline-bpql-noisy-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:14,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-22 23:15:55,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:15:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 393.74261 ± 60.252
2026-01-22 23:15:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [373.52533, 338.89667, 306.09677, 472.2489, 490.9874, 376.845, 464.98358, 408.21313, 334.58258, 371.04678]
2026-01-22 23:15:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [72.0, 67.0, 63.0, 93.0, 95.0, 77.0, 94.0, 86.0, 68.0, 75.0]
2026-01-22 23:15:57,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (393.74) for latency DatasetOffice
2026-01-22 23:15:57,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 49 minutes, 39 seconds)
2026-01-22 23:17:47,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:48,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 345.88870 ± 73.890
2026-01-22 23:17:48,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [323.04672, 283.08627, 344.20496, 366.1156, 337.97302, 307.6082, 362.87766, 549.03564, 270.9337, 314.00516]
2026-01-22 23:17:48,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [61.0, 56.0, 65.0, 67.0, 62.0, 58.0, 68.0, 114.0, 52.0, 61.0]
2026-01-22 23:17:48,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 55 minutes, 10 seconds)
2026-01-22 23:19:39,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:40,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 401.66034 ± 84.616
2026-01-22 23:19:40,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [303.28952, 459.70355, 390.2747, 336.77402, 597.8633, 346.37808, 385.3657, 404.14127, 317.87677, 474.93646]
2026-01-22 23:19:40,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [56.0, 91.0, 74.0, 62.0, 117.0, 64.0, 74.0, 81.0, 61.0, 89.0]
2026-01-22 23:19:40,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (401.66) for latency DatasetOffice
2026-01-22 23:19:40,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 55 minutes, 49 seconds)
2026-01-22 23:21:32,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:21:33,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 435.02814 ± 58.299
2026-01-22 23:21:33,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [487.83224, 335.63052, 520.56793, 442.04733, 359.13083, 493.909, 485.74112, 405.3718, 413.49698, 406.55334]
2026-01-22 23:21:33,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [96.0, 63.0, 101.0, 94.0, 78.0, 102.0, 103.0, 76.0, 93.0, 91.0]
2026-01-22 23:21:33,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (435.03) for latency DatasetOffice
2026-01-22 23:21:33,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 55 minutes, 33 seconds)
2026-01-22 23:23:23,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:24,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 456.56113 ± 145.843
2026-01-22 23:23:24,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [339.39624, 389.9418, 676.4372, 233.89214, 380.18628, 441.3694, 389.6119, 724.6021, 570.0801, 420.09445]
2026-01-22 23:23:24,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [67.0, 74.0, 137.0, 45.0, 84.0, 84.0, 72.0, 140.0, 113.0, 93.0]
2026-01-22 23:23:24,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (456.56) for latency DatasetOffice
2026-01-22 23:23:24,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 54 minutes, 12 seconds)
2026-01-22 23:25:16,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:17,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 466.08423 ± 90.558
2026-01-22 23:25:17,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [430.91315, 531.5933, 521.7477, 327.0607, 644.8665, 333.40247, 475.94952, 423.40848, 456.13553, 515.7647]
2026-01-22 23:25:17,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [92.0, 101.0, 100.0, 69.0, 123.0, 74.0, 103.0, 88.0, 83.0, 104.0]
2026-01-22 23:25:17,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (466.08) for latency DatasetOffice
2026-01-22 23:25:17,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 55 minutes, 31 seconds)
2026-01-22 23:27:07,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:27:08,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 456.85980 ± 121.042
2026-01-22 23:27:08,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [370.39297, 404.22034, 398.79965, 533.3562, 514.5814, 754.2266, 333.0129, 333.38977, 415.27036, 511.34793]
2026-01-22 23:27:08,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [68.0, 73.0, 75.0, 99.0, 96.0, 147.0, 71.0, 64.0, 77.0, 95.0]
2026-01-22 23:27:08,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 53 minutes, 39 seconds)
2026-01-22 23:29:00,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:29:01,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 465.77252 ± 66.927
2026-01-22 23:29:01,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [380.415, 446.06113, 496.11044, 527.1194, 389.83817, 549.7614, 434.9415, 434.4109, 411.7837, 587.283]
2026-01-22 23:29:01,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [72.0, 90.0, 112.0, 98.0, 86.0, 108.0, 96.0, 84.0, 78.0, 112.0]
2026-01-22 23:29:01,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 51 minutes, 59 seconds)
2026-01-22 23:30:53,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:55,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 493.38672 ± 80.009
2026-01-22 23:30:55,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [440.2797, 497.87622, 517.0739, 453.40356, 416.1906, 545.7687, 469.87653, 411.47223, 700.5223, 481.40332]
2026-01-22 23:30:55,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [84.0, 107.0, 98.0, 99.0, 77.0, 105.0, 101.0, 79.0, 134.0, 91.0]
2026-01-22 23:30:55,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (493.39) for latency DatasetOffice
2026-01-22 23:30:55,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 50 minutes, 26 seconds)
2026-01-22 23:32:45,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:47,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 531.83234 ± 101.706
2026-01-22 23:32:47,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [647.0098, 619.4972, 640.7636, 387.47693, 354.76642, 523.3318, 626.534, 435.9915, 538.18744, 544.76465]
2026-01-22 23:32:47,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [121.0, 119.0, 126.0, 71.0, 64.0, 110.0, 118.0, 94.0, 99.0, 100.0]
2026-01-22 23:32:47,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (531.83) for latency DatasetOffice
2026-01-22 23:32:47,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 48 minutes, 47 seconds)
2026-01-22 23:34:38,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:40,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 648.96594 ± 172.754
2026-01-22 23:34:40,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [442.14792, 701.9175, 623.4557, 466.85962, 435.64478, 920.2601, 616.03253, 958.49023, 699.0884, 625.76245]
2026-01-22 23:34:40,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [96.0, 135.0, 118.0, 89.0, 82.0, 179.0, 127.0, 188.0, 137.0, 118.0]
2026-01-22 23:34:40,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (648.97) for latency DatasetOffice
2026-01-22 23:34:40,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 47 minutes, 1 second)
2026-01-22 23:36:31,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:32,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 537.17737 ± 53.449
2026-01-22 23:36:32,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [644.1219, 564.5627, 524.8934, 555.7045, 513.9018, 565.06934, 538.183, 556.1224, 437.7077, 471.5067]
2026-01-22 23:36:32,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [124.0, 107.0, 111.0, 109.0, 97.0, 106.0, 103.0, 106.0, 93.0, 98.0]
2026-01-22 23:36:32,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 45 minutes, 23 seconds)
2026-01-22 23:38:25,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:27,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 652.06293 ± 164.072
2026-01-22 23:38:27,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [567.19904, 994.9672, 931.6086, 533.63403, 521.3177, 533.7508, 551.77026, 638.39343, 692.2876, 555.70087]
2026-01-22 23:38:27,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [120.0, 193.0, 182.0, 114.0, 102.0, 113.0, 104.0, 117.0, 149.0, 103.0]
2026-01-22 23:38:27,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (652.06) for latency DatasetOffice
2026-01-22 23:38:27,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 44 minutes, 6 seconds)
2026-01-22 23:40:17,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:19,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 597.72278 ± 90.146
2026-01-22 23:40:19,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [615.41235, 522.9677, 674.9013, 666.80023, 631.15704, 593.29584, 600.2097, 566.4526, 724.06415, 381.96738]
2026-01-22 23:40:19,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [132.0, 114.0, 131.0, 124.0, 138.0, 110.0, 113.0, 125.0, 156.0, 72.0]
2026-01-22 23:40:19,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 41 minutes, 49 seconds)
2026-01-22 23:42:10,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:12,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 735.26428 ± 244.442
2026-01-22 23:42:12,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [468.23822, 1312.3322, 499.98944, 569.51447, 662.3041, 971.77905, 709.6502, 878.82635, 709.6628, 570.346]
2026-01-22 23:42:12,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [90.0, 258.0, 96.0, 108.0, 126.0, 189.0, 133.0, 179.0, 132.0, 107.0]
2026-01-22 23:42:12,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (735.26) for latency DatasetOffice
2026-01-22 23:42:12,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 40 minutes, 11 seconds)
2026-01-22 23:44:04,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:06,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 671.79150 ± 101.152
2026-01-22 23:44:06,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [703.1295, 664.9659, 893.476, 580.1594, 656.4654, 789.4301, 646.6219, 678.8005, 536.63934, 568.22723]
2026-01-22 23:44:06,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [138.0, 138.0, 188.0, 115.0, 126.0, 166.0, 124.0, 146.0, 106.0, 113.0]
2026-01-22 23:44:06,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 38 minutes, 26 seconds)
2026-01-22 23:45:58,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:46:00,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 820.28320 ± 166.254
2026-01-22 23:46:00,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [705.22833, 1034.7904, 683.61926, 1019.6319, 954.04694, 613.74243, 712.16876, 1030.9037, 833.2752, 615.4244]
2026-01-22 23:46:00,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [134.0, 194.0, 127.0, 192.0, 182.0, 119.0, 137.0, 200.0, 159.0, 115.0]
2026-01-22 23:46:00,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (820.28) for latency DatasetOffice
2026-01-22 23:46:00,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 37 minutes, 3 seconds)
2026-01-22 23:47:52,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:47:54,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 755.18604 ± 114.643
2026-01-22 23:47:54,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [690.61194, 865.8261, 814.46674, 969.2537, 621.238, 688.922, 704.9893, 839.62366, 573.76166, 783.1669]
2026-01-22 23:47:54,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 186.0, 161.0, 194.0, 134.0, 138.0, 149.0, 177.0, 121.0, 165.0]
2026-01-22 23:47:54,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 35 minutes, 8 seconds)
2026-01-22 23:49:46,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:48,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 713.43011 ± 124.984
2026-01-22 23:49:48,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [738.79486, 1000.996, 739.5558, 733.58905, 666.54956, 639.5349, 831.2981, 553.8374, 667.1065, 563.0397]
2026-01-22 23:49:48,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [146.0, 207.0, 144.0, 143.0, 128.0, 120.0, 176.0, 104.0, 127.0, 110.0]
2026-01-22 23:49:48,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 33 minutes, 32 seconds)
2026-01-22 23:51:41,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:43,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 672.40344 ± 140.505
2026-01-22 23:51:43,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [821.74866, 554.78503, 470.7295, 546.91205, 748.04474, 590.35455, 722.6463, 724.70355, 588.6389, 955.471]
2026-01-22 23:51:43,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [161.0, 104.0, 91.0, 109.0, 144.0, 112.0, 140.0, 143.0, 111.0, 181.0]
2026-01-22 23:51:43,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 32 minutes, 12 seconds)
2026-01-22 23:53:33,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:35,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 637.77740 ± 113.983
2026-01-22 23:53:35,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [709.3389, 717.26166, 466.6394, 629.51135, 499.74768, 656.57825, 659.7463, 576.3121, 886.1743, 576.4639]
2026-01-22 23:53:35,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 136.0, 86.0, 120.0, 108.0, 142.0, 124.0, 113.0, 175.0, 115.0]
2026-01-22 23:53:35,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 29 minutes, 55 seconds)
2026-01-22 23:55:28,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:30,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 783.75830 ± 273.084
2026-01-22 23:55:30,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1323.4767, 852.98096, 1250.2833, 514.5817, 563.803, 566.83655, 748.6667, 557.23456, 676.1485, 783.57135]
2026-01-22 23:55:30,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [278.0, 184.0, 246.0, 96.0, 109.0, 109.0, 148.0, 118.0, 132.0, 153.0]
2026-01-22 23:55:30,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 28 minutes, 9 seconds)
2026-01-22 23:57:21,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:23,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 678.43079 ± 112.814
2026-01-22 23:57:23,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [794.6685, 610.54565, 752.1389, 537.0427, 849.1034, 455.6894, 744.0326, 701.29767, 672.42804, 667.36115]
2026-01-22 23:57:23,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [152.0, 117.0, 143.0, 104.0, 166.0, 86.0, 160.0, 132.0, 122.0, 127.0]
2026-01-22 23:57:23,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 25 minutes, 59 seconds)
2026-01-22 23:59:16,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:18,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 738.84467 ± 172.899
2026-01-22 23:59:18,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [836.8278, 764.55066, 589.61304, 682.87213, 524.6805, 839.6441, 537.9921, 819.952, 660.75006, 1131.5648]
2026-01-22 23:59:18,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [161.0, 146.0, 110.0, 124.0, 96.0, 161.0, 101.0, 155.0, 125.0, 220.0]
2026-01-22 23:59:18,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 24 minutes, 26 seconds)
2026-01-23 00:01:09,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:01:11,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 686.12463 ± 198.123
2026-01-23 00:01:11,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [511.4265, 648.1117, 621.1684, 689.1288, 644.0923, 676.5104, 1147.1957, 437.3143, 934.8243, 551.47363]
2026-01-23 00:01:11,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [106.0, 137.0, 116.0, 130.0, 126.0, 140.0, 235.0, 91.0, 198.0, 114.0]
2026-01-23 00:01:11,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 22 minutes, 8 seconds)
2026-01-23 00:03:04,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:03:06,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 797.34009 ± 253.355
2026-01-23 00:03:06,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [790.7502, 629.3724, 718.334, 676.9059, 987.98126, 1007.85376, 549.05927, 1402.3604, 531.75555, 679.0282]
2026-01-23 00:03:06,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [151.0, 124.0, 138.0, 128.0, 190.0, 189.0, 111.0, 276.0, 98.0, 126.0]
2026-01-23 00:03:06,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 20 minutes, 57 seconds)
2026-01-23 00:04:58,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:05:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 882.52814 ± 236.274
2026-01-23 00:05:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [844.21277, 843.94116, 1397.6283, 840.6453, 761.65515, 882.2278, 552.86115, 904.5857, 612.48346, 1185.0405]
2026-01-23 00:05:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [155.0, 163.0, 269.0, 157.0, 141.0, 170.0, 101.0, 179.0, 114.0, 247.0]
2026-01-23 00:05:01,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (882.53) for latency DatasetOffice
2026-01-23 00:05:01,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 18 minutes, 52 seconds)
2026-01-23 00:06:52,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:55,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1075.59399 ± 316.815
2026-01-23 00:06:55,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1053.3134, 1177.8223, 678.9533, 1259.0437, 864.749, 895.2164, 1593.685, 708.8959, 906.936, 1617.3243]
2026-01-23 00:06:55,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [203.0, 224.0, 145.0, 251.0, 168.0, 177.0, 312.0, 133.0, 180.0, 319.0]
2026-01-23 00:06:55,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1075.59) for latency DatasetOffice
2026-01-23 00:06:55,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 17 minutes, 16 seconds)
2026-01-23 00:08:48,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:50,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 860.12860 ± 445.936
2026-01-23 00:08:50,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [591.77875, 623.93536, 800.4254, 728.08215, 667.95874, 493.23245, 1998.2351, 1318.6713, 918.0216, 460.94498]
2026-01-23 00:08:50,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [116.0, 131.0, 165.0, 145.0, 120.0, 98.0, 411.0, 271.0, 203.0, 99.0]
2026-01-23 00:08:50,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 15 minutes, 21 seconds)
2026-01-23 00:10:43,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:46,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1106.44995 ± 357.256
2026-01-23 00:10:46,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1380.7245, 673.4473, 925.3811, 1740.5488, 868.17566, 1117.3158, 752.928, 906.66705, 1004.2188, 1695.0935]
2026-01-23 00:10:46,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [278.0, 149.0, 193.0, 340.0, 160.0, 218.0, 159.0, 174.0, 189.0, 330.0]
2026-01-23 00:10:46,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1106.45) for latency DatasetOffice
2026-01-23 00:10:46,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 1 second)
2026-01-23 00:12:38,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:42,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1254.19995 ± 440.810
2026-01-23 00:12:42,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [957.9409, 1497.3362, 1389.6777, 1737.6913, 977.4761, 2169.1187, 1330.6951, 739.0983, 689.3352, 1053.6301]
2026-01-23 00:12:42,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [182.0, 293.0, 275.0, 356.0, 193.0, 428.0, 259.0, 144.0, 130.0, 197.0]
2026-01-23 00:12:42,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1254.20) for latency DatasetOffice
2026-01-23 00:12:42,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 12 minutes, 20 seconds)
2026-01-23 00:14:34,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:36,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 982.57831 ± 470.560
2026-01-23 00:14:36,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [499.22778, 1682.1202, 890.9015, 411.8839, 792.5041, 1100.721, 758.3948, 1142.9401, 1929.9731, 617.1169]
2026-01-23 00:14:36,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [92.0, 330.0, 186.0, 88.0, 147.0, 231.0, 162.0, 227.0, 367.0, 134.0]
2026-01-23 00:14:36,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 10 minutes, 32 seconds)
2026-01-23 00:16:30,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1197.39197 ± 443.347
2026-01-23 00:16:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [863.5303, 850.369, 534.6539, 1584.5973, 1228.3202, 1018.28845, 921.47986, 2016.1232, 1800.1877, 1156.3704]
2026-01-23 00:16:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [163.0, 164.0, 113.0, 333.0, 233.0, 212.0, 177.0, 400.0, 352.0, 230.0]
2026-01-23 00:16:33,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 2 seconds)
2026-01-23 00:18:30,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:18:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1302.12012 ± 605.142
2026-01-23 00:18:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [618.4558, 554.7702, 1887.5973, 966.4393, 1469.5082, 688.18317, 2479.9846, 1914.0903, 1203.272, 1238.9]
2026-01-23 00:18:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [118.0, 105.0, 368.0, 199.0, 283.0, 139.0, 482.0, 382.0, 232.0, 239.0]
2026-01-23 00:18:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1302.12) for latency DatasetOffice
2026-01-23 00:18:33,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 16 seconds)
2026-01-23 00:20:24,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:20:27,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1490.87866 ± 806.400
2026-01-23 00:20:27,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2201.611, 630.4188, 2797.9114, 1200.2788, 2117.9531, 2383.8394, 377.02686, 1621.1226, 915.5386, 663.0853]
2026-01-23 00:20:27,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [453.0, 120.0, 529.0, 230.0, 409.0, 481.0, 77.0, 315.0, 172.0, 135.0]
2026-01-23 00:20:27,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1490.88) for latency DatasetOffice
2026-01-23 00:20:27,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 3 seconds)
2026-01-23 00:22:22,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:22:26,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1469.96741 ± 399.010
2026-01-23 00:22:26,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1130.2866, 2045.9031, 1435.3848, 1931.5966, 1427.4691, 1494.1814, 856.3536, 940.6157, 1447.2743, 1990.6083]
2026-01-23 00:22:26,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [223.0, 417.0, 277.0, 376.0, 267.0, 310.0, 163.0, 184.0, 282.0, 383.0]
2026-01-23 00:22:26,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 4 minutes, 37 seconds)
2026-01-23 00:24:16,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:24:21,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1893.49670 ± 797.238
2026-01-23 00:24:21,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2775.341, 1351.7329, 1284.4375, 2578.5562, 3664.4575, 1545.0247, 1876.5431, 1414.4467, 1055.3048, 1389.1207]
2026-01-23 00:24:21,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [551.0, 285.0, 261.0, 533.0, 755.0, 293.0, 384.0, 282.0, 191.0, 287.0]
2026-01-23 00:24:21,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1893.50) for latency DatasetOffice
2026-01-23 00:24:21,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 41 seconds)
2026-01-23 00:26:14,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:26:18,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1486.42224 ± 648.416
2026-01-23 00:26:18,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1397.0613, 753.732, 1259.2625, 2463.7576, 2600.8936, 1371.9912, 1054.5001, 2139.0762, 1162.1637, 661.78394]
2026-01-23 00:26:18,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [264.0, 158.0, 250.0, 492.0, 510.0, 294.0, 208.0, 432.0, 231.0, 124.0]
2026-01-23 00:26:18,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 54 seconds)
2026-01-23 00:28:17,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:28:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1438.00659 ± 544.180
2026-01-23 00:28:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2107.2307, 1394.6321, 1583.0037, 966.4353, 2133.3167, 825.501, 1589.2172, 2099.115, 1184.5542, 497.0597]
2026-01-23 00:28:21,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [418.0, 275.0, 321.0, 196.0, 431.0, 169.0, 309.0, 416.0, 242.0, 97.0]
2026-01-23 00:28:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 29 seconds)
2026-01-23 00:30:09,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:30:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1780.23047 ± 824.701
2026-01-23 00:30:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [930.42804, 2767.6218, 2960.723, 3004.0146, 1236.9467, 1731.3573, 724.0269, 1775.0165, 885.9056, 1786.2645]
2026-01-23 00:30:13,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [176.0, 533.0, 565.0, 573.0, 243.0, 347.0, 144.0, 349.0, 171.0, 343.0]
2026-01-23 00:30:13,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 12 seconds)
2026-01-23 00:32:07,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:32:12,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1996.09241 ± 1191.794
2026-01-23 00:32:12,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1523.6317, 4714.1284, 1118.5674, 852.1078, 1245.7772, 1144.4622, 986.5387, 2343.776, 2963.8604, 3068.0737]
2026-01-23 00:32:12,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [290.0, 920.0, 213.0, 164.0, 232.0, 210.0, 185.0, 452.0, 592.0, 584.0]
2026-01-23 00:32:12,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1996.09) for latency DatasetOffice
2026-01-23 00:32:12,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 21 seconds)
2026-01-23 00:34:07,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:34:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1449.91174 ± 376.244
2026-01-23 00:34:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1010.7046, 1932.9226, 933.2244, 1119.861, 1947.731, 1443.3895, 1062.9894, 1856.3579, 1659.4281, 1532.5085]
2026-01-23 00:34:10,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [192.0, 367.0, 181.0, 219.0, 379.0, 275.0, 199.0, 368.0, 310.0, 292.0]
2026-01-23 00:34:10,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 59 seconds)
2026-01-23 00:36:00,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:36:03,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1427.04431 ± 548.795
2026-01-23 00:36:03,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1728.7296, 2728.126, 1904.09, 1446.6617, 1351.3972, 667.9005, 1195.7412, 1149.1873, 1079.9072, 1018.7023]
2026-01-23 00:36:03,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [354.0, 538.0, 361.0, 280.0, 262.0, 125.0, 241.0, 225.0, 220.0, 187.0]
2026-01-23 00:36:03,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 15 seconds)
2026-01-23 00:37:57,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:38:03,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2104.86743 ± 894.970
2026-01-23 00:38:03,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3967.8528, 2019.8744, 814.48737, 2298.889, 1744.5211, 1109.2402, 3024.9436, 1395.9884, 1981.7428, 2691.1345]
2026-01-23 00:38:03,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [767.0, 375.0, 168.0, 444.0, 319.0, 207.0, 582.0, 266.0, 380.0, 500.0]
2026-01-23 00:38:03,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (2104.87) for latency DatasetOffice
2026-01-23 00:38:03,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 37 seconds)
2026-01-23 00:39:56,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:40:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1967.08167 ± 1118.260
2026-01-23 00:40:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4789.907, 1143.5936, 1410.5186, 801.7181, 1715.5293, 1558.3596, 1773.4158, 3253.6934, 1699.8607, 1524.2217]
2026-01-23 00:40:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [940.0, 231.0, 281.0, 154.0, 326.0, 301.0, 343.0, 632.0, 334.0, 306.0]
2026-01-23 00:40:01,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes, 41 seconds)
2026-01-23 00:41:57,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:42:01,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1644.78149 ± 739.332
2026-01-23 00:42:01,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [697.89966, 2424.176, 1341.9327, 875.504, 1208.1919, 1612.0062, 2223.25, 1622.3961, 1199.6661, 3242.7913]
2026-01-23 00:42:01,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [148.0, 489.0, 263.0, 167.0, 227.0, 307.0, 439.0, 343.0, 229.0, 636.0]
2026-01-23 00:42:01,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 59 seconds)
2026-01-23 00:43:58,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:02,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1685.84900 ± 896.016
2026-01-23 00:44:02,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [974.8175, 1065.9022, 954.3606, 2169.1833, 1752.7954, 1457.3651, 1835.7084, 871.2559, 4057.7751, 1719.326]
2026-01-23 00:44:02,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [171.0, 202.0, 178.0, 443.0, 329.0, 283.0, 348.0, 163.0, 766.0, 347.0]
2026-01-23 00:44:02,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 31 seconds)
2026-01-23 00:45:51,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:54,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1373.31433 ± 595.378
2026-01-23 00:45:54,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2757.25, 883.45953, 1334.313, 501.49097, 1093.5602, 1540.2413, 1209.1135, 922.0064, 1828.2603, 1663.4485]
2026-01-23 00:45:54,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [527.0, 183.0, 241.0, 98.0, 204.0, 295.0, 230.0, 170.0, 342.0, 310.0]
2026-01-23 00:45:54,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 42 minutes, 23 seconds)
2026-01-23 00:47:51,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:47:59,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2871.16748 ± 1343.223
2026-01-23 00:47:59,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5072.097, 2708.362, 4183.4375, 2190.8267, 693.59814, 4420.2417, 1827.5527, 1648.9865, 2196.6687, 3769.9036]
2026-01-23 00:47:59,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 531.0, 834.0, 438.0, 129.0, 885.0, 378.0, 331.0, 436.0, 761.0]
2026-01-23 00:47:59,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (2871.17) for latency DatasetOffice
2026-01-23 00:47:59,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 23 seconds)
2026-01-23 00:49:51,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:49:58,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2776.06665 ± 1506.195
2026-01-23 00:49:58,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4981.936, 1342.3795, 2420.2627, 488.96857, 3810.9502, 3916.6235, 1885.7657, 5072.619, 2423.141, 1418.0198]
2026-01-23 00:49:58,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 257.0, 469.0, 96.0, 725.0, 791.0, 368.0, 1000.0, 494.0, 281.0]
2026-01-23 00:49:58,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 33 seconds)
2026-01-23 00:51:52,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:51:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1523.70618 ± 815.147
2026-01-23 00:51:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1539.864, 832.662, 1395.7731, 1823.9296, 1112.6411, 3494.685, 2387.45, 762.4262, 878.7544, 1008.87616]
2026-01-23 00:51:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [296.0, 166.0, 265.0, 347.0, 209.0, 653.0, 446.0, 145.0, 185.0, 210.0]
2026-01-23 00:51:56,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 6 seconds)
2026-01-23 00:53:48,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:53:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2521.89014 ± 1219.983
2026-01-23 00:53:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2036.716, 854.9014, 2330.7087, 3734.1995, 2628.657, 4749.525, 1159.2703, 1793.656, 1823.6038, 4107.664]
2026-01-23 00:53:55,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [402.0, 164.0, 451.0, 750.0, 505.0, 921.0, 221.0, 345.0, 373.0, 851.0]
2026-01-23 00:53:55,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 53 seconds)
2026-01-23 00:55:50,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:55:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2808.24609 ± 1228.324
2026-01-23 00:55:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3136.9492, 5007.517, 3636.4792, 2625.3403, 1185.0626, 2200.7866, 3595.4226, 2949.7505, 453.21503, 3291.94]
2026-01-23 00:55:58,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [650.0, 1000.0, 728.0, 532.0, 242.0, 452.0, 722.0, 589.0, 99.0, 654.0]
2026-01-23 00:55:58,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 32 seconds)
2026-01-23 00:57:51,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:57:57,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2178.77856 ± 1494.223
2026-01-23 00:57:57,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3733.8965, 4719.4956, 3678.5906, 1233.6942, 3476.6365, 1691.7231, 342.50754, 1588.8026, 838.9591, 483.47922]
2026-01-23 00:57:57,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [764.0, 937.0, 757.0, 250.0, 694.0, 339.0, 65.0, 312.0, 163.0, 96.0]
2026-01-23 00:57:57,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 44 seconds)
2026-01-23 00:59:53,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3571.31885 ± 1233.234
2026-01-23 01:00:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4978.629, 4086.683, 2023.3933, 3596.1033, 2145.9277, 1545.2942, 4954.5957, 3326.6243, 5087.276, 3968.662]
2026-01-23 01:00:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 818.0, 398.0, 705.0, 421.0, 297.0, 1000.0, 649.0, 1000.0, 808.0]
2026-01-23 01:00:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (3571.32) for latency DatasetOffice
2026-01-23 01:00:03,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 46 seconds)
2026-01-23 01:02:03,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:12,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3266.78662 ± 1769.348
2026-01-23 01:02:12,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5011.423, 5077.3965, 740.3997, 5038.6763, 4962.967, 4927.2295, 1874.1669, 1702.8219, 2066.1738, 1266.6119]
2026-01-23 01:02:12,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 139.0, 1000.0, 1000.0, 1000.0, 360.0, 315.0, 396.0, 239.0]
2026-01-23 01:02:12,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 30 minutes, 25 seconds)
2026-01-23 01:03:58,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:03,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1871.38477 ± 1159.478
2026-01-23 01:04:03,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1177.1385, 1855.7238, 3202.0552, 985.2222, 2448.1316, 4282.3013, 2450.7012, 1207.1625, 607.06995, 498.33997]
2026-01-23 01:04:03,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [233.0, 368.0, 645.0, 187.0, 465.0, 832.0, 466.0, 225.0, 140.0, 98.0]
2026-01-23 01:04:03,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 7 seconds)
2026-01-23 01:05:59,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:10,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3987.21631 ± 1493.465
2026-01-23 01:06:10,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2689.0732, 4972.569, 5007.023, 5039.1836, 1499.4585, 4716.311, 1144.1777, 4906.4907, 4907.9326, 4989.9434]
2026-01-23 01:06:10,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [547.0, 1000.0, 1000.0, 1000.0, 277.0, 937.0, 217.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:06:10,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (3987.22) for latency DatasetOffice
2026-01-23 01:06:10,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 25 minutes, 45 seconds)
2026-01-23 01:08:03,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:16,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4541.83691 ± 885.761
2026-01-23 01:08:16,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4059.3933, 2921.506, 5131.8086, 5050.998, 5055.97, 2838.112, 5030.399, 5082.1123, 5075.649, 5172.417]
2026-01-23 01:08:16,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [795.0, 570.0, 1000.0, 1000.0, 1000.0, 548.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:08:16,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (4541.84) for latency DatasetOffice
2026-01-23 01:08:16,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 24 minutes, 32 seconds)
2026-01-23 01:10:20,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:26,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2488.66650 ± 1359.167
2026-01-23 01:10:26,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2770.9773, 5211.5986, 3681.6985, 1262.3145, 1119.6366, 836.61053, 2680.709, 2954.3232, 3411.1033, 957.694]
2026-01-23 01:10:26,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [527.0, 1000.0, 688.0, 250.0, 227.0, 165.0, 502.0, 572.0, 658.0, 186.0]
2026-01-23 01:10:26,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 23 minutes, 3 seconds)
2026-01-23 01:12:14,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:21,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2737.63672 ± 1426.086
2026-01-23 01:12:21,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1914.0879, 2193.0847, 1516.9446, 4382.235, 1227.0353, 2965.0923, 2471.9558, 909.726, 5106.592, 4689.6133]
2026-01-23 01:12:21,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [377.0, 420.0, 301.0, 835.0, 223.0, 562.0, 481.0, 189.0, 1000.0, 903.0]
2026-01-23 01:12:21,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 19 minutes, 5 seconds)
2026-01-23 01:14:21,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:33,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4163.72803 ± 1393.670
2026-01-23 01:14:33,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4968.199, 5035.866, 1386.8379, 4974.542, 4749.3667, 4305.3574, 4947.7993, 1423.6857, 4856.6357, 4988.9937]
2026-01-23 01:14:33,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 261.0, 1000.0, 964.0, 860.0, 1000.0, 280.0, 1000.0, 1000.0]
2026-01-23 01:14:33,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 19 minutes, 48 seconds)
2026-01-23 01:16:24,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:34,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3729.76367 ± 1963.980
2026-01-23 01:16:34,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4962.091, 5073.8394, 5055.92, 4992.184, 913.4503, 396.938, 901.42816, 5061.353, 4919.0996, 5021.3335]
2026-01-23 01:16:34,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 175.0, 73.0, 169.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:16:34,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 16 minutes, 58 seconds)
2026-01-23 01:18:34,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:18:40,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2471.05664 ± 1315.360
2026-01-23 01:18:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1578.6411, 2684.6084, 2418.05, 2934.0388, 740.8325, 5252.7754, 858.24286, 1844.5778, 4052.7185, 2346.083]
2026-01-23 01:18:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [293.0, 529.0, 460.0, 559.0, 148.0, 1000.0, 184.0, 360.0, 821.0, 454.0]
2026-01-23 01:18:40,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 14 minutes, 54 seconds)
2026-01-23 01:20:36,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:20:48,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4225.34082 ± 1364.223
2026-01-23 01:20:48,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3141.8882, 5111.822, 4834.779, 5160.7134, 1017.84503, 4831.877, 2731.2712, 5120.821, 5167.403, 5134.9854]
2026-01-23 01:20:48,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [613.0, 1000.0, 940.0, 1000.0, 207.0, 964.0, 546.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:20:48,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 12 minutes, 29 seconds)
2026-01-23 01:22:35,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:22:44,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2851.54224 ± 1967.119
2026-01-23 01:22:44,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2340.9248, 668.5389, 225.40422, 360.90454, 5050.533, 5015.834, 1751.133, 2967.799, 5006.9736, 5127.3794]
2026-01-23 01:22:44,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [462.0, 130.0, 46.0, 68.0, 1000.0, 1000.0, 344.0, 591.0, 1000.0, 1000.0]
2026-01-23 01:22:44,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 10 minutes, 35 seconds)
2026-01-23 01:24:42,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:24:53,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3859.35791 ± 1731.757
2026-01-23 01:24:53,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4998.437, 884.34045, 952.09436, 4973.907, 4990.9946, 5020.1445, 4992.79, 1890.7225, 4971.644, 4918.503]
2026-01-23 01:24:53,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 177.0, 189.0, 1000.0, 1000.0, 1000.0, 1000.0, 361.0, 1000.0, 1000.0]
2026-01-23 01:24:53,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 8 minutes, 13 seconds)
2026-01-23 01:26:44,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:26:58,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4855.46777 ± 850.083
2026-01-23 01:26:58,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5128.761, 5144.9307, 5151.521, 2305.4548, 5136.633, 5127.8794, 5125.6353, 5125.0522, 5159.9854, 5148.8257]
2026-01-23 01:26:58,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 443.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:26:58,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (4855.47) for latency DatasetOffice
2026-01-23 01:26:58,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 6 minutes, 33 seconds)
2026-01-23 01:28:59,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:09,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3487.16016 ± 2104.461
2026-01-23 01:29:09,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2946.4316, 230.04756, 910.6224, 161.72375, 5045.0903, 5138.732, 5073.701, 5116.343, 5095.7197, 5153.1904]
2026-01-23 01:29:09,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [578.0, 48.0, 183.0, 31.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:09,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 5 minutes, 1 second)
2026-01-23 01:31:01,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:16,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5079.38574 ± 19.361
2026-01-23 01:31:16,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5057.8135, 5085.4004, 5062.0273, 5117.0273, 5064.7417, 5089.009, 5102.522, 5089.6104, 5067.0493, 5058.654]
2026-01-23 01:31:16,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:31:16,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (5079.39) for latency DatasetOffice
2026-01-23 01:31:16,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 2 minutes, 47 seconds)
2026-01-23 01:33:12,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:25,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4499.15527 ± 1211.655
2026-01-23 01:33:25,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4941.472, 5073.308, 3441.6777, 5084.736, 5145.9233, 4944.066, 4988.354, 1166.2645, 5071.778, 5133.972]
2026-01-23 01:33:25,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 695.0, 1000.0, 1000.0, 1000.0, 1000.0, 231.0, 1000.0, 1000.0]
2026-01-23 01:33:25,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 2 minutes)
2026-01-23 01:35:19,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:33,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5085.43066 ± 33.313
2026-01-23 01:35:33,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5113.374, 5127.002, 5086.1787, 5069.256, 5087.282, 5113.381, 5091.133, 4999.8403, 5074.057, 5092.806]
2026-01-23 01:35:33,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:35:33,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (5085.43) for latency DatasetOffice
2026-01-23 01:35:33,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 59 minutes, 45 seconds)
2026-01-23 01:37:25,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:39,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4661.83887 ± 1002.602
2026-01-23 01:37:39,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5119.2373, 3176.891, 2228.0476, 5158.4194, 5149.573, 5150.489, 5130.726, 5159.7393, 5143.889, 5201.3765]
2026-01-23 01:37:39,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 578.0, 409.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:37:39,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 37 seconds)
2026-01-23 01:39:34,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3407.32812 ± 2134.968
2026-01-23 01:39:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5109.3506, 5064.9077, 5109.376, 5065.7163, 5110.451, 5077.4443, 2388.4321, 429.97482, 592.96436, 124.66525]
2026-01-23 01:39:44,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 467.0, 82.0, 123.0, 24.0]
2026-01-23 01:39:44,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 54 minutes, 59 seconds)
2026-01-23 01:41:40,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:54,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4735.49414 ± 1260.741
2026-01-23 01:41:54,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5138.93, 5148.5635, 5198.6064, 5154.3315, 5171.1064, 5141.703, 5152.864, 953.6595, 5163.649, 5131.5264]
2026-01-23 01:41:54,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 172.0, 1000.0, 1000.0]
2026-01-23 01:41:54,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 53 minutes, 9 seconds)
2026-01-23 01:43:49,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:44:03,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4636.65918 ± 1286.846
2026-01-23 01:44:03,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5081.843, 5076.7197, 5077.229, 776.55914, 5091.703, 5057.3975, 5059.7295, 5021.5444, 5046.5645, 5077.301]
2026-01-23 01:44:03,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 154.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:44:03,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 50 minutes, 59 seconds)
2026-01-23 01:46:01,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:14,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4111.69092 ± 1487.419
2026-01-23 01:46:14,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4945.2466, 5063.4834, 4959.644, 775.574, 1963.7876, 3285.867, 5028.3335, 5029.222, 5031.802, 5033.9487]
2026-01-23 01:46:14,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 173.0, 385.0, 646.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:46:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 4 seconds)
2026-01-23 01:48:12,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:26,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4665.27344 ± 1253.582
2026-01-23 01:48:26,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5067.2236, 5088.6836, 5092.704, 5110.008, 5101.154, 5104.645, 904.9354, 5066.8926, 5061.6934, 5054.7964]
2026-01-23 01:48:26,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 172.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:48:26,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 26 seconds)
2026-01-23 01:50:17,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:50:32,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4989.90381 ± 26.489
2026-01-23 01:50:32,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4990.9863, 4971.5015, 4975.395, 5036.3228, 5019.331, 4941.942, 4976.1875, 5010.879, 4972.248, 5004.243]
2026-01-23 01:50:32,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:50:32,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 45 minutes, 19 seconds)
2026-01-23 01:52:26,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:39,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4523.27002 ± 1279.570
2026-01-23 01:52:39,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5099.984, 5083.455, 5074.945, 5095.3335, 5060.721, 5061.2695, 5072.701, 886.2547, 5090.4087, 3707.625]
2026-01-23 01:52:39,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 171.0, 1000.0, 734.0]
2026-01-23 01:52:39,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 1 second)
2026-01-23 01:54:35,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:49,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4480.73145 ± 1256.492
2026-01-23 01:54:49,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4884.819, 4882.092, 713.57056, 4874.83, 4885.9556, 4880.9067, 4893.045, 4874.2363, 5029.6216, 4888.243]
2026-01-23 01:54:49,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 165.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:54:49,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 40 minutes, 57 seconds)
2026-01-23 01:56:46,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:55,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3361.68213 ± 2093.121
2026-01-23 01:56:55,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5181.924, 5155.3276, 2006.6755, 811.4106, 421.24872, 235.25204, 5161.461, 5132.515, 5116.0127, 4394.9907]
2026-01-23 01:56:55,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 388.0, 158.0, 79.0, 50.0, 1000.0, 1000.0, 1000.0, 856.0]
2026-01-23 01:56:55,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 29 seconds)
2026-01-23 01:58:55,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:59:08,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4598.41406 ± 1205.062
2026-01-23 01:59:08,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5070.8687, 4942.8975, 985.46136, 4989.215, 4963.048, 5069.8975, 5016.636, 4947.099, 5009.778, 4989.2427]
2026-01-23 01:59:08,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 207.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:59:08,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 24 seconds)
2026-01-23 02:00:57,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:01:10,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4788.45215 ± 1249.224
2026-01-23 02:01:10,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5224.8164, 5237.232, 5150.8857, 1041.7218, 5160.4185, 5232.845, 5212.115, 5228.5, 5196.5806, 5199.4053]
2026-01-23 02:01:10,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 202.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:01:10,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 4 seconds)
2026-01-23 02:03:09,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:03:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4687.66162 ± 1188.593
2026-01-23 02:03:22,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4535.3374, 1164.5613, 5154.884, 5168.4116, 5153.5522, 5147.244, 5098.0063, 5161.933, 5134.004, 5158.685]
2026-01-23 02:03:22,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [888.0, 233.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:03:22,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 10 seconds)
2026-01-23 02:05:21,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:05:36,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5054.92725 ± 17.496
2026-01-23 02:05:36,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5049.9077, 5062.7124, 5064.084, 5051.2812, 5059.5664, 5074.5454, 5012.2456, 5077.2925, 5054.9497, 5042.6924]
2026-01-23 02:05:36,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:05:36,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 9 seconds)
2026-01-23 02:07:22,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:07:34,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4292.69629 ± 1416.835
2026-01-23 02:07:34,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4960.067, 4978.27, 4976.2393, 5006.3857, 4969.819, 4960.908, 4941.7324, 4924.2026, 2606.224, 603.1156]
2026-01-23 02:07:34,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 962.0, 1000.0, 522.0, 117.0]
2026-01-23 02:07:34,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 27 minutes, 42 seconds)
2026-01-23 02:09:24,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:09:36,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4604.60156 ± 1400.536
2026-01-23 02:09:36,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5240.198, 5215.4316, 5226.0464, 5173.723, 5245.0854, 5219.785, 673.17426, 3569.484, 5250.5317, 5232.555]
2026-01-23 02:09:36,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 131.0, 701.0, 1000.0, 1000.0]
2026-01-23 02:09:36,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 6 seconds)
2026-01-23 02:11:31,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:11:44,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4680.16748 ± 1037.154
2026-01-23 02:11:44,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5132.759, 5015.526, 5039.69, 5092.1714, 5023.5874, 5067.9663, 4723.88, 1584.9034, 5071.1973, 5049.997]
2026-01-23 02:11:44,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 939.0, 311.0, 1000.0, 1000.0]
2026-01-23 02:11:44,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 14 seconds)
2026-01-23 02:13:29,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:13:37,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3254.22827 ± 2041.998
2026-01-23 02:13:37,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5108.7354, 5077.9243, 5106.3813, 5216.2563, 5046.7524, 3313.343, 711.9906, 531.44464, 156.38895, 2273.0657]
2026-01-23 02:13:37,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 648.0, 139.0, 111.0, 30.0, 424.0]
2026-01-23 02:13:37,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 29 seconds)
2026-01-23 02:15:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:15:44,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5017.01025 ± 31.822
2026-01-23 02:15:44,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4938.5083, 5024.6763, 5043.9946, 5045.064, 5053.2393, 5013.669, 5036.464, 4997.607, 4999.3784, 5017.5034]
2026-01-23 02:15:44,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:15:44,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 14 seconds)
2026-01-23 02:17:34,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:17:48,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4971.21875 ± 10.315
2026-01-23 02:17:48,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4970.3525, 4963.71, 4958.9062, 4950.504, 4986.7446, 4973.2456, 4979.837, 4978.401, 4971.865, 4978.6177]
2026-01-23 02:17:48,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:17:48,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 22 seconds)
2026-01-23 02:19:37,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:19:46,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3382.19775 ± 1921.201
2026-01-23 02:19:46,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5005.641, 3279.7014, 165.42026, 3162.5166, 185.46146, 1809.5873, 5085.3267, 5066.283, 5000.405, 5061.636]
2026-01-23 02:19:46,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 648.0, 32.0, 628.0, 36.0, 357.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:19:46,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 14 seconds)
2026-01-23 02:21:36,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:21:49,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4527.28223 ± 1067.834
2026-01-23 02:21:49,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4889.9443, 4901.0215, 4908.88, 4880.1387, 4882.3594, 4864.2266, 4934.247, 1324.9688, 4869.0835, 4817.9575]
2026-01-23 02:21:49,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 250.0, 1000.0, 1000.0]
2026-01-23 02:21:49,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 6 seconds)
2026-01-23 02:23:43,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:23:55,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4116.29395 ± 1790.102
2026-01-23 02:23:55,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5020.202, 5017.4893, 323.23773, 5022.7856, 4993.48, 5000.141, 759.69055, 5019.6074, 5002.3057, 5003.998]
2026-01-23 02:23:55,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 71.0, 1000.0, 1000.0, 1000.0, 167.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:23:55,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 17 seconds)
2026-01-23 02:25:41,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:25:54,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4748.86182 ± 962.907
2026-01-23 02:25:54,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5039.4653, 5148.5737, 5089.2036, 5055.793, 5079.492, 5024.143, 5039.881, 1861.9713, 5096.571, 5053.5244]
2026-01-23 02:25:54,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 374.0, 1000.0, 1000.0]
2026-01-23 02:25:54,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 7 seconds)
2026-01-23 02:27:46,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:00,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4928.71387 ± 334.416
2026-01-23 02:28:00,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4935.0254, 5059.042, 3932.2166, 5052.595, 5034.432, 5043.017, 5035.195, 5085.855, 5075.5146, 5034.248]
2026-01-23 02:28:00,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 790.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:28:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 7 seconds)
2026-01-23 02:29:55,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:03,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2937.54102 ± 2161.892
2026-01-23 02:30:03,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [464.00452, 5106.5713, 2637.2986, 535.65186, 333.9998, 384.38528, 5031.264, 5069.475, 5102.5376, 4710.223]
2026-01-23 02:30:03,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [93.0, 1000.0, 516.0, 110.0, 75.0, 71.0, 1000.0, 1000.0, 1000.0, 928.0]
2026-01-23 02:30:03,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 6 seconds)
2026-01-23 02:31:51,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:32:04,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4464.04834 ± 1052.518
2026-01-23 02:32:04,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4801.661, 4764.7407, 4921.6206, 4851.3477, 4767.819, 4839.7236, 4770.6987, 1309.7538, 4770.985, 4842.1333]
2026-01-23 02:32:04,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 270.0, 1000.0, 1000.0]
2026-01-23 02:32:04,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 3 seconds)
2026-01-23 02:33:54,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:34:08,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4983.14648 ± 20.432
2026-01-23 02:34:08,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4967.5825, 4993.038, 4972.316, 4965.7446, 4956.367, 4990.0557, 4963.708, 4993.47, 5004.903, 5024.2803]
2026-01-23 02:34:08,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:34:08,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1299 [DEBUG]: Training session finished
