2026-01-22 23:14:22,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1156 [DEBUG]: logdir: _logs/benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mem2
2026-01-22 23:14:22,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1157 [DEBUG]: trainer_prefix: benchmark-v3-tc10/noisy-humanoid/DatasetOffice-bpql-mem2
2026-01-22 23:14:22,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1158 [DEBUG]: args.trainer_eval_latencies: {'DatasetOffice': <latency_env.delayed_mdp.DatasetDelay object at 0x14c14f10fad0>}
2026-01-22 23:14:22,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1159 [DEBUG]: using device: cuda
2026-01-22 23:14:22,770 baseline-bpql-noisy-humanoid:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 32
2026-01-22 23:14:22,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1181 [INFO]: Creating new trainer
2026-01-22 23:14:22,788 baseline-bpql-noisy-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=410, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2026-01-22 23:14:22,788 baseline-bpql-noisy-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2026-01-22 23:14:24,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1242 [DEBUG]: Starting training session...
2026-01-22 23:14:24,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 1/100
2026-01-22 23:16:05,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:16:05,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 205.69174 ± 7.172
2026-01-22 23:16:05,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [199.52792, 193.95622, 216.55573, 211.55443, 203.0008, 216.07843, 205.08516, 199.28944, 202.07947, 209.78984]
2026-01-22 23:16:05,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [43.0, 41.0, 46.0, 43.0, 42.0, 45.0, 41.0, 41.0, 41.0, 42.0]
2026-01-22 23:16:05,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (205.69) for latency DatasetOffice
2026-01-22 23:16:05,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 47 minutes, 30 seconds)
2026-01-22 23:17:56,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:17:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 363.96240 ± 71.946
2026-01-22 23:17:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [304.7754, 289.42722, 509.87515, 444.07526, 295.16635, 337.8277, 307.72382, 390.20767, 329.30508, 431.2403]
2026-01-22 23:17:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [60.0, 59.0, 99.0, 86.0, 62.0, 70.0, 63.0, 82.0, 71.0, 81.0]
2026-01-22 23:17:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (363.96) for latency DatasetOffice
2026-01-22 23:17:57,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 53 minutes, 35 seconds)
2026-01-22 23:19:47,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:19:49,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 513.60156 ± 107.946
2026-01-22 23:19:49,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [498.7742, 474.80466, 637.251, 748.4482, 555.0782, 437.2362, 465.1168, 390.22574, 379.63226, 549.4482]
2026-01-22 23:19:49,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [98.0, 88.0, 122.0, 142.0, 105.0, 83.0, 90.0, 72.0, 77.0, 103.0]
2026-01-22 23:19:49,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (513.60) for latency DatasetOffice
2026-01-22 23:19:49,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 54 minutes, 57 seconds)
2026-01-22 23:21:38,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:21:39,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 321.67218 ± 33.389
2026-01-22 23:21:39,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [296.35742, 327.36496, 298.20496, 299.40628, 302.18692, 295.70123, 367.46527, 306.2448, 324.45605, 399.334]
2026-01-22 23:21:39,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [56.0, 62.0, 58.0, 56.0, 61.0, 58.0, 74.0, 65.0, 64.0, 79.0]
2026-01-22 23:21:39,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 54 minutes, 5 seconds)
2026-01-22 23:23:30,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:23:31,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 455.45761 ± 59.849
2026-01-22 23:23:31,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [379.6081, 489.26398, 428.93546, 610.2512, 450.98978, 482.55582, 427.77353, 414.46902, 424.17313, 446.55624]
2026-01-22 23:23:31,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [70.0, 95.0, 81.0, 117.0, 85.0, 93.0, 83.0, 78.0, 81.0, 98.0]
2026-01-22 23:23:31,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 53 minutes, 11 seconds)
2026-01-22 23:25:21,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:25:22,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 487.87802 ± 149.601
2026-01-22 23:25:22,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [444.21664, 601.67346, 289.09955, 797.29376, 360.537, 414.68622, 529.1571, 385.6856, 392.71158, 663.7192]
2026-01-22 23:25:22,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [82.0, 119.0, 54.0, 163.0, 67.0, 76.0, 101.0, 75.0, 76.0, 125.0]
2026-01-22 23:25:22,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 54 minutes, 23 seconds)
2026-01-22 23:27:14,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:27:15,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 526.56909 ± 75.850
2026-01-22 23:27:15,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [518.1296, 502.21124, 480.5293, 543.2595, 486.39346, 530.8435, 565.4527, 470.44012, 440.11707, 728.31396]
2026-01-22 23:27:15,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 103.0, 106.0, 102.0, 90.0, 111.0, 109.0, 103.0, 95.0, 139.0]
2026-01-22 23:27:15,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (526.57) for latency DatasetOffice
2026-01-22 23:27:15,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 53 minutes, 5 seconds)
2026-01-22 23:29:05,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:29:07,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 586.69495 ± 158.280
2026-01-22 23:29:07,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [814.50134, 624.5786, 513.3393, 472.62787, 471.58176, 479.1798, 439.94843, 460.5546, 916.761, 673.87695]
2026-01-22 23:29:07,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [164.0, 132.0, 100.0, 91.0, 90.0, 87.0, 83.0, 85.0, 180.0, 132.0]
2026-01-22 23:29:07,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (586.69) for latency DatasetOffice
2026-01-22 23:29:07,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 51 minutes, 6 seconds)
2026-01-22 23:30:58,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:30:59,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 552.16821 ± 159.454
2026-01-22 23:30:59,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [870.80414, 336.01163, 612.9216, 509.69235, 371.62393, 446.8962, 606.12384, 575.0365, 753.6112, 438.96048]
2026-01-22 23:30:59,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [164.0, 63.0, 117.0, 96.0, 67.0, 86.0, 125.0, 114.0, 146.0, 81.0]
2026-01-22 23:30:59,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 49 minutes, 55 seconds)
2026-01-22 23:32:49,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:32:50,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 584.52124 ± 106.409
2026-01-22 23:32:50,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [761.58307, 504.38492, 480.59204, 443.65024, 504.2351, 604.97504, 586.452, 577.7388, 603.3181, 778.2826]
2026-01-22 23:32:50,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [153.0, 95.0, 94.0, 97.0, 95.0, 136.0, 111.0, 108.0, 131.0, 174.0]
2026-01-22 23:32:50,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 47 minutes, 50 seconds)
2026-01-22 23:34:42,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:34:43,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 567.53363 ± 138.061
2026-01-22 23:34:43,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [637.59406, 407.65573, 435.75937, 526.4376, 629.9921, 536.487, 638.14874, 508.27872, 905.02374, 449.95898]
2026-01-22 23:34:43,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [123.0, 79.0, 94.0, 99.0, 124.0, 102.0, 123.0, 99.0, 178.0, 84.0]
2026-01-22 23:34:43,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 46 minutes, 28 seconds)
2026-01-22 23:36:33,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:36:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 628.10468 ± 138.682
2026-01-22 23:36:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [456.80975, 924.53174, 602.60767, 488.77405, 618.84546, 614.2126, 687.50433, 491.948, 805.05615, 590.75714]
2026-01-22 23:36:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [98.0, 192.0, 124.0, 106.0, 119.0, 120.0, 133.0, 94.0, 155.0, 120.0]
2026-01-22 23:36:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (628.10) for latency DatasetOffice
2026-01-22 23:36:35,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 44 minutes, 16 seconds)
2026-01-22 23:38:26,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:38:27,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 567.07410 ± 76.746
2026-01-22 23:38:27,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [597.9695, 535.84656, 546.18427, 496.16357, 468.26947, 548.07434, 755.71014, 635.10516, 549.9286, 537.4888]
2026-01-22 23:38:27,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [115.0, 98.0, 102.0, 92.0, 86.0, 104.0, 146.0, 123.0, 106.0, 100.0]
2026-01-22 23:38:27,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 42 minutes, 31 seconds)
2026-01-22 23:40:18,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:40:20,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 603.16022 ± 99.709
2026-01-22 23:40:20,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [540.63336, 846.7711, 583.62213, 521.8749, 534.50226, 634.5302, 581.20074, 484.82373, 605.7051, 697.9384]
2026-01-22 23:40:20,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [100.0, 176.0, 108.0, 98.0, 99.0, 124.0, 124.0, 89.0, 113.0, 132.0]
2026-01-22 23:40:20,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 40 minutes, 40 seconds)
2026-01-22 23:42:10,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:42:12,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 609.84875 ± 92.876
2026-01-22 23:42:12,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [654.4202, 659.69904, 741.41473, 660.3556, 694.15356, 436.9846, 587.8642, 624.0862, 456.4377, 583.07166]
2026-01-22 23:42:12,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [125.0, 119.0, 139.0, 132.0, 135.0, 97.0, 112.0, 122.0, 91.0, 118.0]
2026-01-22 23:42:12,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 39 minutes, 6 seconds)
2026-01-22 23:44:03,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:44:05,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 644.61115 ± 174.983
2026-01-22 23:44:05,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [706.8715, 453.2886, 376.60846, 748.9317, 573.2857, 765.3442, 486.86432, 551.2432, 953.0571, 830.6166]
2026-01-22 23:44:05,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [131.0, 97.0, 74.0, 144.0, 110.0, 153.0, 95.0, 106.0, 199.0, 167.0]
2026-01-22 23:44:05,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (644.61) for latency DatasetOffice
2026-01-22 23:44:05,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 37 minutes, 15 seconds)
2026-01-22 23:45:56,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:45:57,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 619.25891 ± 155.794
2026-01-22 23:45:57,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [627.91675, 422.39737, 758.6515, 567.0957, 493.82568, 596.80975, 572.4306, 711.5744, 979.88904, 461.99774]
2026-01-22 23:45:57,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [129.0, 86.0, 140.0, 104.0, 91.0, 113.0, 109.0, 135.0, 188.0, 86.0]
2026-01-22 23:45:57,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 35 minutes, 33 seconds)
2026-01-22 23:47:49,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:47:50,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 577.89734 ± 65.980
2026-01-22 23:47:50,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [603.58325, 694.6406, 513.3657, 473.03082, 585.71985, 577.6828, 626.10474, 508.52295, 653.22723, 543.0963]
2026-01-22 23:47:50,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [129.0, 132.0, 98.0, 101.0, 106.0, 107.0, 118.0, 100.0, 131.0, 104.0]
2026-01-22 23:47:50,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 33 minutes, 56 seconds)
2026-01-22 23:49:41,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:49:42,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 701.11542 ± 187.880
2026-01-22 23:49:42,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [547.71783, 636.9834, 407.4411, 639.5214, 863.23645, 716.07544, 1139.8644, 584.5253, 736.8376, 738.9515]
2026-01-22 23:49:42,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [103.0, 123.0, 87.0, 135.0, 162.0, 133.0, 217.0, 120.0, 135.0, 142.0]
2026-01-22 23:49:42,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (701.12) for latency DatasetOffice
2026-01-22 23:49:42,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 31 minutes, 50 seconds)
2026-01-22 23:51:33,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:51:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 631.23016 ± 83.009
2026-01-22 23:51:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [572.9306, 729.5598, 491.0425, 579.41547, 632.1187, 645.59125, 772.41943, 563.38074, 604.84076, 721.002]
2026-01-22 23:51:34,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [110.0, 141.0, 98.0, 109.0, 119.0, 122.0, 145.0, 117.0, 115.0, 134.0]
2026-01-22 23:51:35,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 30 minutes, 1 second)
2026-01-22 23:53:26,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:53:27,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 640.31488 ± 103.632
2026-01-22 23:53:27,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [553.2932, 644.4273, 685.73145, 909.23755, 658.5607, 660.1367, 545.7768, 594.8887, 627.0683, 524.0279]
2026-01-22 23:53:27,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [103.0, 122.0, 128.0, 165.0, 117.0, 134.0, 104.0, 109.0, 118.0, 99.0]
2026-01-22 23:53:27,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 28 minutes, 9 seconds)
2026-01-22 23:55:17,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:55:19,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 910.41809 ± 233.954
2026-01-22 23:55:19,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [529.88043, 874.25287, 1050.7717, 919.7601, 1297.4714, 1210.8055, 774.91986, 1061.8052, 642.6072, 741.9064]
2026-01-22 23:55:19,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [103.0, 171.0, 201.0, 170.0, 249.0, 230.0, 150.0, 198.0, 134.0, 144.0]
2026-01-22 23:55:19,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (910.42) for latency DatasetOffice
2026-01-22 23:55:19,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 26 minutes, 9 seconds)
2026-01-22 23:57:11,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:57:12,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 775.61597 ± 225.991
2026-01-22 23:57:12,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1001.60767, 419.61975, 1248.1068, 659.62585, 649.6433, 837.753, 629.40137, 850.59174, 579.40216, 880.4078]
2026-01-22 23:57:12,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [198.0, 81.0, 229.0, 121.0, 119.0, 154.0, 129.0, 158.0, 117.0, 175.0]
2026-01-22 23:57:12,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 24 minutes, 16 seconds)
2026-01-22 23:59:02,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-22 23:59:03,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 643.87708 ± 153.032
2026-01-22 23:59:03,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [712.3967, 749.2028, 658.31635, 415.81186, 686.2614, 934.5179, 566.06415, 408.57922, 558.17804, 749.4417]
2026-01-22 23:59:03,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [136.0, 143.0, 130.0, 82.0, 132.0, 182.0, 118.0, 79.0, 111.0, 140.0]
2026-01-22 23:59:03,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 22 minutes, 6 seconds)
2026-01-23 00:00:54,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:00:56,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 921.24188 ± 232.120
2026-01-23 00:00:56,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1224.8683, 769.4514, 946.44867, 983.8885, 1090.6166, 1149.6256, 867.4934, 536.9977, 1111.1167, 531.912]
2026-01-23 00:00:56,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [226.0, 145.0, 182.0, 205.0, 213.0, 214.0, 166.0, 105.0, 210.0, 101.0]
2026-01-23 00:00:56,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (921.24) for latency DatasetOffice
2026-01-23 00:00:56,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 20 minutes, 17 seconds)
2026-01-23 00:02:46,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:02:47,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 646.16278 ± 177.335
2026-01-23 00:02:47,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [455.07562, 524.6642, 686.3942, 658.06647, 665.75366, 480.6314, 563.35645, 801.50964, 542.0375, 1084.139]
2026-01-23 00:02:47,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [89.0, 103.0, 133.0, 130.0, 130.0, 94.0, 114.0, 167.0, 106.0, 214.0]
2026-01-23 00:02:47,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 18 minutes, 4 seconds)
2026-01-23 00:04:39,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:04:41,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 725.42346 ± 277.529
2026-01-23 00:04:41,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [643.9723, 830.4461, 1083.9893, 949.7982, 816.485, 883.6923, 699.4557, 900.2535, 269.7004, 176.44206]
2026-01-23 00:04:41,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [121.0, 150.0, 211.0, 177.0, 156.0, 172.0, 133.0, 170.0, 56.0, 34.0]
2026-01-23 00:04:41,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 16 minutes, 36 seconds)
2026-01-23 00:06:31,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:06:34,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1069.79285 ± 364.929
2026-01-23 00:06:34,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [919.79553, 521.86633, 684.32935, 1082.4609, 1310.6298, 1706.1667, 1188.2563, 1562.2845, 1024.7518, 697.38794]
2026-01-23 00:06:34,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [182.0, 91.0, 133.0, 211.0, 251.0, 329.0, 234.0, 303.0, 194.0, 134.0]
2026-01-23 00:06:34,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1069.79) for latency DatasetOffice
2026-01-23 00:06:34,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 14 minutes, 41 seconds)
2026-01-23 00:08:26,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:08:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1110.30298 ± 294.662
2026-01-23 00:08:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1188.578, 1358.907, 1235.2952, 1078.1832, 1736.7433, 837.64825, 712.81116, 768.8589, 948.57513, 1237.4287]
2026-01-23 00:08:28,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [227.0, 267.0, 242.0, 212.0, 339.0, 161.0, 143.0, 143.0, 191.0, 248.0]
2026-01-23 00:08:28,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1110.30) for latency DatasetOffice
2026-01-23 00:08:28,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 13 minutes, 43 seconds)
2026-01-23 00:10:18,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:10:21,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1193.43921 ± 306.268
2026-01-23 00:10:21,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1501.9902, 1296.4683, 967.1996, 1750.343, 1173.2814, 884.091, 1450.8912, 800.8733, 1295.2798, 813.97424]
2026-01-23 00:10:21,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [294.0, 243.0, 199.0, 344.0, 218.0, 182.0, 274.0, 157.0, 252.0, 150.0]
2026-01-23 00:10:21,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1193.44) for latency DatasetOffice
2026-01-23 00:10:21,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 11 minutes, 59 seconds)
2026-01-23 00:12:11,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:12:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 713.36395 ± 404.810
2026-01-23 00:12:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [812.4448, 780.43085, 880.2236, 456.17282, 292.47855, 563.50085, 449.141, 177.09691, 1620.2582, 1101.892]
2026-01-23 00:12:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [159.0, 150.0, 175.0, 86.0, 57.0, 106.0, 91.0, 34.0, 321.0, 215.0]
2026-01-23 00:12:13,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 10 minutes, 7 seconds)
2026-01-23 00:14:04,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:14:06,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 956.73065 ± 247.910
2026-01-23 00:14:06,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [798.3363, 739.33234, 1144.6388, 1446.4117, 989.4361, 1234.6078, 960.5044, 901.5514, 556.9575, 795.53107]
2026-01-23 00:14:06,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [155.0, 143.0, 238.0, 293.0, 198.0, 244.0, 184.0, 172.0, 104.0, 144.0]
2026-01-23 00:14:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 8 minutes, 9 seconds)
2026-01-23 00:15:58,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:16:03,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1938.02856 ± 1017.567
2026-01-23 00:16:03,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1814.3846, 748.6851, 1169.2007, 615.8634, 4088.4382, 1916.2437, 1934.6692, 3306.0132, 2099.8655, 1686.9225]
2026-01-23 00:16:03,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [353.0, 152.0, 232.0, 132.0, 799.0, 366.0, 392.0, 643.0, 406.0, 326.0]
2026-01-23 00:16:03,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (1938.03) for latency DatasetOffice
2026-01-23 00:16:03,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 7 minutes, 11 seconds)
2026-01-23 00:17:55,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:17:58,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1465.68359 ± 702.709
2026-01-23 00:17:58,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1484.5574, 1565.63, 1065.0883, 711.0246, 2753.4473, 996.7279, 2797.6099, 801.6286, 1201.5149, 1279.6068]
2026-01-23 00:17:58,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [293.0, 324.0, 211.0, 150.0, 550.0, 205.0, 559.0, 156.0, 246.0, 247.0]
2026-01-23 00:17:58,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 5 minutes, 26 seconds)
2026-01-23 00:19:51,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:19:55,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1476.03955 ± 798.122
2026-01-23 00:19:55,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1460.352, 921.4771, 2625.8645, 510.9943, 980.89435, 2671.0955, 841.20496, 2411.908, 1745.88, 590.72504]
2026-01-23 00:19:55,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [296.0, 183.0, 511.0, 106.0, 199.0, 535.0, 172.0, 463.0, 343.0, 129.0]
2026-01-23 00:19:55,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 4 minutes, 13 seconds)
2026-01-23 00:21:45,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:21:49,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1552.05249 ± 493.971
2026-01-23 00:21:49,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1695.5973, 1426.7141, 1066.6866, 815.0025, 1930.4308, 2587.8547, 1783.0706, 1841.26, 1135.596, 1238.3118]
2026-01-23 00:21:49,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [344.0, 275.0, 225.0, 160.0, 370.0, 485.0, 336.0, 356.0, 239.0, 253.0]
2026-01-23 00:21:49,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 48 seconds)
2026-01-23 00:23:42,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:23:49,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2957.08569 ± 1448.328
2026-01-23 00:23:49,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4293.4854, 1481.8253, 2876.422, 5153.0166, 1464.5294, 3701.218, 2492.686, 1569.0596, 5149.0483, 1389.5664]
2026-01-23 00:23:49,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [846.0, 290.0, 572.0, 1000.0, 291.0, 714.0, 497.0, 327.0, 1000.0, 271.0]
2026-01-23 00:23:49,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (2957.09) for latency DatasetOffice
2026-01-23 00:23:50,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 32 seconds)
2026-01-23 00:25:50,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:25:54,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1459.51746 ± 1243.130
2026-01-23 00:25:54,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1651.5677, 587.9669, 560.1346, 496.27713, 1028.2141, 4239.321, 899.6029, 3439.5935, 744.31006, 948.1866]
2026-01-23 00:25:54,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [330.0, 132.0, 103.0, 109.0, 221.0, 835.0, 194.0, 677.0, 163.0, 204.0]
2026-01-23 00:25:54,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 10 seconds)
2026-01-23 00:27:37,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:27:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3297.81177 ± 1211.269
2026-01-23 00:27:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2506.4695, 950.06116, 3594.869, 5194.4, 3789.8833, 3912.8643, 4106.829, 3060.9817, 1656.5225, 4205.2373]
2026-01-23 00:27:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [478.0, 201.0, 689.0, 1000.0, 738.0, 733.0, 785.0, 590.0, 325.0, 807.0]
2026-01-23 00:27:45,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (3297.81) for latency DatasetOffice
2026-01-23 00:27:45,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 16 seconds)
2026-01-23 00:29:38,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:29:44,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2060.39453 ± 1350.550
2026-01-23 00:29:44,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5082.726, 4099.7944, 1240.3542, 1275.4293, 1548.6022, 2123.469, 1975.6367, 1679.4393, 851.3472, 727.1454]
2026-01-23 00:29:44,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 830.0, 258.0, 256.0, 313.0, 422.0, 391.0, 342.0, 180.0, 151.0]
2026-01-23 00:29:44,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 51 seconds)
2026-01-23 00:31:41,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:31:49,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3269.16943 ± 1328.128
2026-01-23 00:31:49,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5148.4077, 4530.7886, 2139.1555, 5159.704, 3801.6204, 3792.5898, 2612.7324, 1262.8752, 2219.2808, 2024.5432]
2026-01-23 00:31:49,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 866.0, 418.0, 1000.0, 741.0, 734.0, 503.0, 240.0, 414.0, 397.0]
2026-01-23 00:31:49,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 8 seconds)
2026-01-23 00:33:39,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:33:43,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1544.12012 ± 1797.023
2026-01-23 00:33:43,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5114.979, 3857.6172, 3212.1155, 170.72209, 140.43105, 2125.2703, 377.36484, 160.473, 125.921005, 156.30788]
2026-01-23 00:33:43,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 757.0, 615.0, 33.0, 27.0, 424.0, 76.0, 31.0, 27.0, 30.0]
2026-01-23 00:33:43,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 43 seconds)
2026-01-23 00:35:34,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:35:44,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3839.80420 ± 1090.096
2026-01-23 00:35:44,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2949.2783, 5172.6987, 4067.0327, 4663.9414, 2378.1792, 1600.6687, 4422.541, 4573.932, 4003.973, 4565.7974]
2026-01-23 00:35:44,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [558.0, 1000.0, 783.0, 897.0, 448.0, 306.0, 849.0, 884.0, 778.0, 880.0]
2026-01-23 00:35:44,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (3839.80) for latency DatasetOffice
2026-01-23 00:35:44,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 52 minutes, 7 seconds)
2026-01-23 00:37:41,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:37:49,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3085.32910 ± 1454.109
2026-01-23 00:37:49,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2884.3545, 2941.0464, 2446.8604, 3702.8022, 4356.429, 5227.637, 1338.8689, 925.1227, 1782.5983, 5247.5728]
2026-01-23 00:37:49,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [548.0, 563.0, 478.0, 686.0, 831.0, 1000.0, 256.0, 184.0, 335.0, 1000.0]
2026-01-23 00:37:49,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 46 seconds)
2026-01-23 00:39:42,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:39:48,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2134.33447 ± 2135.399
2026-01-23 00:39:48,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5242.2725, 2459.8872, 5200.7617, 1849.6864, 72.74102, 210.98743, 793.733, 192.19157, 182.87398, 5138.209]
2026-01-23 00:39:48,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 464.0, 1000.0, 359.0, 15.0, 44.0, 149.0, 37.0, 36.0, 1000.0]
2026-01-23 00:39:48,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 45 seconds)
2026-01-23 00:41:44,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:41:56,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4519.90527 ± 905.264
2026-01-23 00:41:56,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [4653.129, 3154.3372, 5120.3237, 4553.253, 5144.769, 5132.002, 5167.1035, 5141.1978, 2434.476, 4698.463]
2026-01-23 00:41:56,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [906.0, 617.0, 1000.0, 886.0, 1000.0, 1000.0, 1000.0, 1000.0, 465.0, 908.0]
2026-01-23 00:41:56,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (4519.91) for latency DatasetOffice
2026-01-23 00:41:56,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 13 seconds)
2026-01-23 00:43:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:44:04,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4310.08740 ± 1235.306
2026-01-23 00:44:04,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5263.6914, 5262.416, 2584.5835, 5252.569, 5250.9917, 5279.284, 2251.3665, 2773.4749, 5296.8057, 3885.6953]
2026-01-23 00:44:04,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 491.0, 1000.0, 1000.0, 1000.0, 430.0, 529.0, 1000.0, 734.0]
2026-01-23 00:44:04,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 43 seconds)
2026-01-23 00:45:47,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:45:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2993.23535 ± 1947.573
2026-01-23 00:45:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5152.297, 5195.0117, 4622.85, 5269.3535, 3233.2395, 568.97437, 3445.015, 1031.2487, 975.9587, 438.40594]
2026-01-23 00:45:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 880.0, 1000.0, 632.0, 110.0, 654.0, 196.0, 186.0, 90.0]
2026-01-23 00:45:54,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 45 minutes, 44 seconds)
2026-01-23 00:47:55,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:48:08,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4770.52686 ± 835.810
2026-01-23 00:48:08,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5194.3726, 5155.362, 5188.0107, 5192.4976, 5141.865, 5162.509, 3654.3909, 5190.1416, 5162.4224, 2663.6995]
2026-01-23 00:48:08,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 721.0, 1000.0, 1000.0, 521.0]
2026-01-23 00:48:08,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (4770.53) for latency DatasetOffice
2026-01-23 00:48:08,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 45 minutes, 14 seconds)
2026-01-23 00:49:55,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:50:04,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3244.22437 ± 1925.218
2026-01-23 00:50:04,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5218.8457, 1219.9702, 5191.0063, 901.00653, 1709.486, 534.2228, 5249.7754, 5252.258, 4532.5703, 2633.1035]
2026-01-23 00:50:04,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 234.0, 1000.0, 179.0, 334.0, 111.0, 1000.0, 1000.0, 889.0, 510.0]
2026-01-23 00:50:04,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 42 minutes, 36 seconds)
2026-01-23 00:52:01,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:52:09,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2990.15942 ± 2280.142
2026-01-23 00:52:09,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2388.4678, 144.67067, 423.4207, 603.28015, 361.04236, 5182.0425, 5198.412, 5165.0254, 5225.822, 5209.4097]
2026-01-23 00:52:09,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [453.0, 28.0, 82.0, 116.0, 68.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:52:09,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 40 minutes, 8 seconds)
2026-01-23 00:53:59,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:54:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5228.88379 ± 22.813
2026-01-23 00:54:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5275.364, 5203.0605, 5229.602, 5252.627, 5245.031, 5203.5664, 5227.045, 5206.015, 5210.583, 5235.942]
2026-01-23 00:54:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 00:54:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (5228.88) for latency DatasetOffice
2026-01-23 00:54:13,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 37 minutes, 22 seconds)
2026-01-23 00:56:08,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:56:20,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4652.11133 ± 1006.793
2026-01-23 00:56:20,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5266.242, 5270.417, 5287.2544, 3721.0032, 3431.3757, 5243.9795, 5269.4917, 5296.8154, 5322.812, 2411.7214]
2026-01-23 00:56:20,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 715.0, 658.0, 1000.0, 1000.0, 1000.0, 1000.0, 461.0]
2026-01-23 00:56:20,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 37 minutes, 56 seconds)
2026-01-23 00:58:09,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 00:58:17,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3064.61011 ± 2325.708
2026-01-23 00:58:17,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [440.28073, 151.01547, 234.33105, 156.20071, 4263.807, 5265.483, 5236.618, 4452.0312, 5238.8726, 5207.46]
2026-01-23 00:58:17,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [89.0, 29.0, 46.0, 30.0, 846.0, 1000.0, 1000.0, 850.0, 1000.0, 1000.0]
2026-01-23 00:58:17,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 33 minutes, 21 seconds)
2026-01-23 01:00:19,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:00:28,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3329.00586 ± 2063.428
2026-01-23 01:00:28,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [709.39905, 688.373, 1811.1154, 5335.12, 5169.0825, 3013.9517, 646.0667, 5346.685, 5228.6636, 5341.6016]
2026-01-23 01:00:28,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [155.0, 151.0, 359.0, 1000.0, 1000.0, 583.0, 118.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:00:28,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 33 minutes, 33 seconds)
2026-01-23 01:02:13,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:02:24,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4336.60938 ± 1506.008
2026-01-23 01:02:24,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5138.9946, 5159.856, 4474.8003, 2014.4576, 5157.6133, 5127.768, 5131.6646, 790.6865, 5216.311, 5153.936]
2026-01-23 01:02:24,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 880.0, 395.0, 1000.0, 1000.0, 1000.0, 155.0, 1000.0, 1000.0]
2026-01-23 01:02:24,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 30 minutes, 12 seconds)
2026-01-23 01:04:26,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:04:40,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5056.67676 ± 193.233
2026-01-23 01:04:40,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5157.531, 4482.044, 5089.189, 5099.3457, 5130.611, 5157.715, 5083.9155, 5105.8545, 5145.5337, 5115.0293]
2026-01-23 01:04:40,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 871.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:04:40,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 54 seconds)
2026-01-23 01:06:26,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:06:35,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3783.78638 ± 1608.902
2026-01-23 01:06:35,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5307.2114, 2623.9077, 1853.7212, 5225.5967, 1763.1882, 1444.7272, 5268.65, 3732.65, 5287.889, 5330.3228]
2026-01-23 01:06:35,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 497.0, 352.0, 1000.0, 336.0, 272.0, 1000.0, 701.0, 1000.0, 1000.0]
2026-01-23 01:06:35,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 26 minutes, 11 seconds)
2026-01-23 01:08:29,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:08:36,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2790.10181 ± 2232.803
2026-01-23 01:08:36,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5229.12, 5282.1855, 2245.1257, 5306.9873, 389.7037, 5301.119, 3273.7454, 254.12297, 418.2306, 200.67734]
2026-01-23 01:08:36,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 434.0, 1000.0, 78.0, 1000.0, 627.0, 53.0, 85.0, 41.0]
2026-01-23 01:08:36,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 24 minutes, 34 seconds)
2026-01-23 01:10:29,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:10:41,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4485.64990 ± 1267.733
2026-01-23 01:10:41,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5274.2896, 5247.5923, 5279.587, 5193.972, 2274.52, 5326.6406, 1980.3136, 3715.1904, 5257.263, 5307.132]
2026-01-23 01:10:41,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 440.0, 1000.0, 371.0, 703.0, 1000.0, 1000.0]
2026-01-23 01:10:41,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 21 minutes, 45 seconds)
2026-01-23 01:12:36,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:12:50,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5005.18115 ± 925.900
2026-01-23 01:12:50,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5334.4067, 5309.1147, 5333.1133, 5254.4644, 5317.98, 5350.5625, 5312.7705, 2228.759, 5337.2183, 5273.4214]
2026-01-23 01:12:50,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 421.0, 1000.0, 1000.0]
2026-01-23 01:12:50,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 21 minutes, 16 seconds)
2026-01-23 01:14:47,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:14:54,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2585.43115 ± 2211.757
2026-01-23 01:14:54,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5228.969, 5228.0215, 5186.101, 5217.429, 1355.396, 2120.4534, 442.2759, 388.9997, 510.05136, 176.61787]
2026-01-23 01:14:54,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 260.0, 418.0, 87.0, 70.0, 94.0, 34.0]
2026-01-23 01:14:54,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 17 minutes, 47 seconds)
2026-01-23 01:16:47,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:16:59,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4506.25781 ± 1506.093
2026-01-23 01:16:59,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5246.4297, 5238.891, 652.0954, 5153.36, 5269.3013, 2592.6467, 5208.35, 5233.0845, 5228.599, 5239.8228]
2026-01-23 01:16:59,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 139.0, 1000.0, 1000.0, 502.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:16:59,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 16 minutes, 58 seconds)
2026-01-23 01:18:50,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:19:03,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4959.98926 ± 979.253
2026-01-23 01:19:03,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5264.472, 5303.233, 5285.5913, 5276.079, 2023.1001, 5335.7896, 5275.046, 5274.323, 5249.209, 5313.0513]
2026-01-23 01:19:03,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 383.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:19:03,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 15 minutes, 12 seconds)
2026-01-23 01:20:54,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:21:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3092.11475 ± 2019.686
2026-01-23 01:21:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5296.059, 5331.798, 5367.7773, 5320.807, 2942.1504, 509.1381, 3089.5032, 186.38478, 1728.066, 1149.4642]
2026-01-23 01:21:02,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 556.0, 96.0, 586.0, 36.0, 328.0, 214.0]
2026-01-23 01:21:02,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 12 minutes, 27 seconds)
2026-01-23 01:22:50,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:23:03,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4777.11621 ± 1386.561
2026-01-23 01:23:03,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5263.6284, 617.96094, 5223.56, 5209.0854, 5222.663, 5238.608, 5243.711, 5210.6216, 5266.5635, 5274.7603]
2026-01-23 01:23:03,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 130.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:23:03,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 9 minutes, 30 seconds)
2026-01-23 01:25:05,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:25:16,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4330.11914 ± 1628.637
2026-01-23 01:25:16,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5342.7236, 5357.361, 2763.1199, 5327.777, 5310.3022, 5358.146, 2541.4868, 623.0699, 5345.222, 5331.9834]
2026-01-23 01:25:16,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 518.0, 1000.0, 1000.0, 1000.0, 481.0, 116.0, 1000.0, 1000.0]
2026-01-23 01:25:16,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 8 minutes, 26 seconds)
2026-01-23 01:27:10,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:27:21,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4275.72168 ± 1780.543
2026-01-23 01:27:21,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [730.4293, 5377.9336, 3271.4211, 5390.003, 5401.745, 1157.8184, 5383.2905, 5312.1367, 5348.354, 5384.0864]
2026-01-23 01:27:21,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [153.0, 1000.0, 605.0, 1000.0, 1000.0, 236.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:27:21,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 6 minutes, 17 seconds)
2026-01-23 01:29:10,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:29:24,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4849.43457 ± 1355.156
2026-01-23 01:29:24,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5294.1924, 5286.6206, 5317.715, 5243.054, 5326.948, 5281.4556, 784.81555, 5347.586, 5290.186, 5321.772]
2026-01-23 01:29:24,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 171.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:29:24,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 4 minutes, 7 seconds)
2026-01-23 01:31:17,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:31:27,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4151.18945 ± 1529.686
2026-01-23 01:31:27,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1345.5524, 2041.5159, 3670.0576, 5298.483, 5349.4214, 5335.857, 5316.577, 2528.8052, 5289.267, 5336.358]
2026-01-23 01:31:27,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [272.0, 386.0, 695.0, 1000.0, 1000.0, 1000.0, 1000.0, 472.0, 1000.0, 1000.0]
2026-01-23 01:31:27,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 2 minutes, 34 seconds)
2026-01-23 01:33:19,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:33:30,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4620.21582 ± 1394.863
2026-01-23 01:33:30,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5474.612, 5542.859, 5482.3613, 5451.6978, 1462.8563, 5455.1396, 5418.5806, 2695.9854, 3762.2148, 5455.849]
2026-01-23 01:33:30,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 275.0, 1000.0, 1000.0, 506.0, 690.0, 1000.0]
2026-01-23 01:33:30,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 38 seconds)
2026-01-23 01:35:26,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:35:36,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3695.67725 ± 1837.205
2026-01-23 01:35:36,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [940.6694, 4955.149, 5370.705, 5249.694, 1093.1874, 5111.197, 3605.794, 982.93396, 5401.452, 4245.99]
2026-01-23 01:35:36,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [198.0, 961.0, 1000.0, 1000.0, 227.0, 1000.0, 668.0, 214.0, 1000.0, 812.0]
2026-01-23 01:35:36,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 73/100 (estimated time remaining: 57 minutes, 51 seconds)
2026-01-23 01:37:25,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:37:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4683.49805 ± 1395.819
2026-01-23 01:37:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5315.72, 943.6329, 5316.267, 5358.968, 5327.587, 5341.776, 5369.3765, 5340.1333, 5293.4644, 3228.0552]
2026-01-23 01:37:38,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 181.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 606.0]
2026-01-23 01:37:38,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 74/100 (estimated time remaining: 55 minutes, 30 seconds)
2026-01-23 01:39:30,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:39:45,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5255.48340 ± 13.962
2026-01-23 01:39:45,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5241.2163, 5253.0522, 5251.9927, 5285.5903, 5249.151, 5266.319, 5241.0225, 5251.6396, 5242.7236, 5272.13]
2026-01-23 01:39:45,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:39:45,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (5255.48) for latency DatasetOffice
2026-01-23 01:39:45,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 75/100 (estimated time remaining: 53 minutes, 50 seconds)
2026-01-23 01:41:44,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:41:57,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4677.99316 ± 794.094
2026-01-23 01:41:57,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5233.586, 5182.32, 3523.253, 5221.4062, 5215.1147, 5149.2656, 5194.784, 5177.404, 3321.6135, 3561.1877]
2026-01-23 01:41:57,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 675.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 637.0, 680.0]
2026-01-23 01:41:57,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 76/100 (estimated time remaining: 52 minutes, 28 seconds)
2026-01-23 01:43:44,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:43:51,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2586.82690 ± 2073.625
2026-01-23 01:43:51,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5351.727, 5348.053, 5364.9946, 2855.948, 348.482, 3257.973, 181.0376, 1974.8915, 1018.35077, 166.8126]
2026-01-23 01:43:51,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 535.0, 67.0, 613.0, 35.0, 368.0, 197.0, 32.0]
2026-01-23 01:43:51,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 77/100 (estimated time remaining: 49 minutes, 37 seconds)
2026-01-23 01:45:50,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:46:01,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4182.33447 ± 1756.696
2026-01-23 01:46:01,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5366.6753, 5237.63, 1309.836, 5335.446, 5318.399, 5360.59, 5309.628, 2106.2412, 5328.067, 1150.8322]
2026-01-23 01:46:01,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 272.0, 1000.0, 1000.0, 1000.0, 1000.0, 393.0, 1000.0, 228.0]
2026-01-23 01:46:01,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 78/100 (estimated time remaining: 47 minutes, 54 seconds)
2026-01-23 01:47:47,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:48:00,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4489.08398 ± 1248.147
2026-01-23 01:48:00,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5152.242, 1796.2418, 5213.6387, 5246.709, 5226.662, 4397.0034, 5167.0645, 5158.1797, 2302.8916, 5230.2085]
2026-01-23 01:48:00,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 342.0, 1000.0, 1000.0, 1000.0, 851.0, 1000.0, 1000.0, 443.0, 1000.0]
2026-01-23 01:48:00,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 79/100 (estimated time remaining: 45 minutes, 36 seconds)
2026-01-23 01:49:50,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:49:56,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2248.17456 ± 2430.185
2026-01-23 01:49:56,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [314.11218, 211.32115, 389.10257, 378.45517, 151.79819, 145.36528, 5235.075, 5209.331, 5180.092, 5267.094]
2026-01-23 01:49:56,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [62.0, 43.0, 85.0, 76.0, 29.0, 28.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:49:56,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 48 seconds)
2026-01-23 01:51:55,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:52:08,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4940.64307 ± 1282.931
2026-01-23 01:52:08,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [1092.6345, 5391.643, 5359.586, 5346.0576, 5394.027, 5403.1064, 5315.8696, 5353.9688, 5356.351, 5393.1865]
2026-01-23 01:52:08,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [209.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:52:08,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 41 seconds)
2026-01-23 01:54:03,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:54:15,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4484.87891 ± 1582.703
2026-01-23 01:54:15,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5392.428, 5412.833, 5454.131, 5398.764, 3062.6108, 568.069, 3253.8545, 5425.619, 5439.4595, 5441.0215]
2026-01-23 01:54:15,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 570.0, 108.0, 610.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:54:15,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 31 seconds)
2026-01-23 01:56:08,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:56:14,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 2305.12622 ± 2199.775
2026-01-23 01:56:14,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [3347.3137, 185.7396, 1833.3092, 345.81226, 713.099, 156.5317, 396.2614, 5367.993, 5336.64, 5368.5615]
2026-01-23 01:56:14,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [633.0, 36.0, 346.0, 69.0, 136.0, 30.0, 76.0, 1000.0, 1000.0, 1000.0]
2026-01-23 01:56:14,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 48 seconds)
2026-01-23 01:58:04,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 01:58:17,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4851.57324 ± 1062.731
2026-01-23 01:58:17,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5378.7544, 5354.71, 5368.9297, 5326.004, 2428.09, 5401.8564, 5434.958, 3063.863, 5400.119, 5358.4526]
2026-01-23 01:58:17,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 467.0, 1000.0, 1000.0, 581.0, 1000.0, 1000.0]
2026-01-23 01:58:17,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 58 seconds)
2026-01-23 02:00:08,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:00:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3921.52808 ± 1251.890
2026-01-23 02:00:19,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2791.8193, 4509.594, 5086.287, 1920.5441, 2908.633, 2317.3398, 5288.384, 5178.729, 5199.306, 4014.645]
2026-01-23 02:00:19,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [535.0, 869.0, 1000.0, 406.0, 547.0, 441.0, 1000.0, 1000.0, 1000.0, 813.0]
2026-01-23 02:00:19,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 13 seconds)
2026-01-23 02:02:13,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:02:28,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5293.64600 ± 24.051
2026-01-23 02:02:28,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5283.8813, 5292.2637, 5345.465, 5291.4077, 5304.534, 5304.6743, 5265.6367, 5307.082, 5251.4663, 5290.051]
2026-01-23 02:02:28,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:02:28,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (5293.65) for latency DatasetOffice
2026-01-23 02:02:28,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 59 seconds)
2026-01-23 02:04:17,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:04:31,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5479.83301 ± 25.172
2026-01-23 02:04:31,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5510.5435, 5461.786, 5499.5566, 5436.071, 5515.027, 5460.863, 5472.9116, 5468.877, 5464.993, 5507.7017]
2026-01-23 02:04:31,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:04:31,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1274 [INFO]: New best (5479.83) for latency DatasetOffice
2026-01-23 02:04:31,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 45 seconds)
2026-01-23 02:06:25,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:06:34,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3212.60229 ± 2246.521
2026-01-23 02:06:34,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5269.035, 863.4047, 5278.914, 5387.123, 3221.1904, 446.8884, 722.3796, 248.76535, 5365.0005, 5323.324]
2026-01-23 02:06:34,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 158.0, 1000.0, 1000.0, 607.0, 82.0, 142.0, 51.0, 1000.0, 1000.0]
2026-01-23 02:06:34,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 49 seconds)
2026-01-23 02:08:33,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:08:46,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4688.24316 ± 1279.429
2026-01-23 02:08:46,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5377.766, 5289.814, 2209.0159, 5307.311, 2052.4634, 5326.3677, 5348.2466, 5322.648, 5313.658, 5335.146]
2026-01-23 02:08:46,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 424.0, 1000.0, 392.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:08:46,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 8 seconds)
2026-01-23 02:10:28,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:10:40,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4440.71045 ± 1383.115
2026-01-23 02:10:40,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5305.567, 5296.5044, 5275.6787, 5301.2104, 5299.184, 2213.1497, 5245.608, 5281.5312, 3676.0107, 1512.6611]
2026-01-23 02:10:40,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 421.0, 1000.0, 1000.0, 696.0, 299.0]
2026-01-23 02:10:40,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 45 seconds)
2026-01-23 02:12:38,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:12:43,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 1974.89282 ± 2238.112
2026-01-23 02:12:43,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5277.9727, 5288.7866, 5231.1587, 2319.2979, 145.73055, 331.37494, 444.39694, 362.40784, 155.36566, 192.43459]
2026-01-23 02:12:43,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 438.0, 30.0, 62.0, 82.0, 76.0, 30.0, 37.0]
2026-01-23 02:12:43,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 31 seconds)
2026-01-23 02:14:31,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:14:42,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4250.56445 ± 1721.230
2026-01-23 02:14:42,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5397.055, 5364.0303, 2214.5525, 5336.458, 1642.7435, 5388.995, 1091.5104, 5363.7803, 5360.014, 5346.5034]
2026-01-23 02:14:42,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 418.0, 1000.0, 310.0, 1000.0, 216.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:14:42,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 19 seconds)
2026-01-23 02:16:33,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:16:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4802.75977 ± 1253.743
2026-01-23 02:16:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5180.4365, 5241.333, 5161.6675, 5230.507, 5238.829, 1042.4093, 5242.1675, 5211.799, 5230.4937, 5247.957]
2026-01-23 02:16:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 209.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:16:46,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 20 seconds)
2026-01-23 02:18:33,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:18:43,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3378.16846 ± 1832.176
2026-01-23 02:18:43,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [383.97675, 119.8756, 5027.3926, 5013.7817, 5024.768, 4213.5273, 5029.5454, 3893.3892, 3161.2124, 1914.2136]
2026-01-23 02:18:43,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [75.0, 26.0, 1000.0, 1000.0, 1000.0, 842.0, 1000.0, 769.0, 626.0, 360.0]
2026-01-23 02:18:43,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 56 seconds)
2026-01-23 02:20:43,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:20:56,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4952.12842 ± 857.053
2026-01-23 02:20:56,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5375.714, 5350.449, 5370.87, 5337.8564, 5387.9097, 5381.239, 3782.1287, 5341.053, 5386.725, 2807.3406]
2026-01-23 02:20:56,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 716.0, 1000.0, 1000.0, 529.0]
2026-01-23 02:20:56,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 19 seconds)
2026-01-23 02:22:42,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:22:52,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3869.14209 ± 1884.633
2026-01-23 02:22:52,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5361.759, 5382.3203, 1587.7612, 2533.3433, 5357.837, 5370.823, 5368.831, 1725.3265, 636.2722, 5367.147]
2026-01-23 02:22:52,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 302.0, 484.0, 1000.0, 1000.0, 1000.0, 329.0, 112.0, 1000.0]
2026-01-23 02:22:52,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 8 seconds)
2026-01-23 02:24:40,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:24:48,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 3128.17334 ± 2365.837
2026-01-23 02:24:48,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [328.74728, 279.43262, 729.54004, 186.14667, 5384.99, 5437.8384, 5282.317, 5404.4395, 5423.6343, 2824.646]
2026-01-23 02:24:48,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [62.0, 55.0, 140.0, 36.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 535.0]
2026-01-23 02:24:48,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 4 seconds)
2026-01-23 02:26:41,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:26:55,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5293.23779 ± 11.539
2026-01-23 02:26:55,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5283.332, 5300.3413, 5286.6553, 5308.4756, 5305.4404, 5297.5283, 5283.195, 5272.0625, 5288.97, 5306.378]
2026-01-23 02:26:55,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:26:55,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 5 seconds)
2026-01-23 02:28:47,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:28:59,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4201.64990 ± 1551.993
2026-01-23 02:28:59,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [5231.117, 5264.5156, 3848.8398, 3454.0776, 5242.688, 5254.3, 5230.5996, 5217.1006, 3010.2507, 263.01306]
2026-01-23 02:28:59,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 728.0, 669.0, 1000.0, 1000.0, 1000.0, 1000.0, 572.0, 52.0]
2026-01-23 02:28:59,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 6 seconds)
2026-01-23 02:30:46,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:30:57,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 4544.87354 ± 1831.003
2026-01-23 02:30:57,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [561.9659, 5440.7686, 5431.682, 5475.1836, 5436.647, 5457.892, 5465.5522, 1228.5364, 5445.0835, 5505.426]
2026-01-23 02:30:57,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [105.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 243.0, 1000.0, 1000.0]
2026-01-23 02:30:57,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1247 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes)
2026-01-23 02:32:59,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1262 [DEBUG]: Evaluating for latency DatasetOffice...
2026-01-23 02:33:12,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1269 [DEBUG]: Total Reward: 5034.62695 ± 766.082
2026-01-23 02:33:12,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1270 [DEBUG]: All rewards: [2737.0525, 5269.4966, 5301.7773, 5311.9893, 5256.934, 5281.8096, 5304.645, 5277.8267, 5285.69, 5319.048]
2026-01-23 02:33:12,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1271 [DEBUG]: All trajectory lengths: [526.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2026-01-23 02:33:12,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1299 [DEBUG]: Training session finished
