2025-09-16 12:17:45,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.000-delay_15
2025-09-16 12:17:45,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.000-delay_15
2025-09-16 12:17:45,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x1552bd924a10>}
2025-09-16 12:17:45,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:17:45,402 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:17:45,421 baseline-bpql-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=631, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:17:45,421 baseline-bpql-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:17:47,154 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:17:47,155 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:19:34,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:19:35,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 329.42343 ± 29.176
2025-09-16 12:19:35,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [333.96317, 338.38312, 291.72794, 281.62137, 352.28622, 346.9832, 348.31552, 292.00653, 375.24054, 333.70673]
2025-09-16 12:19:35,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 64.0, 55.0, 53.0, 67.0, 66.0, 66.0, 55.0, 71.0, 63.0]
2025-09-16 12:19:35,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (329.42) for latency 15
2025-09-16 12:19:35,742 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 59 minutes, 10 seconds)
2025-09-16 12:21:32,356 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:21:33,613 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 473.41510 ± 84.178
2025-09-16 12:21:33,613 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [466.17926, 474.0096, 600.67755, 481.7272, 362.59204, 611.11786, 548.09296, 396.74792, 394.0533, 398.95334]
2025-09-16 12:21:33,613 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 90.0, 125.0, 91.0, 69.0, 119.0, 106.0, 77.0, 76.0, 76.0]
2025-09-16 12:21:33,613 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (473.42) for latency 15
2025-09-16 12:21:33,620 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 4 minutes, 56 seconds)
2025-09-16 12:23:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:23:29,412 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 376.55255 ± 31.355
2025-09-16 12:23:29,412 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [434.74524, 359.82047, 387.21027, 369.24277, 368.87137, 339.08978, 425.61658, 373.4288, 330.528, 376.9719]
2025-09-16 12:23:29,413 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 70.0, 79.0, 73.0, 72.0, 69.0, 83.0, 75.0, 64.0, 74.0]
2025-09-16 12:23:29,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 4 minutes, 26 seconds)
2025-09-16 12:25:25,550 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:25:26,575 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 370.34930 ± 22.646
2025-09-16 12:25:26,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [401.15076, 385.32526, 357.32538, 364.60535, 361.9989, 375.0697, 341.7313, 398.25815, 328.96487, 389.06357]
2025-09-16 12:25:26,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 82.0, 74.0, 74.0, 74.0, 75.0, 67.0, 79.0, 64.0, 78.0]
2025-09-16 12:25:26,579 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 46 seconds)
2025-09-16 12:27:23,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:27:23,996 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 344.56580 ± 9.098
2025-09-16 12:27:23,996 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [358.80518, 343.7565, 334.56528, 339.23303, 351.0983, 346.8192, 342.0152, 326.39667, 348.6702, 354.29837]
2025-09-16 12:27:23,996 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 72.0, 68.0, 72.0, 73.0, 73.0, 70.0, 67.0, 72.0, 74.0]
2025-09-16 12:27:23,999 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 40 seconds)
2025-09-16 12:29:20,258 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:29:21,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 524.16144 ± 76.467
2025-09-16 12:29:21,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [697.5428, 465.15198, 559.32007, 473.00073, 490.12885, 499.02002, 547.4734, 396.6124, 543.3136, 570.05066]
2025-09-16 12:29:21,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 90.0, 104.0, 95.0, 93.0, 98.0, 104.0, 74.0, 101.0, 108.0]
2025-09-16 12:29:21,641 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (524.16) for latency 15
2025-09-16 12:29:21,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 35 seconds)
2025-09-16 12:31:18,308 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:31:19,691 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 524.05481 ± 115.397
2025-09-16 12:31:19,691 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [379.3998, 497.47516, 617.6582, 607.3258, 434.0631, 331.67517, 737.04364, 499.29172, 588.73804, 547.87726]
2025-09-16 12:31:19,691 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 93.0, 118.0, 114.0, 81.0, 68.0, 143.0, 96.0, 111.0, 103.0]
2025-09-16 12:31:19,693 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 40 seconds)
2025-09-16 12:33:15,473 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:33:17,105 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 591.19507 ± 228.883
2025-09-16 12:33:17,105 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [552.56165, 527.84106, 394.5423, 576.5456, 454.82626, 465.8455, 1226.0347, 648.6717, 406.12024, 658.9621]
2025-09-16 12:33:17,105 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 99.0, 76.0, 116.0, 95.0, 98.0, 242.0, 126.0, 77.0, 134.0]
2025-09-16 12:33:17,105 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (591.20) for latency 15
2025-09-16 12:33:17,124 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 13 seconds)
2025-09-16 12:35:14,975 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:35:16,366 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 503.69415 ± 110.258
2025-09-16 12:35:16,366 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [494.02057, 384.20215, 447.4637, 717.1507, 497.3267, 454.83328, 455.22522, 394.84256, 484.5237, 707.3523]
2025-09-16 12:35:16,366 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 80.0, 92.0, 137.0, 104.0, 87.0, 97.0, 84.0, 101.0, 137.0]
2025-09-16 12:35:16,382 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 54 seconds)
2025-09-16 12:37:12,146 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:37:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 511.44189 ± 77.274
2025-09-16 12:37:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [592.3872, 500.6737, 352.86066, 519.3656, 485.7733, 459.30402, 520.85706, 465.04938, 569.377, 648.77106]
2025-09-16 12:37:13,556 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 97.0, 67.0, 97.0, 95.0, 87.0, 105.0, 99.0, 110.0, 141.0]
2025-09-16 12:37:13,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 52 seconds)
2025-09-16 12:39:11,088 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:39:12,517 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 536.65179 ± 137.053
2025-09-16 12:39:12,517 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [408.73587, 680.0949, 401.84268, 401.55487, 507.28482, 569.7327, 857.9992, 570.7094, 448.18973, 520.3735]
2025-09-16 12:39:12,517 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 132.0, 76.0, 76.0, 112.0, 109.0, 165.0, 108.0, 84.0, 99.0]
2025-09-16 12:39:12,520 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 17 seconds)
2025-09-16 12:41:08,628 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:41:10,132 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 566.23621 ± 130.148
2025-09-16 12:41:10,132 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [547.4262, 674.0186, 435.5028, 697.67303, 858.53284, 496.80716, 450.3855, 484.82428, 453.3856, 563.80554]
2025-09-16 12:41:10,132 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 125.0, 82.0, 134.0, 162.0, 94.0, 84.0, 91.0, 84.0, 121.0]
2025-09-16 12:41:10,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 53 minutes, 11 seconds)
2025-09-16 12:43:09,238 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:43:10,808 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 555.93372 ± 143.666
2025-09-16 12:43:10,808 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [699.60266, 630.9691, 553.0368, 384.668, 403.12222, 862.50793, 504.44513, 406.9686, 622.60205, 491.41507]
2025-09-16 12:43:10,808 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 122.0, 103.0, 78.0, 77.0, 180.0, 108.0, 87.0, 133.0, 104.0]
2025-09-16 12:43:10,819 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 52 minutes, 10 seconds)
2025-09-16 12:45:08,848 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:45:10,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 620.88037 ± 123.493
2025-09-16 12:45:10,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [526.832, 631.739, 432.363, 777.3865, 678.05176, 417.30963, 795.8682, 688.4561, 671.8188, 588.97906]
2025-09-16 12:45:10,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 131.0, 81.0, 144.0, 126.0, 78.0, 149.0, 131.0, 127.0, 115.0]
2025-09-16 12:45:10,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (620.88) for latency 15
2025-09-16 12:45:10,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 50 minutes, 18 seconds)
2025-09-16 12:47:07,585 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:47:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 731.33221 ± 144.651
2025-09-16 12:47:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [951.5778, 637.11914, 644.72534, 1029.3503, 637.38513, 573.4355, 628.2823, 822.26117, 704.4693, 684.71606]
2025-09-16 12:47:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 122.0, 124.0, 217.0, 136.0, 123.0, 135.0, 157.0, 145.0, 143.0]
2025-09-16 12:47:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (731.33) for latency 15
2025-09-16 12:47:09,684 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 48 minutes, 53 seconds)
2025-09-16 12:49:07,914 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:49:09,572 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 588.33673 ± 106.540
2025-09-16 12:49:09,573 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [447.26645, 518.86316, 442.0769, 593.3158, 698.99036, 558.41516, 563.8071, 813.45575, 608.4371, 638.73944]
2025-09-16 12:49:09,573 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 96.0, 95.0, 125.0, 137.0, 116.0, 118.0, 170.0, 114.0, 123.0]
2025-09-16 12:49:09,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 10 seconds)
2025-09-16 12:51:08,584 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:51:10,434 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 656.90985 ± 143.283
2025-09-16 12:51:10,435 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [599.8095, 552.6349, 607.6379, 621.712, 454.11948, 909.58685, 498.1933, 673.85767, 822.3639, 829.1832]
2025-09-16 12:51:10,435 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 106.0, 129.0, 132.0, 99.0, 178.0, 106.0, 145.0, 157.0, 160.0]
2025-09-16 12:51:10,437 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 46 minutes, 4 seconds)
2025-09-16 12:53:08,164 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:53:09,695 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 564.16895 ± 101.945
2025-09-16 12:53:09,695 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [527.9884, 459.41663, 540.3914, 493.44244, 579.7424, 738.3528, 770.9626, 547.82587, 460.2735, 523.2936]
2025-09-16 12:53:09,695 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 86.0, 117.0, 90.0, 109.0, 155.0, 146.0, 104.0, 86.0, 95.0]
2025-09-16 12:53:09,706 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 41 seconds)
2025-09-16 12:55:08,856 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:55:10,595 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 606.80017 ± 171.489
2025-09-16 12:55:10,595 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [641.203, 714.37665, 1037.4607, 470.6561, 422.21878, 488.78217, 644.5253, 484.1882, 658.6762, 505.9145]
2025-09-16 12:55:10,595 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 140.0, 208.0, 88.0, 92.0, 93.0, 118.0, 106.0, 124.0, 109.0]
2025-09-16 12:55:10,599 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 42 minutes, 1 second)
2025-09-16 12:57:09,460 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:57:11,328 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 636.54504 ± 196.351
2025-09-16 12:57:11,329 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [616.1739, 761.7751, 484.27744, 515.0749, 1031.6322, 716.12976, 452.7503, 516.77386, 389.52206, 881.3406]
2025-09-16 12:57:11,329 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 155.0, 100.0, 109.0, 202.0, 148.0, 96.0, 110.0, 81.0, 180.0]
2025-09-16 12:57:11,351 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 40 minutes, 26 seconds)
2025-09-16 12:59:08,552 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:59:10,506 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 679.12921 ± 193.718
2025-09-16 12:59:10,506 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [604.4482, 551.31885, 565.95605, 544.4982, 502.52908, 478.29257, 972.45996, 745.3852, 753.69226, 1072.7109]
2025-09-16 12:59:10,506 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 119.0, 104.0, 113.0, 108.0, 103.0, 183.0, 159.0, 141.0, 215.0]
2025-09-16 12:59:10,529 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 15 seconds)
2025-09-16 13:01:09,158 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:01:10,986 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 667.55505 ± 106.409
2025-09-16 13:01:10,986 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [656.08453, 683.2673, 819.09845, 667.5706, 486.30225, 573.13104, 546.6249, 719.442, 681.5296, 842.49927]
2025-09-16 13:01:10,986 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 133.0, 152.0, 143.0, 91.0, 109.0, 101.0, 140.0, 146.0, 165.0]
2025-09-16 13:01:11,012 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 8 seconds)
2025-09-16 13:03:09,610 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:03:11,604 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 688.63928 ± 151.195
2025-09-16 13:03:11,604 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [762.9313, 453.73193, 492.12286, 908.14526, 853.4699, 572.5583, 624.8604, 621.6682, 736.06726, 860.83716]
2025-09-16 13:03:11,604 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 99.0, 107.0, 173.0, 167.0, 123.0, 135.0, 131.0, 138.0, 184.0]
2025-09-16 13:03:11,629 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 29 seconds)
2025-09-16 13:05:10,853 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:05:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 716.83484 ± 168.870
2025-09-16 13:05:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [781.79596, 421.74832, 776.10114, 649.19464, 738.61206, 524.3681, 1061.8252, 652.7521, 872.9481, 689.0026]
2025-09-16 13:05:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 90.0, 143.0, 133.0, 148.0, 112.0, 218.0, 122.0, 172.0, 135.0]
2025-09-16 13:05:12,876 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 34 seconds)
2025-09-16 13:07:12,052 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:07:14,228 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 802.19293 ± 197.278
2025-09-16 13:07:14,228 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1042.7767, 797.48505, 1234.9377, 605.06616, 658.7841, 636.4107, 898.6918, 711.55585, 605.3277, 830.89355]
2025-09-16 13:07:14,228 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 147.0, 239.0, 111.0, 139.0, 122.0, 176.0, 129.0, 112.0, 167.0]
2025-09-16 13:07:14,228 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (802.19) for latency 15
2025-09-16 13:07:14,243 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 43 seconds)
2025-09-16 13:09:12,266 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:09:14,284 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 729.10400 ± 132.362
2025-09-16 13:09:14,284 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1055.0903, 825.7067, 654.04266, 697.26825, 738.83704, 534.0315, 775.4484, 677.33655, 699.5795, 633.69934]
2025-09-16 13:09:14,284 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 175.0, 120.0, 131.0, 153.0, 118.0, 141.0, 130.0, 146.0, 115.0]
2025-09-16 13:09:14,288 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 55 seconds)
2025-09-16 13:11:13,991 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:11:16,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 765.79065 ± 131.423
2025-09-16 13:11:16,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [784.3834, 718.1755, 868.5639, 1000.50006, 546.28644, 753.64667, 772.637, 919.0668, 599.80475, 694.842]
2025-09-16 13:11:16,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 130.0, 177.0, 196.0, 100.0, 137.0, 146.0, 167.0, 130.0, 131.0]
2025-09-16 13:11:16,067 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 27 minutes, 13 seconds)
2025-09-16 13:13:13,329 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:13:15,323 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 709.81848 ± 191.805
2025-09-16 13:13:15,323 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [580.9195, 602.798, 652.30676, 1063.3257, 584.54517, 578.7295, 560.9606, 1095.2975, 630.0575, 749.2444]
2025-09-16 13:13:15,323 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 112.0, 122.0, 226.0, 121.0, 112.0, 102.0, 209.0, 118.0, 139.0]
2025-09-16 13:13:15,333 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 53 seconds)
2025-09-16 13:15:14,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:15:16,614 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 736.32715 ± 197.918
2025-09-16 13:15:16,614 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [576.4584, 589.13074, 812.0027, 714.02045, 662.3285, 632.4851, 714.2118, 927.97174, 1221.2147, 513.44745]
2025-09-16 13:15:16,614 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 127.0, 156.0, 137.0, 126.0, 120.0, 146.0, 194.0, 247.0, 99.0]
2025-09-16 13:15:16,621 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 53 seconds)
2025-09-16 13:17:15,013 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:17:17,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 906.75842 ± 195.922
2025-09-16 13:17:17,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [889.7238, 945.01013, 1276.8556, 865.01227, 615.50006, 621.902, 900.3598, 1183.0925, 879.2585, 890.8701]
2025-09-16 13:17:17,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 181.0, 267.0, 175.0, 131.0, 118.0, 191.0, 240.0, 168.0, 172.0]
2025-09-16 13:17:17,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (906.76) for latency 15
2025-09-16 13:17:17,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 47 seconds)
2025-09-16 13:19:16,606 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:19:19,220 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 908.42737 ± 210.749
2025-09-16 13:19:19,220 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1043.4121, 1071.4519, 670.8712, 1039.0798, 646.8072, 711.70087, 867.84155, 1068.0555, 682.0303, 1283.023]
2025-09-16 13:19:19,220 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 217.0, 131.0, 201.0, 128.0, 148.0, 178.0, 211.0, 147.0, 263.0]
2025-09-16 13:19:19,220 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (908.43) for latency 15
2025-09-16 13:19:19,259 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 19 minutes, 8 seconds)
2025-09-16 13:21:18,760 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:21:20,742 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 719.86389 ± 123.212
2025-09-16 13:21:20,742 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [795.8116, 701.77637, 656.13696, 670.1269, 804.4595, 521.0709, 627.2719, 999.68964, 658.8078, 763.4871]
2025-09-16 13:21:20,742 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 142.0, 122.0, 124.0, 158.0, 102.0, 120.0, 208.0, 126.0, 156.0]
2025-09-16 13:21:20,746 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 17 minutes, 3 seconds)
2025-09-16 13:23:18,305 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:23:21,256 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1007.46692 ± 340.134
2025-09-16 13:23:21,256 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [489.82858, 1178.1729, 1474.0809, 1167.12, 863.84644, 898.72675, 1408.7482, 476.65594, 797.87646, 1319.6127]
2025-09-16 13:23:21,256 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 226.0, 298.0, 240.0, 168.0, 171.0, 287.0, 105.0, 154.0, 269.0]
2025-09-16 13:23:21,256 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1007.47) for latency 15
2025-09-16 13:23:21,260 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 19 seconds)
2025-09-16 13:25:19,799 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:25:21,802 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 708.60785 ± 204.344
2025-09-16 13:25:21,802 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [638.7726, 528.2863, 636.14124, 486.7015, 643.5799, 692.97205, 1189.2443, 667.0093, 613.0885, 990.2832]
2025-09-16 13:25:21,802 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 112.0, 131.0, 101.0, 120.0, 131.0, 236.0, 144.0, 116.0, 192.0]
2025-09-16 13:25:21,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 8 seconds)
2025-09-16 13:27:21,699 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:27:24,660 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1054.39526 ± 268.943
2025-09-16 13:27:24,660 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [978.90955, 1487.565, 966.1612, 1636.771, 850.5352, 967.0169, 934.83417, 1086.7358, 874.5065, 760.917]
2025-09-16 13:27:24,660 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 318.0, 179.0, 316.0, 163.0, 179.0, 182.0, 216.0, 164.0, 157.0]
2025-09-16 13:27:24,660 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1054.40) for latency 15
2025-09-16 13:27:24,667 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 31 seconds)
2025-09-16 13:29:24,044 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:29:26,558 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 922.04266 ± 192.581
2025-09-16 13:29:26,558 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [713.88715, 830.0384, 901.2229, 982.71277, 1069.8947, 775.0892, 1040.1678, 657.0144, 1354.0785, 896.32117]
2025-09-16 13:29:26,558 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 153.0, 167.0, 188.0, 210.0, 147.0, 202.0, 121.0, 271.0, 166.0]
2025-09-16 13:29:26,563 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 33 seconds)
2025-09-16 13:31:26,062 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:31:28,609 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 913.12708 ± 136.913
2025-09-16 13:31:28,609 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [793.3854, 902.1674, 975.9121, 1171.2332, 858.05743, 702.5661, 934.02997, 1119.3143, 819.77325, 854.8315]
2025-09-16 13:31:28,609 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 178.0, 201.0, 245.0, 161.0, 135.0, 195.0, 213.0, 159.0, 166.0]
2025-09-16 13:31:28,629 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 39 seconds)
2025-09-16 13:33:27,429 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:33:30,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1005.03107 ± 270.777
2025-09-16 13:33:30,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1355.681, 1076.8389, 828.9455, 927.2904, 717.0134, 763.93506, 1314.3966, 801.3035, 1491.1144, 773.7918]
2025-09-16 13:33:30,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [270.0, 231.0, 177.0, 180.0, 149.0, 164.0, 254.0, 164.0, 298.0, 143.0]
2025-09-16 13:33:30,342 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 52 seconds)
2025-09-16 13:35:29,206 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:35:32,080 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1034.17529 ± 331.874
2025-09-16 13:35:32,081 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [854.45874, 1415.7837, 912.37256, 805.5328, 888.3417, 1099.0852, 535.00775, 1064.3953, 972.95874, 1793.8167]
2025-09-16 13:35:32,081 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 281.0, 169.0, 171.0, 170.0, 201.0, 115.0, 199.0, 207.0, 352.0]
2025-09-16 13:35:32,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 4 minutes, 5 seconds)
2025-09-16 13:37:31,147 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:37:34,231 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1078.81226 ± 337.915
2025-09-16 13:37:34,231 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [792.55615, 1113.3645, 1011.4808, 835.33295, 1130.6564, 1920.7919, 947.821, 1409.6576, 724.89014, 901.57135]
2025-09-16 13:37:34,231 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 223.0, 203.0, 169.0, 242.0, 380.0, 187.0, 273.0, 154.0, 172.0]
2025-09-16 13:37:34,231 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1078.81) for latency 15
2025-09-16 13:37:34,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 55 seconds)
2025-09-16 13:39:32,141 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:39:35,462 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1148.42273 ± 361.809
2025-09-16 13:39:35,462 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1125.3424, 1283.9803, 1172.2604, 1230.829, 762.2205, 1209.4213, 878.103, 928.9319, 2091.1182, 802.0202]
2025-09-16 13:39:35,462 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 246.0, 232.0, 246.0, 156.0, 254.0, 163.0, 176.0, 432.0, 176.0]
2025-09-16 13:39:35,462 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1148.42) for latency 15
2025-09-16 13:39:35,474 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 45 seconds)
2025-09-16 13:41:35,876 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:41:39,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1129.43872 ± 364.396
2025-09-16 13:41:39,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1106.6381, 693.8787, 1039.5529, 1242.5973, 626.0002, 1485.0682, 1774.76, 1155.4703, 687.85767, 1482.5637]
2025-09-16 13:41:39,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [238.0, 149.0, 207.0, 242.0, 120.0, 316.0, 346.0, 223.0, 139.0, 283.0]
2025-09-16 13:41:39,145 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes, 1 second)
2025-09-16 13:43:39,018 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:43:41,699 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 956.37909 ± 344.681
2025-09-16 13:43:41,699 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [984.16486, 879.2269, 872.3846, 1131.59, 782.2158, 506.58133, 840.93, 816.5545, 861.53735, 1888.6055]
2025-09-16 13:43:41,699 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 185.0, 160.0, 238.0, 161.0, 100.0, 153.0, 180.0, 183.0, 356.0]
2025-09-16 13:43:41,729 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 56 minutes, 9 seconds)
2025-09-16 13:45:40,354 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:45:43,960 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1295.57837 ± 423.717
2025-09-16 13:45:43,960 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2183.6814, 1517.894, 1615.1946, 902.37976, 981.45886, 1204.1931, 1492.5188, 1467.6526, 906.50397, 684.30707]
2025-09-16 13:45:43,960 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [423.0, 286.0, 311.0, 171.0, 192.0, 245.0, 289.0, 287.0, 186.0, 134.0]
2025-09-16 13:45:43,960 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1295.58) for latency 15
2025-09-16 13:45:43,972 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 54 minutes, 13 seconds)
2025-09-16 13:47:43,458 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:47:46,284 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 984.11377 ± 336.374
2025-09-16 13:47:46,284 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [463.50748, 948.85565, 881.7848, 1655.8759, 577.52277, 1343.0532, 979.37195, 786.6848, 967.32837, 1237.1527]
2025-09-16 13:47:46,284 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 203.0, 171.0, 335.0, 114.0, 268.0, 194.0, 166.0, 203.0, 257.0]
2025-09-16 13:47:46,294 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 52 minutes, 12 seconds)
2025-09-16 13:49:45,290 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:49:48,058 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1014.16931 ± 253.030
2025-09-16 13:49:48,058 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1038.5282, 1499.7319, 669.40137, 806.6235, 753.8665, 1205.2019, 1093.4125, 970.41376, 800.672, 1303.8413]
2025-09-16 13:49:48,058 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 286.0, 146.0, 150.0, 164.0, 228.0, 228.0, 175.0, 143.0, 249.0]
2025-09-16 13:49:48,066 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 50 minutes, 15 seconds)
2025-09-16 13:51:47,572 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:51:51,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1378.84692 ± 350.772
2025-09-16 13:51:51,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1559.541, 1120.3801, 996.0542, 1321.8226, 1282.7131, 1261.5009, 1743.1837, 1393.2256, 937.3134, 2172.7349]
2025-09-16 13:51:51,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [328.0, 242.0, 186.0, 273.0, 243.0, 257.0, 369.0, 261.0, 177.0, 409.0]
2025-09-16 13:51:51,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1378.85) for latency 15
2025-09-16 13:51:51,536 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 48 minutes, 11 seconds)
2025-09-16 13:53:50,714 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:53:53,982 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1164.74927 ± 453.616
2025-09-16 13:53:53,982 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [940.0056, 1588.7693, 622.8336, 2141.8623, 979.9892, 901.0569, 842.18585, 1692.1663, 1060.6195, 878.00476]
2025-09-16 13:53:53,982 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 289.0, 133.0, 431.0, 209.0, 170.0, 172.0, 312.0, 204.0, 166.0]
2025-09-16 13:53:53,988 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 46 minutes, 7 seconds)
2025-09-16 13:55:53,204 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:55:56,706 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1251.63208 ± 342.427
2025-09-16 13:55:56,706 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1438.4915, 971.8726, 1642.4298, 933.2647, 1696.1246, 633.7886, 1022.76855, 1146.5105, 1623.41, 1407.6602]
2025-09-16 13:55:56,706 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [270.0, 181.0, 323.0, 186.0, 345.0, 129.0, 205.0, 221.0, 317.0, 285.0]
2025-09-16 13:55:56,711 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 44 minutes, 9 seconds)
2025-09-16 13:57:58,972 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:58:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1439.33301 ± 371.832
2025-09-16 13:58:02,972 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1310.8175, 967.7675, 1473.5327, 1572.3398, 1570.8741, 1915.3739, 1982.6792, 1724.5292, 961.22375, 914.19214]
2025-09-16 13:58:02,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [246.0, 177.0, 298.0, 316.0, 301.0, 386.0, 363.0, 323.0, 186.0, 175.0]
2025-09-16 13:58:02,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1439.33) for latency 15
2025-09-16 13:58:02,978 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 42 minutes, 46 seconds)
2025-09-16 14:00:00,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:00:04,100 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1359.59827 ± 479.473
2025-09-16 14:00:04,100 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1313.2844, 1212.4174, 824.16986, 1125.9586, 2096.2368, 1197.8533, 1636.6504, 2302.0237, 779.08514, 1108.3033]
2025-09-16 14:00:04,100 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 224.0, 167.0, 215.0, 400.0, 225.0, 304.0, 456.0, 164.0, 213.0]
2025-09-16 14:00:04,105 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 40 minutes, 37 seconds)
2025-09-16 14:02:03,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:02:07,834 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1639.85571 ± 462.862
2025-09-16 14:02:07,834 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1764.5663, 2046.5139, 1316.5167, 779.2231, 2359.461, 1575.6586, 1152.5435, 2207.5012, 1723.0764, 1473.4961]
2025-09-16 14:02:07,834 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [326.0, 395.0, 257.0, 164.0, 439.0, 307.0, 230.0, 409.0, 319.0, 303.0]
2025-09-16 14:02:07,834 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1639.86) for latency 15
2025-09-16 14:02:07,843 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 38 minutes, 36 seconds)
2025-09-16 14:04:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:04:16,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1678.84436 ± 754.031
2025-09-16 14:04:16,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1854.2863, 1294.166, 1795.5369, 872.8974, 1122.3813, 828.52325, 1885.2257, 2228.1055, 3518.094, 1389.2274]
2025-09-16 14:04:16,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [353.0, 244.0, 354.0, 182.0, 211.0, 154.0, 374.0, 453.0, 706.0, 257.0]
2025-09-16 14:04:16,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1678.84) for latency 15
2025-09-16 14:04:16,283 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 37 minutes, 29 seconds)
2025-09-16 14:06:13,408 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:06:16,438 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1090.45142 ± 522.555
2025-09-16 14:06:16,438 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [624.42896, 1351.4491, 1490.5485, 692.5207, 678.46216, 721.58026, 1995.989, 791.69775, 1927.3088, 630.5285]
2025-09-16 14:06:16,438 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 243.0, 293.0, 140.0, 130.0, 135.0, 381.0, 145.0, 397.0, 122.0]
2025-09-16 14:06:16,449 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 35 minutes, 1 second)
2025-09-16 14:08:16,522 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:08:24,643 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2910.01709 ± 1382.815
2025-09-16 14:08:24,643 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1974.7893, 1721.3696, 2977.3088, 3473.0408, 930.59863, 2133.7817, 4088.6394, 1723.9803, 5299.8545, 4776.808]
2025-09-16 14:08:24,643 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [382.0, 334.0, 575.0, 671.0, 170.0, 413.0, 739.0, 313.0, 1000.0, 876.0]
2025-09-16 14:08:24,643 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2910.02) for latency 15
2025-09-16 14:08:24,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 33 minutes, 15 seconds)
2025-09-16 14:10:23,621 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:10:29,711 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2222.20068 ± 649.901
2025-09-16 14:10:29,711 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2903.561, 1392.854, 2010.6597, 2642.0881, 2587.391, 1707.904, 955.1089, 2553.3955, 2408.5144, 3060.5312]
2025-09-16 14:10:29,711 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [544.0, 288.0, 382.0, 490.0, 472.0, 314.0, 185.0, 482.0, 466.0, 583.0]
2025-09-16 14:10:29,718 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 31 minutes, 45 seconds)
2025-09-16 14:12:29,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:12:38,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3240.58521 ± 1435.179
2025-09-16 14:12:38,084 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2494.6545, 1180.9532, 5214.385, 3601.9866, 3212.917, 2492.1624, 1764.6558, 4994.9746, 2068.27, 5380.8936]
2025-09-16 14:12:38,084 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [466.0, 213.0, 1000.0, 682.0, 603.0, 447.0, 321.0, 946.0, 362.0, 1000.0]
2025-09-16 14:12:38,084 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3240.59) for latency 15
2025-09-16 14:12:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 30 minutes, 20 seconds)
2025-09-16 14:14:37,407 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:14:44,610 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2571.06836 ± 1460.552
2025-09-16 14:14:44,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2555.5684, 757.3589, 5154.5166, 2276.2385, 1273.0846, 2619.434, 4807.8975, 828.3527, 1850.259, 3587.9736]
2025-09-16 14:14:44,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [466.0, 144.0, 987.0, 427.0, 259.0, 486.0, 900.0, 160.0, 357.0, 680.0]
2025-09-16 14:14:44,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 27 minutes, 57 seconds)
2025-09-16 14:16:44,591 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:16:53,826 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3277.26758 ± 1592.485
2025-09-16 14:16:53,827 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2546.599, 5217.0474, 1415.5332, 5069.5137, 1268.9763, 5174.1523, 2894.2139, 2563.2275, 5057.401, 1566.0125]
2025-09-16 14:16:53,827 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [487.0, 1000.0, 264.0, 955.0, 230.0, 1000.0, 563.0, 483.0, 932.0, 280.0]
2025-09-16 14:16:53,827 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3277.27) for latency 15
2025-09-16 14:16:53,838 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 27 minutes, 6 seconds)
2025-09-16 14:18:54,024 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:19:02,501 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2971.37183 ± 1705.397
2025-09-16 14:19:02,501 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2461.8445, 5210.291, 1776.8629, 4183.176, 1277.7036, 916.80945, 2611.5884, 949.66974, 5117.395, 5208.3794]
2025-09-16 14:19:02,501 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [461.0, 1000.0, 338.0, 805.0, 240.0, 177.0, 488.0, 183.0, 974.0, 1000.0]
2025-09-16 14:19:02,509 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 25 minutes, 2 seconds)
2025-09-16 14:21:07,828 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:21:17,128 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3273.63623 ± 1430.195
2025-09-16 14:21:17,128 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5298.235, 1655.2777, 3835.8145, 1876.1934, 1965.2959, 3446.4993, 5128.7456, 4834.698, 1315.0201, 3380.5845]
2025-09-16 14:21:17,128 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 325.0, 710.0, 357.0, 352.0, 665.0, 1000.0, 913.0, 248.0, 630.0]
2025-09-16 14:21:17,136 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 24 minutes, 9 seconds)
2025-09-16 14:23:16,091 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:23:21,770 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1996.37402 ± 1395.999
2025-09-16 14:23:21,770 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1638.3384, 1963.3364, 1558.3438, 1131.5177, 3853.5908, 5312.183, 1892.4572, 977.6317, 1092.1987, 544.1419]
2025-09-16 14:23:21,770 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [303.0, 373.0, 288.0, 216.0, 741.0, 998.0, 353.0, 173.0, 201.0, 111.0]
2025-09-16 14:23:21,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 21 minutes, 31 seconds)
2025-09-16 14:25:20,064 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:25:30,126 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3596.29248 ± 1751.381
2025-09-16 14:25:30,126 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2389.23, 1689.5148, 4830.044, 5382.3496, 3080.2231, 5300.8687, 1141.5448, 1335.1208, 5393.0195, 5421.01]
2025-09-16 14:25:30,126 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [473.0, 325.0, 901.0, 1000.0, 617.0, 1000.0, 211.0, 247.0, 1000.0, 1000.0]
2025-09-16 14:25:30,126 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3596.29) for latency 15
2025-09-16 14:25:30,134 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 19 minutes, 36 seconds)
2025-09-16 14:27:31,095 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:27:40,517 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3337.88354 ± 1723.899
2025-09-16 14:27:40,518 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5352.053, 1875.4713, 3456.8674, 1701.1008, 5405.8623, 5292.542, 1190.2849, 1582.0559, 5290.7847, 2231.814]
2025-09-16 14:27:40,518 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 359.0, 667.0, 313.0, 1000.0, 1000.0, 253.0, 291.0, 1000.0, 411.0]
2025-09-16 14:27:40,526 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 17 minutes, 36 seconds)
2025-09-16 14:29:47,645 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:29:59,088 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3941.40674 ± 1516.030
2025-09-16 14:29:59,088 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1170.4491, 5192.7544, 1707.9594, 5345.556, 3272.2505, 5141.3477, 5424.917, 5284.124, 3001.4658, 3873.244]
2025-09-16 14:29:59,088 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 1000.0, 346.0, 1000.0, 678.0, 955.0, 1000.0, 1000.0, 594.0, 713.0]
2025-09-16 14:29:59,088 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3941.41) for latency 15
2025-09-16 14:29:59,096 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 16 minutes, 36 seconds)
2025-09-16 14:31:58,375 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:32:09,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3868.66553 ± 1116.718
2025-09-16 14:32:09,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5261.3677, 4159.1055, 3277.8232, 2924.6868, 4378.947, 2455.5146, 2457.4136, 5352.732, 5332.6025, 3086.4644]
2025-09-16 14:32:09,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 795.0, 643.0, 563.0, 823.0, 491.0, 449.0, 1000.0, 1000.0, 609.0]
2025-09-16 14:32:09,505 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 13 minutes, 56 seconds)
2025-09-16 14:34:14,967 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:34:24,324 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3342.20239 ± 1473.491
2025-09-16 14:34:24,325 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2277.523, 5301.396, 1241.615, 5303.587, 3166.6648, 4837.4443, 1141.3031, 3123.3894, 4309.0557, 2720.0483]
2025-09-16 14:34:24,325 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [427.0, 1000.0, 233.0, 1000.0, 597.0, 891.0, 217.0, 586.0, 795.0, 527.0]
2025-09-16 14:34:24,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 12 minutes, 52 seconds)
2025-09-16 14:36:16,072 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:36:29,787 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4619.03467 ± 947.269
2025-09-16 14:36:29,787 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5261.3823, 3078.8088, 3187.0713, 5211.8276, 3255.7231, 5203.245, 5216.725, 5273.0005, 5283.344, 5219.2188]
2025-09-16 14:36:29,787 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 592.0, 594.0, 1000.0, 638.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:36:29,787 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4619.03) for latency 15
2025-09-16 14:36:29,799 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 10 minutes, 21 seconds)
2025-09-16 14:38:38,875 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:38:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4687.56152 ± 1336.581
2025-09-16 14:38:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5343.925, 5331.092, 5256.863, 4925.4644, 819.23334, 5145.264, 4130.5176, 5279.9414, 5323.873, 5319.4463]
2025-09-16 14:38:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 932.0, 149.0, 1000.0, 806.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:38:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4687.56) for latency 15
2025-09-16 14:38:52,580 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 9 minutes, 26 seconds)
2025-09-16 14:40:44,814 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:40:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4014.13013 ± 1458.386
2025-09-16 14:40:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2698.2776, 4388.853, 4890.466, 5415.0527, 5292.2373, 1059.9647, 2166.5913, 3813.4958, 5226.102, 5190.2617]
2025-09-16 14:40:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [524.0, 849.0, 927.0, 1000.0, 1000.0, 193.0, 426.0, 724.0, 1000.0, 1000.0]
2025-09-16 14:40:56,420 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 5 minutes, 43 seconds)
2025-09-16 14:42:56,630 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:43:08,706 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4365.84912 ± 1595.077
2025-09-16 14:43:08,707 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2313.4429, 1088.0807, 5457.3994, 5398.1807, 2586.9172, 5408.497, 5449.7725, 5471.1177, 5094.282, 5390.8057]
2025-09-16 14:43:08,707 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [435.0, 199.0, 1000.0, 1000.0, 473.0, 1000.0, 1000.0, 1000.0, 929.0, 1000.0]
2025-09-16 14:43:08,716 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 3 minutes, 43 seconds)
2025-09-16 14:45:11,177 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:45:20,407 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3431.32812 ± 1844.223
2025-09-16 14:45:20,407 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5418.365, 5564.482, 3350.5547, 2801.8813, 2738.8652, 1364.7194, 1270.9064, 780.3681, 5582.2812, 5440.8564]
2025-09-16 14:45:20,407 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 603.0, 500.0, 489.0, 239.0, 232.0, 142.0, 1000.0, 1000.0]
2025-09-16 14:45:20,415 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 14 seconds)
2025-09-16 14:47:19,953 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:47:33,001 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4585.20312 ± 1472.422
2025-09-16 14:47:33,001 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5364.4395, 5284.5977, 5310.6855, 2110.6904, 1225.5664, 5232.189, 5349.7246, 5372.527, 5294.525, 5307.088]
2025-09-16 14:47:33,001 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 404.0, 230.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:47:33,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 59 minutes, 41 seconds)
2025-09-16 14:49:33,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:49:47,135 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4444.33398 ± 1233.126
2025-09-16 14:49:47,135 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5186.927, 3235.6575, 5214.0234, 5220.904, 5229.3896, 5346.325, 2054.7104, 5167.7715, 5259.1846, 2528.4443]
2025-09-16 14:49:47,135 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 642.0, 1000.0, 1000.0, 1000.0, 1000.0, 409.0, 1000.0, 1000.0, 479.0]
2025-09-16 14:49:47,141 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 56 minutes, 43 seconds)
2025-09-16 14:51:50,933 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:52:04,966 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4818.17139 ± 1266.320
2025-09-16 14:52:04,967 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4537.009, 5373.95, 5360.791, 1086.8485, 5330.017, 5380.9243, 5340.285, 5239.9336, 5253.997, 5277.9614]
2025-09-16 14:52:04,967 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [848.0, 1000.0, 1000.0, 209.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:52:04,967 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4818.17) for latency 15
2025-09-16 14:52:04,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 42 seconds)
2025-09-16 14:54:12,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:54:27,863 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5329.51074 ± 62.238
2025-09-16 14:54:27,864 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5437.1904, 5309.769, 5366.558, 5270.57, 5289.3647, 5241.936, 5329.589, 5422.404, 5272.003, 5355.7217]
2025-09-16 14:54:27,864 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:54:27,864 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5329.51) for latency 15
2025-09-16 14:54:27,871 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 54 minutes, 19 seconds)
2025-09-16 14:56:31,228 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:56:47,055 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5235.79199 ± 49.810
2025-09-16 14:56:47,055 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5228.22, 5255.1416, 5216.728, 5181.5015, 5308.5864, 5174.728, 5339.0576, 5202.8027, 5231.882, 5219.271]
2025-09-16 14:56:47,055 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:56:47,062 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 52 minutes, 38 seconds)
2025-09-16 14:58:46,015 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:58:57,450 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4303.55566 ± 1169.800
2025-09-16 14:58:57,451 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5539.389, 5552.535, 4865.1997, 2676.4744, 4496.018, 5588.9165, 5144.8394, 2565.8171, 3631.8618, 2974.502]
2025-09-16 14:58:57,451 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [997.0, 1000.0, 849.0, 490.0, 808.0, 1000.0, 926.0, 453.0, 641.0, 529.0]
2025-09-16 14:58:57,458 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 50 minutes, 11 seconds)
2025-09-16 15:00:51,103 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:00:59,252 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2737.32764 ± 1612.933
2025-09-16 15:00:59,252 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1344.9198, 1651.6715, 5099.0728, 4724.334, 856.6412, 5155.3667, 2269.3853, 2524.8699, 2929.8438, 817.17224]
2025-09-16 15:00:59,252 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [266.0, 338.0, 1000.0, 922.0, 171.0, 1000.0, 447.0, 503.0, 594.0, 155.0]
2025-09-16 15:00:59,259 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 47 minutes, 2 seconds)
2025-09-16 15:03:05,284 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:03:18,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4592.66748 ± 1527.567
2025-09-16 15:03:18,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5379.7153, 5200.292, 5499.377, 5377.761, 5474.502, 5284.8604, 5281.3223, 1183.9742, 5306.8613, 1938.012]
2025-09-16 15:03:18,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 949.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 220.0, 1000.0, 347.0]
2025-09-16 15:03:18,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes, 52 seconds)
2025-09-16 15:05:23,310 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:05:37,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4567.34473 ± 1360.803
2025-09-16 15:05:37,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5166.6587, 3020.4927, 5169.442, 985.6207, 5182.472, 5241.8193, 5196.6797, 5222.6626, 5225.497, 5262.103]
2025-09-16 15:05:37,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 609.0, 1000.0, 200.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:05:37,049 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes, 22 seconds)
2025-09-16 15:07:30,442 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:07:42,379 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4247.71582 ± 1823.482
2025-09-16 15:07:42,379 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5450.727, 5384.4033, 2521.1462, 5432.89, 1304.0579, 5388.429, 5440.6533, 768.0236, 5431.6855, 5355.1396]
2025-09-16 15:07:42,379 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 470.0, 1000.0, 246.0, 1000.0, 1000.0, 143.0, 1000.0, 1000.0]
2025-09-16 15:07:42,388 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 19 seconds)
2025-09-16 15:09:53,724 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:10:01,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2941.68286 ± 1844.382
2025-09-16 15:10:01,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1987.903, 1379.014, 5586.369, 5604.8916, 2826.5586, 2160.8325, 805.51154, 802.9181, 5543.3877, 2719.4421]
2025-09-16 15:10:01,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [356.0, 244.0, 1000.0, 997.0, 515.0, 398.0, 152.0, 147.0, 1000.0, 499.0]
2025-09-16 15:10:01,826 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 38 seconds)
2025-09-16 15:11:58,210 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:12:11,376 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4698.09033 ± 727.714
2025-09-16 15:12:11,376 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3831.4954, 5452.369, 4151.146, 3694.4111, 4728.174, 5526.5815, 5405.9707, 3886.9182, 5605.195, 4698.641]
2025-09-16 15:12:11,376 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [683.0, 1000.0, 747.0, 680.0, 871.0, 1000.0, 1000.0, 699.0, 1000.0, 874.0]
2025-09-16 15:12:11,385 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 50 seconds)
2025-09-16 15:14:11,780 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:14:27,471 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5286.32324 ± 20.210
2025-09-16 15:14:27,471 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5279.087, 5291.713, 5253.631, 5312.6953, 5293.5034, 5281.668, 5320.883, 5296.774, 5260.0356, 5273.246]
2025-09-16 15:14:27,471 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:14:27,480 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 27 seconds)
2025-09-16 15:16:26,873 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:16:41,131 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4739.99316 ± 957.798
2025-09-16 15:16:41,131 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5261.844, 5284.9775, 5161.622, 2781.4395, 5247.201, 5146.4644, 5227.5273, 2871.836, 5236.2886, 5180.7275]
2025-09-16 15:16:41,131 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 547.0, 1000.0, 1000.0, 1000.0, 566.0, 1000.0, 1000.0]
2025-09-16 15:16:41,138 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 59 seconds)
2025-09-16 15:18:42,512 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:18:57,172 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5148.82080 ± 733.485
2025-09-16 15:18:57,172 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5431.629, 5421.4424, 5422.162, 5431.1055, 5354.8735, 5306.266, 2953.0557, 5466.0933, 5365.363, 5336.2144]
2025-09-16 15:18:57,172 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 538.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:18:57,178 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 14 seconds)
2025-09-16 15:20:59,312 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:21:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4952.05273 ± 692.273
2025-09-16 15:21:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5254.045, 5197.2095, 5216.1694, 5222.629, 4968.6074, 5228.5225, 5319.9834, 2894.1265, 5135.645, 5083.595]
2025-09-16 15:21:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 947.0, 1000.0, 1000.0, 577.0, 1000.0, 1000.0]
2025-09-16 15:21:14,108 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 53 seconds)
2025-09-16 15:23:23,402 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:23:37,918 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5188.27148 ± 783.260
2025-09-16 15:23:37,918 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5398.5933, 2841.0466, 5399.44, 5475.4927, 5487.8745, 5497.2905, 5430.1616, 5459.3994, 5487.4683, 5405.9497]
2025-09-16 15:23:37,918 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 517.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:23:37,924 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 10 seconds)
2025-09-16 15:25:41,160 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:25:56,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5382.39404 ± 429.486
2025-09-16 15:25:56,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5448.0273, 5624.944, 5479.564, 5569.144, 5510.7085, 4110.1772, 5501.4214, 5505.696, 5425.622, 5648.6377]
2025-09-16 15:25:56,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 994.0, 1000.0, 1000.0, 1000.0, 730.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:25:56,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5382.39) for latency 15
2025-09-16 15:25:56,128 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 57 seconds)
2025-09-16 15:27:58,087 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:28:12,473 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4965.66553 ± 956.394
2025-09-16 15:28:12,474 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5428.534, 4909.9487, 5270.3105, 5238.1587, 5256.2065, 5341.2935, 5342.571, 5372.491, 2125.8018, 5371.342]
2025-09-16 15:28:12,474 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 897.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 381.0, 1000.0]
2025-09-16 15:28:12,482 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 44 seconds)
2025-09-16 15:30:05,780 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:30:20,422 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4987.04688 ± 990.123
2025-09-16 15:30:20,422 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5347.387, 5325.229, 5312.1494, 2022.3682, 5328.2734, 5353.553, 5354.19, 5271.2417, 5397.5073, 5158.57]
2025-09-16 15:30:20,422 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 387.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:30:20,431 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 13 seconds)
2025-09-16 15:32:32,947 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:32:46,652 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4725.17041 ± 1363.274
2025-09-16 15:32:46,652 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2088.121, 5448.693, 5375.225, 5429.661, 5400.89, 5339.1733, 5381.097, 5488.417, 5386.7773, 1913.6461]
2025-09-16 15:32:46,652 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [391.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 347.0]
2025-09-16 15:32:46,659 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 9 seconds)
2025-09-16 15:34:41,952 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:34:57,680 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5383.75537 ± 49.086
2025-09-16 15:34:57,680 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5360.469, 5414.313, 5294.847, 5403.0586, 5328.4077, 5450.578, 5376.363, 5374.53, 5370.6094, 5464.377]
2025-09-16 15:34:57,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:34:57,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5383.76) for latency 15
2025-09-16 15:34:57,689 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 35 seconds)
2025-09-16 15:36:59,653 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:37:14,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5045.54590 ± 473.529
2025-09-16 15:37:14,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5371.5933, 5228.744, 5255.6167, 5331.064, 5244.491, 5257.605, 5289.4746, 4344.281, 3902.37, 5230.219]
2025-09-16 15:37:14,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 808.0, 720.0, 1000.0]
2025-09-16 15:37:14,335 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 18 seconds)
2025-09-16 15:39:18,821 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:39:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4854.96387 ± 1176.453
2025-09-16 15:39:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5174.923, 5070.856, 5303.537, 1337.8534, 5268.5957, 5191.6626, 5316.3384, 5388.8066, 5130.125, 5366.941]
2025-09-16 15:39:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 942.0, 1000.0, 257.0, 1000.0, 983.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:39:33,044 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 4 seconds)
2025-09-16 15:41:38,711 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:41:52,800 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4865.09766 ± 1325.442
2025-09-16 15:41:52,800 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5272.423, 5367.3066, 5277.791, 5321.3633, 5372.2583, 5185.3604, 5347.3413, 892.2421, 5262.2256, 5352.663]
2025-09-16 15:41:52,801 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 955.0, 1000.0, 186.0, 1000.0, 1000.0]
2025-09-16 15:41:52,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 55 seconds)
2025-09-16 15:43:42,791 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:43:57,916 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5078.20752 ± 848.803
2025-09-16 15:43:57,917 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5412.3374, 5399.897, 5342.519, 5336.7373, 5347.7534, 2533.312, 5309.0586, 5358.1387, 5357.258, 5385.068]
2025-09-16 15:43:57,917 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 464.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:43:57,937 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 28 seconds)
2025-09-16 15:46:04,845 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:46:18,032 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4583.70703 ± 1828.854
2025-09-16 15:46:18,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5548.556, 831.3909, 5514.8286, 5516.965, 5461.1855, 5463.963, 1023.4446, 5525.104, 5482.134, 5469.498]
2025-09-16 15:46:18,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 159.0, 1000.0, 1000.0, 1000.0, 1000.0, 196.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:46:18,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 16 seconds)
2025-09-16 15:48:20,116 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:48:36,128 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5221.09570 ± 39.156
2025-09-16 15:48:36,128 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5240.6904, 5181.387, 5202.2383, 5239.1387, 5190.626, 5262.509, 5291.0894, 5150.115, 5231.548, 5221.614]
2025-09-16 15:48:36,128 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:48:36,135 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1251 [DEBUG]: Training session finished
