2025-09-16 11:56:59,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.000-delay_9
2025-09-16 11:56:59,181 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.000-delay_9
2025-09-16 11:56:59,181 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x152640c34590>}
2025-09-16 11:56:59,181 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:56:59,185 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:56:59,205 baseline-bpql-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=529, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:56:59,205 baseline-bpql-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:57:00,722 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:57:00,722 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:58:50,990 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 11:58:52,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 393.99365 ± 52.810
2025-09-16 11:58:52,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [449.4248, 410.25674, 327.73074, 443.36804, 327.06656, 388.14545, 329.24902, 354.57404, 453.30414, 456.81696]
2025-09-16 11:58:52,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 79.0, 63.0, 87.0, 64.0, 76.0, 63.0, 69.0, 89.0, 89.0]
2025-09-16 11:58:52,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (393.99) for latency 9
2025-09-16 11:58:52,183 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 3 minutes, 54 seconds)
2025-09-16 12:00:51,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:00:52,361 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 318.14899 ± 19.816
2025-09-16 12:00:52,361 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [307.42874, 301.33655, 321.23642, 306.92612, 308.07092, 361.51184, 322.8009, 300.4555, 347.74658, 303.97662]
2025-09-16 12:00:52,361 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 56.0, 62.0, 57.0, 59.0, 67.0, 61.0, 57.0, 67.0, 58.0]
2025-09-16 12:00:52,366 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 9 minutes, 10 seconds)
2025-09-16 12:02:51,829 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:02:52,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 349.17047 ± 34.233
2025-09-16 12:02:52,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [367.51282, 340.40283, 348.41354, 304.5585, 320.42545, 379.6956, 324.90536, 321.6019, 427.61246, 356.57587]
2025-09-16 12:02:52,651 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 64.0, 66.0, 59.0, 61.0, 73.0, 62.0, 62.0, 84.0, 68.0]
2025-09-16 12:02:52,653 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 9 minutes, 39 seconds)
2025-09-16 12:04:52,761 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:04:54,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 465.86188 ± 68.906
2025-09-16 12:04:54,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [377.24417, 551.55695, 549.06995, 348.92203, 424.90363, 417.58792, 514.4306, 536.7933, 457.5146, 480.59592]
2025-09-16 12:04:54,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 106.0, 104.0, 65.0, 78.0, 77.0, 97.0, 100.0, 86.0, 90.0]
2025-09-16 12:04:54,003 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (465.86) for latency 9
2025-09-16 12:04:54,020 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 9 minutes, 19 seconds)
2025-09-16 12:06:52,889 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:06:54,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 490.44742 ± 89.467
2025-09-16 12:06:54,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [529.5093, 450.08896, 652.6735, 422.92035, 344.04144, 471.4562, 478.58206, 478.4118, 636.7116, 440.0789]
2025-09-16 12:06:54,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 84.0, 132.0, 93.0, 75.0, 102.0, 89.0, 96.0, 121.0, 96.0]
2025-09-16 12:06:54,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (490.45) for latency 9
2025-09-16 12:06:54,194 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 7 minutes, 55 seconds)
2025-09-16 12:08:54,361 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:08:55,628 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 501.58261 ± 89.899
2025-09-16 12:08:55,628 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [484.3696, 417.1118, 449.0648, 564.45447, 385.5925, 559.9201, 595.9435, 669.37317, 501.3774, 388.61847]
2025-09-16 12:08:55,628 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 80.0, 88.0, 105.0, 80.0, 109.0, 122.0, 126.0, 93.0, 80.0]
2025-09-16 12:08:55,628 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (501.58) for latency 9
2025-09-16 12:08:55,632 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 9 minutes, 4 seconds)
2025-09-16 12:10:55,049 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:10:56,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 489.90811 ± 56.908
2025-09-16 12:10:56,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [469.26566, 453.42477, 463.72543, 483.2643, 454.6039, 463.5751, 628.61444, 567.6202, 474.4989, 440.48843]
2025-09-16 12:10:56,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 86.0, 87.0, 90.0, 86.0, 87.0, 117.0, 116.0, 92.0, 96.0]
2025-09-16 12:10:56,234 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 7 minutes, 11 seconds)
2025-09-16 12:12:55,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:12:56,225 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 470.80731 ± 91.254
2025-09-16 12:12:56,225 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [327.97427, 383.2558, 543.8696, 621.0628, 393.29675, 441.6974, 505.5964, 532.6182, 566.6992, 392.00302]
2025-09-16 12:12:56,225 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 75.0, 106.0, 132.0, 84.0, 85.0, 112.0, 106.0, 114.0, 84.0]
2025-09-16 12:12:56,233 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 5 minutes, 5 seconds)
2025-09-16 12:14:56,816 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:14:58,197 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 536.93982 ± 52.851
2025-09-16 12:14:58,197 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [605.79614, 511.36316, 635.6557, 564.9016, 524.8505, 511.30984, 538.13, 490.9064, 441.9201, 544.56464]
2025-09-16 12:14:58,197 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 111.0, 123.0, 111.0, 98.0, 96.0, 117.0, 108.0, 92.0, 103.0]
2025-09-16 12:14:58,197 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (536.94) for latency 9
2025-09-16 12:14:58,200 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 3 minutes, 16 seconds)
2025-09-16 12:17:00,596 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:17:02,156 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 584.21759 ± 122.233
2025-09-16 12:17:02,156 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [534.1665, 496.18045, 531.36224, 558.66437, 532.8749, 548.1027, 783.5381, 446.09244, 558.12964, 853.0642]
2025-09-16 12:17:02,156 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 109.0, 120.0, 119.0, 101.0, 120.0, 149.0, 94.0, 108.0, 183.0]
2025-09-16 12:17:02,156 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (584.22) for latency 9
2025-09-16 12:17:02,160 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 2 minutes, 23 seconds)
2025-09-16 12:19:04,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:19:05,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 542.76837 ± 102.808
2025-09-16 12:19:05,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [534.0566, 417.64105, 657.85675, 434.5167, 674.8048, 571.1578, 485.53027, 530.07043, 707.06476, 414.98386]
2025-09-16 12:19:05,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 91.0, 128.0, 82.0, 145.0, 121.0, 92.0, 101.0, 131.0, 79.0]
2025-09-16 12:19:05,552 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 56 seconds)
2025-09-16 12:21:05,862 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:21:07,399 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 615.50256 ± 186.480
2025-09-16 12:21:07,400 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [443.55972, 450.74765, 643.87115, 545.47186, 718.8788, 350.06693, 766.4763, 659.3018, 543.6432, 1033.0088]
2025-09-16 12:21:07,400 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 89.0, 139.0, 106.0, 137.0, 74.0, 146.0, 125.0, 103.0, 202.0]
2025-09-16 12:21:07,400 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (615.50) for latency 9
2025-09-16 12:21:07,426 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 59 minutes, 16 seconds)
2025-09-16 12:23:08,072 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:23:09,833 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 611.80487 ± 140.885
2025-09-16 12:23:09,833 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [467.19113, 505.668, 780.02673, 730.109, 745.55096, 495.33542, 489.69672, 584.8563, 469.65494, 849.9595]
2025-09-16 12:23:09,833 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 95.0, 152.0, 162.0, 139.0, 110.0, 92.0, 118.0, 101.0, 183.0]
2025-09-16 12:23:09,837 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 57 minutes, 56 seconds)
2025-09-16 12:25:12,206 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:25:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 660.15399 ± 112.552
2025-09-16 12:25:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [812.6729, 686.63574, 744.17816, 865.10486, 554.04584, 485.72516, 623.7551, 653.8773, 595.65314, 579.89154]
2025-09-16 12:25:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 148.0, 153.0, 164.0, 101.0, 92.0, 133.0, 121.0, 117.0, 107.0]
2025-09-16 12:25:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (660.15) for latency 9
2025-09-16 12:25:14,044 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 56 minutes, 32 seconds)
2025-09-16 12:27:17,572 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:27:19,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 604.25793 ± 73.131
2025-09-16 12:27:19,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [706.58167, 534.10095, 626.25806, 485.65173, 644.1071, 483.6897, 669.7444, 656.5972, 620.31213, 615.53705]
2025-09-16 12:27:19,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 102.0, 123.0, 91.0, 121.0, 91.0, 135.0, 125.0, 134.0, 115.0]
2025-09-16 12:27:19,087 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 54 minutes, 47 seconds)
2025-09-16 12:29:20,946 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:29:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 719.45038 ± 139.251
2025-09-16 12:29:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [696.9234, 640.41425, 567.75946, 626.3396, 984.23755, 691.20825, 980.6826, 599.9926, 731.24493, 675.70123]
2025-09-16 12:29:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 124.0, 124.0, 134.0, 185.0, 147.0, 190.0, 117.0, 148.0, 147.0]
2025-09-16 12:29:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (719.45) for latency 9
2025-09-16 12:29:22,822 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 52 minutes, 50 seconds)
2025-09-16 12:31:23,465 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:31:25,119 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 694.24768 ± 114.262
2025-09-16 12:31:25,119 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [583.3869, 698.2921, 648.1107, 698.159, 796.0909, 655.0116, 696.1793, 838.43427, 462.38968, 866.4222]
2025-09-16 12:31:25,119 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 129.0, 118.0, 129.0, 150.0, 121.0, 127.0, 158.0, 86.0, 164.0]
2025-09-16 12:31:25,125 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 50 minutes, 53 seconds)
2025-09-16 12:33:29,543 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:33:31,308 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 689.74042 ± 145.001
2025-09-16 12:33:31,308 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [631.3272, 573.8541, 800.06915, 551.569, 758.2079, 730.8289, 580.40826, 727.38947, 1025.754, 517.996]
2025-09-16 12:33:31,308 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 120.0, 156.0, 118.0, 150.0, 158.0, 126.0, 141.0, 201.0, 103.0]
2025-09-16 12:33:31,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 49 minutes, 52 seconds)
2025-09-16 12:35:33,203 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:35:34,876 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 633.55353 ± 73.398
2025-09-16 12:35:34,876 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [624.7541, 634.40063, 643.9162, 439.38986, 672.2264, 632.2979, 689.6106, 736.66943, 649.8955, 612.3745]
2025-09-16 12:35:34,876 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 119.0, 118.0, 84.0, 128.0, 119.0, 131.0, 134.0, 117.0, 112.0]
2025-09-16 12:35:34,882 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 47 minutes, 37 seconds)
2025-09-16 12:37:37,352 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:37:39,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 732.33337 ± 155.852
2025-09-16 12:37:39,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [627.5801, 642.5849, 830.8221, 645.8584, 741.09296, 828.9318, 1104.6064, 717.17786, 494.18185, 690.498]
2025-09-16 12:37:39,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 122.0, 178.0, 119.0, 135.0, 177.0, 214.0, 135.0, 91.0, 128.0]
2025-09-16 12:37:39,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (732.33) for latency 9
2025-09-16 12:37:39,216 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 45 minutes, 22 seconds)
2025-09-16 12:39:40,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:39:42,464 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 702.06653 ± 138.094
2025-09-16 12:39:42,465 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [631.49164, 914.3829, 558.52795, 756.71106, 777.57526, 623.4875, 901.92017, 553.0755, 513.9612, 789.5323]
2025-09-16 12:39:42,465 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 171.0, 117.0, 143.0, 158.0, 121.0, 174.0, 120.0, 99.0, 149.0]
2025-09-16 12:39:42,484 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 43 minutes, 10 seconds)
2025-09-16 12:41:45,607 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:41:47,643 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 723.03986 ± 259.740
2025-09-16 12:41:47,643 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [782.97034, 586.1308, 579.57184, 670.0849, 624.84296, 684.15924, 599.12714, 1482.3512, 627.437, 593.7234]
2025-09-16 12:41:47,643 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 127.0, 108.0, 126.0, 128.0, 140.0, 114.0, 287.0, 124.0, 111.0]
2025-09-16 12:41:47,659 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 41 minutes, 51 seconds)
2025-09-16 12:43:49,828 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:43:51,959 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 840.87823 ± 212.692
2025-09-16 12:43:51,959 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [750.03046, 692.402, 880.38196, 814.0785, 774.0543, 1333.6373, 1015.41785, 468.74728, 882.3013, 797.73096]
2025-09-16 12:43:51,959 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 132.0, 171.0, 158.0, 150.0, 265.0, 198.0, 103.0, 173.0, 154.0]
2025-09-16 12:43:51,959 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (840.88) for latency 9
2025-09-16 12:43:51,993 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 39 minutes, 18 seconds)
2025-09-16 12:45:56,074 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:45:58,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 850.32043 ± 199.516
2025-09-16 12:45:58,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [620.7742, 1217.7183, 1129.6721, 785.6454, 699.2398, 1064.762, 669.73755, 771.23804, 707.81067, 836.6069]
2025-09-16 12:45:58,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 235.0, 225.0, 154.0, 143.0, 212.0, 133.0, 137.0, 128.0, 163.0]
2025-09-16 12:45:58,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (850.32) for latency 9
2025-09-16 12:45:58,199 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 37 minutes, 54 seconds)
2025-09-16 12:48:00,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:48:03,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1067.88843 ± 168.597
2025-09-16 12:48:03,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1143.518, 984.941, 1196.0685, 1072.4126, 1094.3837, 645.4254, 1271.4591, 974.9691, 1228.9596, 1066.7462]
2025-09-16 12:48:03,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 198.0, 228.0, 214.0, 210.0, 139.0, 232.0, 184.0, 262.0, 217.0]
2025-09-16 12:48:03,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1067.89) for latency 9
2025-09-16 12:48:03,615 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 36 minutes, 5 seconds)
2025-09-16 12:50:06,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:50:08,224 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 808.28552 ± 168.894
2025-09-16 12:50:08,224 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1082.6051, 1069.959, 803.0364, 792.8674, 666.6345, 897.19684, 763.21735, 507.9695, 665.41266, 833.95685]
2025-09-16 12:50:08,224 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 217.0, 177.0, 148.0, 135.0, 166.0, 149.0, 111.0, 128.0, 180.0]
2025-09-16 12:50:08,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 34 minutes, 21 seconds)
2025-09-16 12:52:09,335 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:52:11,341 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 764.53558 ± 245.204
2025-09-16 12:52:11,341 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [745.97595, 838.57886, 453.21335, 669.6003, 692.05695, 669.106, 863.7457, 689.6011, 1422.5018, 600.9759]
2025-09-16 12:52:11,341 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 173.0, 91.0, 142.0, 136.0, 133.0, 162.0, 143.0, 287.0, 130.0]
2025-09-16 12:52:11,348 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 31 minutes, 45 seconds)
2025-09-16 12:54:14,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:54:17,416 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 893.77264 ± 173.751
2025-09-16 12:54:17,416 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [714.93414, 1054.4225, 820.2292, 700.7122, 1076.8938, 1089.2544, 1140.7233, 853.67584, 848.041, 638.8399]
2025-09-16 12:54:17,416 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 209.0, 169.0, 146.0, 219.0, 205.0, 226.0, 160.0, 167.0, 118.0]
2025-09-16 12:54:17,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 30 minutes, 6 seconds)
2025-09-16 12:56:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:56:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 898.34845 ± 250.361
2025-09-16 12:56:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1148.0435, 1230.9968, 854.756, 852.1394, 508.46274, 1205.2445, 922.3614, 1011.61176, 767.51044, 482.35757]
2025-09-16 12:56:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 239.0, 179.0, 179.0, 111.0, 241.0, 187.0, 202.0, 151.0, 107.0]
2025-09-16 12:56:21,750 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 27 minutes, 34 seconds)
2025-09-16 12:58:23,603 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:58:25,874 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 867.36591 ± 265.398
2025-09-16 12:58:25,874 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1534.4983, 898.043, 1018.5079, 939.37, 781.0233, 945.0848, 655.79724, 562.4278, 650.1571, 688.7503]
2025-09-16 12:58:25,874 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [311.0, 180.0, 206.0, 184.0, 165.0, 176.0, 139.0, 119.0, 124.0, 144.0]
2025-09-16 12:58:25,883 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 25 minutes, 11 seconds)
2025-09-16 13:00:29,473 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:00:31,829 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 958.95129 ± 367.157
2025-09-16 13:00:31,830 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [609.13086, 665.59375, 1305.2738, 1184.8129, 837.2878, 1729.5865, 702.63354, 985.8621, 1125.8386, 443.49203]
2025-09-16 13:00:31,830 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 121.0, 255.0, 218.0, 156.0, 330.0, 145.0, 181.0, 210.0, 84.0]
2025-09-16 13:00:31,835 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 23 minutes, 25 seconds)
2025-09-16 13:02:35,258 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:02:38,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1202.18860 ± 402.068
2025-09-16 13:02:38,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1483.8214, 849.82837, 915.72906, 1425.3108, 806.17267, 1306.5265, 2161.0256, 1205.3207, 1092.8317, 775.3194]
2025-09-16 13:02:38,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [307.0, 180.0, 167.0, 275.0, 169.0, 265.0, 406.0, 250.0, 222.0, 144.0]
2025-09-16 13:02:38,419 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1202.19) for latency 9
2025-09-16 13:02:38,439 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 22 minutes, 8 seconds)
2025-09-16 13:04:39,981 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:04:43,215 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1239.18396 ± 517.105
2025-09-16 13:04:43,215 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [861.15234, 776.9785, 1380.8004, 2634.9897, 1119.1063, 1301.3691, 730.2334, 1158.6863, 1035.2892, 1393.234]
2025-09-16 13:04:43,215 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 143.0, 295.0, 541.0, 231.0, 250.0, 152.0, 221.0, 201.0, 279.0]
2025-09-16 13:04:43,215 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1239.18) for latency 9
2025-09-16 13:04:43,222 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 19 minutes, 45 seconds)
2025-09-16 13:06:47,359 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:06:50,123 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1052.29260 ± 336.308
2025-09-16 13:06:50,123 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1147.4513, 741.4124, 880.0138, 1612.6067, 592.5862, 1309.0393, 1592.6776, 772.70703, 999.5859, 874.84595]
2025-09-16 13:06:50,123 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 136.0, 171.0, 314.0, 121.0, 273.0, 308.0, 149.0, 196.0, 183.0]
2025-09-16 13:06:50,141 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 18 minutes, 14 seconds)
2025-09-16 13:08:55,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:08:57,568 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 913.84930 ± 310.385
2025-09-16 13:08:57,568 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [767.92487, 760.6087, 860.82495, 1627.7196, 1246.0615, 1020.5679, 939.6073, 449.64175, 769.5381, 695.9992]
2025-09-16 13:08:57,568 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 145.0, 157.0, 318.0, 253.0, 185.0, 169.0, 87.0, 141.0, 131.0]
2025-09-16 13:08:57,574 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 16 minutes, 51 seconds)
2025-09-16 13:10:59,249 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:11:03,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1378.07751 ± 540.345
2025-09-16 13:11:03,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1075.7433, 929.8263, 1709.1569, 971.5541, 1743.4907, 1878.6755, 515.9382, 1848.5215, 2240.2075, 867.66187]
2025-09-16 13:11:03,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 176.0, 337.0, 193.0, 340.0, 351.0, 115.0, 370.0, 441.0, 182.0]
2025-09-16 13:11:03,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1378.08) for latency 9
2025-09-16 13:11:03,235 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 14 minutes, 41 seconds)
2025-09-16 13:13:05,917 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:13:10,454 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1675.29370 ± 464.602
2025-09-16 13:13:10,454 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1291.4678, 1834.7697, 1893.4004, 1677.4353, 1526.3937, 1648.1368, 2910.9255, 1343.6337, 1222.3578, 1404.418]
2025-09-16 13:13:10,454 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [246.0, 363.0, 378.0, 333.0, 302.0, 338.0, 597.0, 265.0, 245.0, 279.0]
2025-09-16 13:13:10,454 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1675.29) for latency 9
2025-09-16 13:13:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 12 minutes, 43 seconds)
2025-09-16 13:15:15,024 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:15:18,467 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1303.09644 ± 799.510
2025-09-16 13:15:18,467 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [825.27814, 1357.1002, 2824.5063, 828.4335, 940.4167, 1033.3853, 2903.8533, 838.11017, 783.183, 696.6974]
2025-09-16 13:15:18,467 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [181.0, 265.0, 561.0, 171.0, 192.0, 196.0, 567.0, 157.0, 152.0, 144.0]
2025-09-16 13:15:18,472 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 11 minutes, 17 seconds)
2025-09-16 13:17:20,211 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:17:25,614 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1941.61096 ± 768.671
2025-09-16 13:17:25,614 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1675.3618, 1480.7916, 2379.462, 1766.9502, 1812.4545, 3949.0012, 1432.6005, 2373.915, 1182.8784, 1362.695]
2025-09-16 13:17:25,614 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [339.0, 294.0, 449.0, 336.0, 362.0, 810.0, 288.0, 480.0, 246.0, 268.0]
2025-09-16 13:17:25,614 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1941.61) for latency 9
2025-09-16 13:17:25,623 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 9 minutes, 12 seconds)
2025-09-16 13:19:30,696 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:19:34,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1243.32129 ± 388.683
2025-09-16 13:19:34,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1408.6738, 1144.3113, 1474.7163, 1290.1577, 2027.0527, 858.96814, 714.8239, 1648.4219, 1032.8976, 833.1891]
2025-09-16 13:19:34,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 220.0, 292.0, 251.0, 395.0, 165.0, 140.0, 320.0, 223.0, 165.0]
2025-09-16 13:19:34,239 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 7 minutes, 19 seconds)
2025-09-16 13:21:42,074 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:21:45,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1194.57239 ± 564.001
2025-09-16 13:21:45,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1336.4625, 2527.3396, 752.2168, 1066.5494, 1884.8916, 594.5306, 771.2869, 826.3495, 1076.6887, 1109.4078]
2025-09-16 13:21:45,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [265.0, 487.0, 137.0, 222.0, 355.0, 109.0, 147.0, 153.0, 212.0, 237.0]
2025-09-16 13:21:45,158 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 6 minutes, 14 seconds)
2025-09-16 13:23:49,198 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:23:53,905 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1773.36450 ± 790.379
2025-09-16 13:23:53,905 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1399.6101, 1826.1729, 3362.7422, 2363.4446, 971.15906, 944.0746, 2144.8047, 2615.2344, 966.7435, 1139.6584]
2025-09-16 13:23:53,905 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [281.0, 354.0, 685.0, 455.0, 192.0, 202.0, 421.0, 495.0, 177.0, 237.0]
2025-09-16 13:23:53,912 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 4 minutes, 24 seconds)
2025-09-16 13:26:03,563 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:26:08,292 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1777.91174 ± 1293.570
2025-09-16 13:26:08,292 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1646.5167, 1077.2786, 1466.3176, 885.83844, 4956.7676, 1159.4781, 3527.819, 827.8256, 1151.5021, 1079.7742]
2025-09-16 13:26:08,292 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [314.0, 206.0, 284.0, 165.0, 993.0, 222.0, 693.0, 163.0, 219.0, 208.0]
2025-09-16 13:26:08,304 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 3 minutes, 28 seconds)
2025-09-16 13:28:02,527 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:28:12,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3457.16211 ± 1228.832
2025-09-16 13:28:12,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3631.526, 1861.9001, 2881.0903, 4989.103, 3425.2913, 2284.825, 4516.7646, 1470.4305, 4878.8726, 4631.8174]
2025-09-16 13:28:12,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [710.0, 377.0, 572.0, 1000.0, 699.0, 450.0, 909.0, 294.0, 950.0, 931.0]
2025-09-16 13:28:12,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3457.16) for latency 9
2025-09-16 13:28:12,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 43 seconds)
2025-09-16 13:30:15,521 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:30:22,026 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2187.53394 ± 1120.222
2025-09-16 13:30:22,026 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1498.9911, 2552.6553, 1951.3282, 5107.486, 2863.0417, 2096.543, 1518.1917, 1759.1836, 1791.9686, 735.95013]
2025-09-16 13:30:22,026 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [274.0, 517.0, 392.0, 1000.0, 558.0, 426.0, 309.0, 348.0, 343.0, 134.0]
2025-09-16 13:30:22,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 58 minutes, 45 seconds)
2025-09-16 13:32:33,773 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:32:41,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2751.38599 ± 1633.972
2025-09-16 13:32:41,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1321.2338, 4062.878, 1428.8544, 3691.6746, 5247.147, 1399.9054, 1096.9524, 4985.254, 3512.8994, 767.0627]
2025-09-16 13:32:41,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 793.0, 296.0, 712.0, 1000.0, 303.0, 212.0, 953.0, 658.0, 147.0]
2025-09-16 13:32:41,070 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 58 minutes, 3 seconds)
2025-09-16 13:34:41,867 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:34:51,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3156.11694 ± 1459.849
2025-09-16 13:34:51,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5203.105, 2434.0405, 2787.2915, 5241.9775, 2028.096, 2189.415, 3329.4663, 2059.2258, 1061.8813, 5226.6714]
2025-09-16 13:34:51,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 468.0, 549.0, 1000.0, 406.0, 421.0, 658.0, 399.0, 219.0, 1000.0]
2025-09-16 13:34:51,315 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 56 minutes, 8 seconds)
2025-09-16 13:36:55,331 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:37:02,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2655.48975 ± 1680.618
2025-09-16 13:37:02,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5150.3486, 1480.2635, 3146.0588, 5210.219, 1256.6918, 4849.8115, 1411.3481, 1861.5527, 940.246, 1248.3582]
2025-09-16 13:37:02,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 279.0, 614.0, 996.0, 257.0, 919.0, 273.0, 349.0, 175.0, 232.0]
2025-09-16 13:37:02,393 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 53 minutes, 22 seconds)
2025-09-16 13:39:08,749 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:39:22,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4304.94824 ± 1317.683
2025-09-16 13:39:22,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2392.908, 4984.358, 3980.2922, 5117.3823, 1235.5375, 5165.448, 4731.8105, 5128.898, 5096.939, 5215.9067]
2025-09-16 13:39:22,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [456.0, 1000.0, 812.0, 1000.0, 246.0, 1000.0, 908.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:39:22,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4304.95) for latency 9
2025-09-16 13:39:22,164 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 53 minutes, 52 seconds)
2025-09-16 13:41:32,748 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:41:42,850 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3674.69141 ± 1583.874
2025-09-16 13:41:42,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5107.848, 5207.093, 1569.4005, 2624.2183, 1835.8851, 5215.75, 3837.619, 5156.561, 1293.3625, 4899.176]
2025-09-16 13:41:42,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 321.0, 502.0, 362.0, 1000.0, 731.0, 1000.0, 248.0, 948.0]
2025-09-16 13:41:42,858 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 53 minutes, 28 seconds)
2025-09-16 13:43:44,315 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:44:00,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5092.71729 ± 90.544
2025-09-16 13:44:00,214 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5130.7974, 5104.947, 5218.93, 5038.0884, 5089.1836, 5181.542, 4956.7324, 4928.3467, 5182.4097, 5096.1973]
2025-09-16 13:44:00,214 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:44:00,214 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5092.72) for latency 9
2025-09-16 13:44:00,226 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 50 minutes, 55 seconds)
2025-09-16 13:46:07,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:46:23,140 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5047.04443 ± 602.682
2025-09-16 13:46:23,140 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5264.7314, 5298.56, 5060.5195, 3252.6885, 5192.278, 5226.873, 5335.569, 5319.6357, 5269.8506, 5249.7363]
2025-09-16 13:46:23,140 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 656.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:46:23,150 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 50 minutes, 41 seconds)
2025-09-16 13:48:27,424 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:48:39,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4440.58447 ± 1390.111
2025-09-16 13:48:39,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5221.559, 5183.1353, 5130.1333, 5282.719, 2586.0786, 5218.9023, 1052.4685, 5257.3867, 4183.898, 5289.5645]
2025-09-16 13:48:39,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 491.0, 1000.0, 202.0, 1000.0, 780.0, 1000.0]
2025-09-16 13:48:39,503 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 49 minutes, 12 seconds)
2025-09-16 13:50:51,358 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:51:01,505 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3872.06006 ± 1681.244
2025-09-16 13:51:01,505 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4631.7007, 1117.5969, 2820.503, 5213.391, 5301.8384, 1595.6938, 5394.386, 5280.345, 2009.5386, 5355.609]
2025-09-16 13:51:01,505 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [855.0, 214.0, 525.0, 1000.0, 1000.0, 291.0, 1000.0, 1000.0, 364.0, 1000.0]
2025-09-16 13:51:01,518 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 47 minutes, 14 seconds)
2025-09-16 13:52:57,354 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:53:11,354 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5199.66455 ± 306.897
2025-09-16 13:53:11,354 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5354.145, 5360.769, 5337.8726, 4301.251, 5233.739, 5290.177, 5355.2485, 5147.278, 5362.7417, 5253.4253]
2025-09-16 13:53:11,354 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 868.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:53:11,354 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5199.66) for latency 9
2025-09-16 13:53:11,362 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 43 minutes, 16 seconds)
2025-09-16 13:55:20,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:55:33,056 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4859.95312 ± 1220.521
2025-09-16 13:55:33,056 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4925.7803, 5302.03, 5312.1343, 5285.1885, 1214.6367, 5291.4775, 5298.1533, 5300.69, 5338.7515, 5330.6895]
2025-09-16 13:55:33,056 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [946.0, 1000.0, 1000.0, 1000.0, 260.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:55:33,075 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 41 minutes, 37 seconds)
2025-09-16 13:57:42,668 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:57:54,743 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4626.48633 ± 1419.383
2025-09-16 13:57:54,744 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2257.2961, 5359.1626, 5439.8325, 5415.032, 1389.2152, 5389.233, 5373.116, 5229.0845, 5035.7344, 5377.155]
2025-09-16 13:57:54,744 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [425.0, 1000.0, 1000.0, 1000.0, 258.0, 1000.0, 1000.0, 1000.0, 936.0, 1000.0]
2025-09-16 13:57:54,755 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 39 minutes, 7 seconds)
2025-09-16 13:59:55,349 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:00:07,794 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4669.06738 ± 1056.648
2025-09-16 14:00:07,794 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5331.0493, 2438.9136, 3307.075, 5329.5415, 5358.8257, 3576.353, 5355.322, 5379.9604, 5271.882, 5341.7524]
2025-09-16 14:00:07,794 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 463.0, 681.0, 1000.0, 1000.0, 670.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:07,801 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 36 minutes, 21 seconds)
2025-09-16 14:02:09,912 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:02:23,205 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4913.36963 ± 1093.504
2025-09-16 14:02:23,205 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5166.4717, 5334.329, 5279.8735, 5283.514, 5280.0923, 1635.7766, 5323.659, 5229.0635, 5308.888, 5292.028]
2025-09-16 14:02:23,205 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 355.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:02:23,212 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 33 minutes, 9 seconds)
2025-09-16 14:04:29,668 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:04:42,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4773.58057 ± 1349.160
2025-09-16 14:04:42,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5352.9814, 5384.5195, 1043.901, 5489.1426, 5392.152, 5358.391, 3641.362, 5427.664, 5451.434, 5194.252]
2025-09-16 14:04:42,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 199.0, 1000.0, 1000.0, 1000.0, 677.0, 1000.0, 1000.0, 948.0]
2025-09-16 14:04:42,165 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 32 minutes, 6 seconds)
2025-09-16 14:06:47,981 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:07:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4713.32764 ± 1370.340
2025-09-16 14:07:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5332.6743, 5381.1226, 5347.9136, 5409.217, 5364.5244, 1119.1249, 3130.5146, 5363.0073, 5350.8535, 5334.325]
2025-09-16 14:07:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 221.0, 607.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:07:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 29 minutes, 29 seconds)
2025-09-16 14:09:10,072 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:09:20,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3712.20117 ± 2017.077
2025-09-16 14:09:20,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5383.894, 1312.548, 1558.6589, 5310.195, 5388.4604, 5333.676, 1221.7306, 5324.413, 901.91046, 5386.5254]
2025-09-16 14:09:20,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 275.0, 329.0, 1000.0, 1000.0, 1000.0, 261.0, 1000.0, 195.0, 1000.0]
2025-09-16 14:09:20,038 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 26 minutes, 48 seconds)
2025-09-16 14:11:26,405 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:11:40,881 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5103.44629 ± 189.585
2025-09-16 14:11:40,881 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5188.47, 5178.73, 5123.681, 4987.8486, 5226.156, 4569.8955, 5144.06, 5194.5146, 5218.7593, 5202.353]
2025-09-16 14:11:40,881 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 892.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:11:40,889 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 25 minutes, 28 seconds)
2025-09-16 14:13:48,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:14:04,937 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5164.01855 ± 19.415
2025-09-16 14:14:04,937 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5180.2554, 5184.5674, 5168.327, 5161.3877, 5164.203, 5173.4097, 5122.2505, 5190.345, 5148.3203, 5147.1196]
2025-09-16 14:14:04,938 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:14:04,948 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 24 minutes, 12 seconds)
2025-09-16 14:16:11,487 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:16:25,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5341.27881 ± 130.154
2025-09-16 14:16:25,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5349.7656, 5436.451, 5417.7905, 5307.2544, 4976.6377, 5417.1074, 5392.519, 5334.0767, 5331.4165, 5449.77]
2025-09-16 14:16:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 959.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:16:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5341.28) for latency 9
2025-09-16 14:16:25,940 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 22 minutes, 6 seconds)
2025-09-16 14:18:32,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:18:47,639 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5412.66016 ± 32.210
2025-09-16 14:18:47,639 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5382.7656, 5386.3867, 5417.9355, 5392.4507, 5398.8584, 5480.361, 5397.195, 5389.136, 5416.16, 5465.3496]
2025-09-16 14:18:47,640 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:18:47,640 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5412.66) for latency 9
2025-09-16 14:18:47,647 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 20 minutes, 1 second)
2025-09-16 14:20:53,313 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:21:06,341 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4970.64941 ± 1343.916
2025-09-16 14:21:06,341 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5382.564, 5414.735, 5412.785, 5417.473, 5428.457, 5466.323, 5422.8604, 5424.8564, 5397.063, 939.3753]
2025-09-16 14:21:06,341 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 185.0]
2025-09-16 14:21:06,365 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 17 minutes, 41 seconds)
2025-09-16 14:23:11,324 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:23:25,549 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4677.93555 ± 1283.285
2025-09-16 14:23:25,549 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5208.1646, 5070.42, 5091.2305, 5302.2515, 5118.164, 926.0811, 5308.779, 5209.977, 5273.8555, 4270.434]
2025-09-16 14:23:25,549 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 215.0, 1000.0, 1000.0, 1000.0, 921.0]
2025-09-16 14:23:25,568 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 15 minutes, 9 seconds)
2025-09-16 14:25:30,449 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:25:45,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5308.77588 ± 24.425
2025-09-16 14:25:45,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5324.4346, 5293.4575, 5317.4976, 5315.3975, 5316.079, 5261.876, 5327.6274, 5315.646, 5271.07, 5344.6733]
2025-09-16 14:25:45,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:25:45,038 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 12 minutes, 20 seconds)
2025-09-16 14:27:50,888 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:28:06,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5225.20215 ± 18.332
2025-09-16 14:28:06,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5255.6655, 5235.7236, 5246.2056, 5237.191, 5212.1665, 5221.0996, 5211.706, 5229.672, 5191.1616, 5211.4355]
2025-09-16 14:28:06,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:28:06,941 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 10 minutes, 6 seconds)
2025-09-16 14:30:09,958 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:30:20,774 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4030.96924 ± 1830.243
2025-09-16 14:30:20,774 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5440.4575, 5341.7617, 5318.9043, 5401.1807, 719.31665, 5379.523, 887.0486, 3496.1475, 5440.6006, 2884.7505]
2025-09-16 14:30:20,774 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 147.0, 1000.0, 193.0, 673.0, 1000.0, 515.0]
2025-09-16 14:30:20,790 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 7 minutes)
2025-09-16 14:32:20,675 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:32:35,076 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5462.43457 ± 21.740
2025-09-16 14:32:35,076 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5465.8174, 5426.877, 5504.0977, 5455.596, 5455.7017, 5474.203, 5471.557, 5427.592, 5474.9395, 5467.972]
2025-09-16 14:32:35,076 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:32:35,076 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5462.43) for latency 9
2025-09-16 14:32:35,086 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 4 minutes, 16 seconds)
2025-09-16 14:34:48,984 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:35:03,338 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5432.24609 ± 36.640
2025-09-16 14:35:03,338 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5503.8887, 5434.0835, 5451.459, 5424.5195, 5356.4243, 5434.7373, 5397.799, 5425.527, 5461.7896, 5432.236]
2025-09-16 14:35:03,338 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:35:03,349 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 2 minutes, 48 seconds)
2025-09-16 14:37:08,087 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:37:24,377 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5109.38818 ± 21.200
2025-09-16 14:37:24,377 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5114.7607, 5150.961, 5121.6313, 5123.4023, 5069.424, 5117.169, 5095.988, 5107.186, 5107.3115, 5086.052]
2025-09-16 14:37:24,377 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:37:24,385 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 36 seconds)
2025-09-16 14:39:31,006 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:39:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5428.45020 ± 35.599
2025-09-16 14:39:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5475.0166, 5383.3423, 5445.8755, 5436.6255, 5457.6484, 5438.188, 5366.7617, 5469.547, 5388.042, 5423.4517]
2025-09-16 14:39:46,789 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:39:46,800 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 58 minutes, 19 seconds)
2025-09-16 14:41:51,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:42:05,615 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5479.45801 ± 14.579
2025-09-16 14:42:05,615 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5499.6157, 5453.1367, 5470.5293, 5487.5586, 5480.765, 5479.4097, 5475.397, 5459.831, 5498.888, 5489.4536]
2025-09-16 14:42:05,615 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:42:05,615 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5479.46) for latency 9
2025-09-16 14:42:05,627 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 56 minutes, 23 seconds)
2025-09-16 14:44:10,693 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:44:25,106 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4940.72754 ± 1344.947
2025-09-16 14:44:25,106 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5380.251, 5357.7007, 5411.822, 5370.1855, 5402.945, 5435.7773, 5406.948, 5386.0005, 5349.0576, 906.5825]
2025-09-16 14:44:25,107 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 168.0]
2025-09-16 14:44:25,115 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 54 minutes, 26 seconds)
2025-09-16 14:46:35,012 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:46:50,720 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5365.25000 ± 42.494
2025-09-16 14:46:50,720 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5351.186, 5398.348, 5411.69, 5374.999, 5380.939, 5380.3857, 5307.929, 5267.844, 5383.312, 5395.869]
2025-09-16 14:46:50,720 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:46:50,727 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 51 minutes, 52 seconds)
2025-09-16 14:48:57,293 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:49:08,197 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4061.83008 ± 1925.355
2025-09-16 14:49:08,197 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5292.453, 5312.395, 5305.0176, 1197.6984, 632.3248, 5338.7026, 5322.0513, 1587.6581, 5327.873, 5302.1294]
2025-09-16 14:49:08,197 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 230.0, 114.0, 1000.0, 1000.0, 305.0, 1000.0, 1000.0]
2025-09-16 14:49:08,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 49 minutes, 16 seconds)
2025-09-16 14:51:11,124 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:51:24,139 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4915.94092 ± 1443.247
2025-09-16 14:51:24,139 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5381.9565, 5396.5664, 5386.2393, 5400.0, 5405.473, 5385.1914, 5426.5127, 5411.025, 586.399, 5380.051]
2025-09-16 14:51:24,139 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 115.0, 1000.0]
2025-09-16 14:51:24,161 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 46 minutes, 29 seconds)
2025-09-16 14:53:25,712 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:53:34,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3555.38159 ± 2308.429
2025-09-16 14:53:34,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5432.6167, 5420.0225, 723.7822, 5470.9365, 5461.0474, 706.8204, 5406.541, 845.4645, 639.1831, 5447.401]
2025-09-16 14:53:34,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 135.0, 1000.0, 1000.0, 139.0, 1000.0, 165.0, 117.0, 1000.0]
2025-09-16 14:53:34,939 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 43 minutes, 39 seconds)
2025-09-16 14:55:41,516 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:55:50,824 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3515.22729 ± 2351.479
2025-09-16 14:55:50,825 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5450.618, 715.2435, 5453.461, 5442.5645, 637.2855, 583.1037, 607.2297, 5458.7256, 5357.364, 5446.681]
2025-09-16 14:55:50,825 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 137.0, 1000.0, 1000.0, 121.0, 116.0, 110.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:55:50,854 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 41 minutes, 8 seconds)
2025-09-16 14:57:54,661 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:58:08,583 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5470.93701 ± 58.312
2025-09-16 14:58:08,583 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5473.6025, 5526.8755, 5381.4937, 5447.426, 5452.817, 5482.583, 5556.3096, 5468.8647, 5375.1304, 5544.271]
2025-09-16 14:58:08,584 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:58:08,598 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 38 minutes, 24 seconds)
2025-09-16 15:00:14,911 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:00:29,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5327.25879 ± 21.125
2025-09-16 15:00:29,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5278.74, 5338.271, 5321.97, 5320.7, 5357.2285, 5315.6353, 5345.8745, 5348.882, 5319.0034, 5326.2866]
2025-09-16 15:00:29,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:00:29,475 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 36 minutes, 19 seconds)
2025-09-16 15:02:34,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:02:49,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5483.18896 ± 11.198
2025-09-16 15:02:49,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5494.2163, 5480.587, 5485.409, 5470.3174, 5483.2705, 5489.457, 5498.7524, 5482.5166, 5457.9937, 5489.373]
2025-09-16 15:02:49,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:02:49,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5483.19) for latency 9
2025-09-16 15:02:49,738 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 16 seconds)
2025-09-16 15:05:01,702 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:05:16,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5336.34229 ± 23.060
2025-09-16 15:05:16,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5340.325, 5326.4297, 5360.3125, 5343.6035, 5359.942, 5313.4604, 5378.7524, 5313.7085, 5305.192, 5321.6943]
2025-09-16 15:05:16,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:05:16,405 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 32 minutes, 44 seconds)
2025-09-16 15:07:25,338 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:07:39,854 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5326.84473 ± 21.118
2025-09-16 15:07:39,854 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5329.9893, 5326.455, 5354.6934, 5331.396, 5312.3433, 5344.6387, 5299.7617, 5292.473, 5316.5913, 5360.1094]
2025-09-16 15:07:39,854 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:07:39,861 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 30 minutes, 43 seconds)
2025-09-16 15:09:45,931 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:09:57,661 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4528.75537 ± 1130.440
2025-09-16 15:09:57,662 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5411.4717, 5465.2554, 2817.569, 5407.0073, 5426.2344, 3789.955, 5494.825, 5381.3057, 2945.9648, 3147.9648]
2025-09-16 15:09:57,662 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 529.0, 1000.0, 1000.0, 702.0, 1000.0, 1000.0, 538.0, 572.0]
2025-09-16 15:09:57,670 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 28 minutes, 21 seconds)
2025-09-16 15:12:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:12:18,394 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5314.65137 ± 17.025
2025-09-16 15:12:18,394 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5312.892, 5315.3516, 5299.1274, 5305.8296, 5281.768, 5333.2456, 5315.3887, 5310.9663, 5346.5996, 5325.3403]
2025-09-16 15:12:18,394 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:12:18,405 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 59 seconds)
2025-09-16 15:14:24,110 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:14:38,661 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5340.72266 ± 29.648
2025-09-16 15:14:38,661 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5353.3384, 5317.834, 5385.3223, 5333.663, 5315.0273, 5399.6406, 5336.4883, 5306.155, 5345.51, 5314.249]
2025-09-16 15:14:38,661 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:14:38,671 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 37 seconds)
2025-09-16 15:16:46,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:16:59,743 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4907.97119 ± 1288.288
2025-09-16 15:16:59,743 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5369.3535, 5337.4443, 5362.4194, 5368.018, 1044.3464, 5292.065, 5315.483, 5319.97, 5284.249, 5386.365]
2025-09-16 15:16:59,743 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 193.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:16:59,750 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 6 seconds)
2025-09-16 15:18:54,758 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:19:09,487 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5403.58301 ± 23.996
2025-09-16 15:19:09,487 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5429.155, 5384.592, 5383.825, 5381.49, 5420.422, 5359.967, 5404.1045, 5408.5005, 5426.7754, 5437.0]
2025-09-16 15:19:09,487 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:19:09,494 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 23 seconds)
2025-09-16 15:21:17,500 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:21:33,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5342.71289 ± 19.163
2025-09-16 15:21:33,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5350.6714, 5324.633, 5314.0513, 5310.0454, 5370.9214, 5360.1616, 5346.6455, 5358.436, 5348.179, 5343.3823]
2025-09-16 15:21:33,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:21:33,530 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 14 seconds)
2025-09-16 15:23:40,463 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:23:56,360 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5311.82764 ± 22.629
2025-09-16 15:23:56,361 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5359.5605, 5311.9414, 5289.474, 5318.65, 5322.4976, 5328.253, 5306.168, 5303.0835, 5309.215, 5269.4365]
2025-09-16 15:23:56,361 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:23:56,370 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 57 seconds)
2025-09-16 15:26:05,913 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:26:20,964 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5222.99512 ± 8.882
2025-09-16 15:26:20,964 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5209.7256, 5221.7534, 5218.825, 5208.561, 5221.367, 5222.969, 5226.755, 5234.5376, 5237.523, 5227.937]
2025-09-16 15:26:20,964 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:26:20,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 42 seconds)
2025-09-16 15:28:25,266 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:28:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5362.16650 ± 20.232
2025-09-16 15:28:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5393.1943, 5327.9487, 5336.6196, 5369.907, 5357.705, 5375.5786, 5382.9272, 5361.1694, 5375.4204, 5341.196]
2025-09-16 15:28:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:28:39,789 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 20 seconds)
2025-09-16 15:30:39,825 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:30:50,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3771.30859 ± 2237.369
2025-09-16 15:30:50,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2170.0715, 5543.853, 5558.522, 514.58453, 5556.8794, 695.5736, 5568.7153, 5604.257, 5570.9077, 929.72314]
2025-09-16 15:30:50,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [399.0, 1000.0, 1000.0, 100.0, 1000.0, 123.0, 1000.0, 1000.0, 1000.0, 168.0]
2025-09-16 15:30:50,224 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes)
2025-09-16 15:32:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:33:07,455 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5391.91357 ± 19.342
2025-09-16 15:33:07,455 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5407.1055, 5399.9136, 5379.296, 5380.185, 5371.4785, 5356.0044, 5386.5933, 5407.9595, 5409.701, 5420.904]
2025-09-16 15:33:07,455 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:33:07,464 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 37 seconds)
2025-09-16 15:35:05,896 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:35:20,143 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5457.13770 ± 25.959
2025-09-16 15:35:20,143 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5471.593, 5429.343, 5475.423, 5433.5654, 5480.589, 5451.4595, 5401.2046, 5483.031, 5474.7266, 5470.4365]
2025-09-16 15:35:20,143 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:35:20,150 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 16 seconds)
2025-09-16 15:37:22,539 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:37:38,214 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5424.19238 ± 17.025
2025-09-16 15:37:38,214 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5388.6533, 5432.55, 5436.7227, 5417.074, 5439.9956, 5424.8, 5436.112, 5405.55, 5446.7417, 5413.7217]
2025-09-16 15:37:38,214 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:37:38,222 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1251 [DEBUG]: Training session finished
