2025-09-16 11:23:26,838 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.000-delay_6
2025-09-16 11:23:26,839 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.000-delay_6
2025-09-16 11:23:26,839 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x15010bae84d0>}
2025-09-16 11:23:26,839 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:23:26,843 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:23:26,862 baseline-bpql-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=478, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:23:26,862 baseline-bpql-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:23:28,554 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:23:28,554 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:25:13,857 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:25:14,526 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 272.27927 ± 12.036
2025-09-16 11:25:14,526 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [265.26764, 272.4952, 270.71265, 305.8211, 271.724, 270.17645, 261.80014, 276.70996, 265.0786, 263.00693]
2025-09-16 11:25:14,526 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [51.0, 53.0, 53.0, 59.0, 53.0, 53.0, 51.0, 54.0, 51.0, 52.0]
2025-09-16 11:25:14,526 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (272.28) for latency 6
2025-09-16 11:25:14,532 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 54 minutes, 51 seconds)
2025-09-16 11:27:07,984 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:27:08,636 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 301.96588 ± 43.159
2025-09-16 11:27:08,636 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [288.02957, 300.02005, 205.9, 287.7533, 371.73685, 276.7933, 326.89926, 291.27426, 351.3487, 319.90375]
2025-09-16 11:27:08,636 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 56.0, 41.0, 54.0, 69.0, 53.0, 61.0, 55.0, 66.0, 60.0]
2025-09-16 11:27:08,636 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (301.97) for latency 6
2025-09-16 11:27:08,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 59 minutes, 44 seconds)
2025-09-16 11:29:03,006 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:29:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 367.56296 ± 32.666
2025-09-16 11:29:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [376.89578, 354.7064, 385.5366, 335.65198, 310.66483, 403.89682, 395.48868, 418.78864, 358.6103, 335.38947]
2025-09-16 11:29:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 69.0, 71.0, 63.0, 59.0, 76.0, 77.0, 79.0, 67.0, 64.0]
2025-09-16 11:29:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (367.56) for latency 6
2025-09-16 11:29:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 42 seconds)
2025-09-16 11:30:59,220 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:31:00,021 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 358.90952 ± 36.800
2025-09-16 11:31:00,021 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [323.75226, 407.02032, 358.3677, 355.97253, 302.3339, 371.0033, 426.85306, 379.49182, 323.10547, 341.19485]
2025-09-16 11:31:00,021 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 77.0, 67.0, 66.0, 56.0, 69.0, 80.0, 70.0, 60.0, 63.0]
2025-09-16 11:31:00,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 35 seconds)
2025-09-16 11:32:54,468 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:32:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 526.12506 ± 86.699
2025-09-16 11:32:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [538.1115, 516.1686, 362.33765, 580.1665, 603.82635, 559.5316, 620.6673, 376.49866, 603.75146, 500.19077]
2025-09-16 11:32:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 98.0, 76.0, 114.0, 117.0, 103.0, 122.0, 81.0, 120.0, 92.0]
2025-09-16 11:32:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (526.13) for latency 6
2025-09-16 11:32:55,723 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 59 minutes, 36 seconds)
2025-09-16 11:34:49,482 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:34:50,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 464.73251 ± 59.058
2025-09-16 11:34:50,612 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [414.9123, 455.69894, 507.7783, 564.1901, 393.3496, 498.14935, 361.71188, 504.37405, 441.82993, 505.33078]
2025-09-16 11:34:50,612 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 84.0, 94.0, 106.0, 74.0, 95.0, 66.0, 95.0, 84.0, 95.0]
2025-09-16 11:34:50,618 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 30 seconds)
2025-09-16 11:36:45,306 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:36:46,529 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 494.77304 ± 65.039
2025-09-16 11:36:46,529 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [593.72894, 426.3856, 446.49207, 394.8849, 488.19257, 439.98688, 545.454, 587.1608, 531.8272, 493.61792]
2025-09-16 11:36:46,529 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 79.0, 84.0, 86.0, 91.0, 83.0, 115.0, 123.0, 102.0, 108.0]
2025-09-16 11:36:46,533 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 59 minutes, 8 seconds)
2025-09-16 11:38:44,494 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:38:45,699 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 489.20477 ± 67.913
2025-09-16 11:38:45,699 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [454.95786, 489.36154, 609.3261, 555.40735, 406.5345, 388.21744, 487.33163, 496.06482, 568.39404, 436.4525]
2025-09-16 11:38:45,700 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 92.0, 128.0, 108.0, 77.0, 74.0, 102.0, 96.0, 118.0, 82.0]
2025-09-16 11:38:45,712 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 58 minutes, 25 seconds)
2025-09-16 11:40:42,667 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:40:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 534.90704 ± 98.185
2025-09-16 11:40:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [747.61206, 427.98166, 516.44794, 528.18445, 459.7282, 523.0645, 673.1743, 420.79544, 501.02463, 551.0577]
2025-09-16 11:40:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 82.0, 96.0, 103.0, 99.0, 107.0, 129.0, 92.0, 98.0, 104.0]
2025-09-16 11:40:44,000 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (534.91) for latency 6
2025-09-16 11:40:44,004 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 8 seconds)
2025-09-16 11:42:41,988 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:42:43,396 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 575.53766 ± 91.088
2025-09-16 11:42:43,396 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [589.6397, 505.92493, 558.44995, 445.02005, 585.31445, 532.4541, 478.285, 674.4285, 769.89636, 615.9632]
2025-09-16 11:42:43,396 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 100.0, 108.0, 84.0, 111.0, 101.0, 90.0, 128.0, 150.0, 118.0]
2025-09-16 11:42:43,396 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (575.54) for latency 6
2025-09-16 11:42:43,400 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 18 seconds)
2025-09-16 11:44:39,989 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:44:41,751 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 711.17615 ± 110.145
2025-09-16 11:44:41,751 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [631.83154, 857.4535, 618.5905, 873.7172, 764.202, 657.47614, 851.9731, 549.2616, 641.86975, 665.3857]
2025-09-16 11:44:41,751 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 176.0, 119.0, 173.0, 142.0, 141.0, 162.0, 105.0, 122.0, 127.0]
2025-09-16 11:44:41,751 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (711.18) for latency 6
2025-09-16 11:44:41,754 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 22 seconds)
2025-09-16 11:46:39,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:46:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 574.20532 ± 77.905
2025-09-16 11:46:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [611.899, 458.04785, 579.01215, 525.30975, 737.26807, 525.18964, 616.1874, 510.81174, 652.97345, 525.3538]
2025-09-16 11:46:40,856 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 88.0, 112.0, 110.0, 147.0, 114.0, 116.0, 110.0, 126.0, 100.0]
2025-09-16 11:46:40,861 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 20 seconds)
2025-09-16 11:48:35,800 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:48:37,411 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 637.22101 ± 129.553
2025-09-16 11:48:37,411 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [591.33575, 500.68964, 661.4125, 513.7407, 489.34647, 636.6549, 750.40375, 794.88513, 545.06146, 888.67944]
2025-09-16 11:48:37,411 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 106.0, 129.0, 110.0, 107.0, 139.0, 143.0, 153.0, 107.0, 187.0]
2025-09-16 11:48:37,415 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 35 seconds)
2025-09-16 11:50:32,242 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:50:33,714 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 604.93665 ± 128.720
2025-09-16 11:50:33,714 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [545.4047, 829.96216, 576.8992, 866.8409, 624.3873, 580.785, 507.09296, 544.38257, 484.04208, 489.56973]
2025-09-16 11:50:33,714 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 162.0, 109.0, 182.0, 121.0, 107.0, 98.0, 116.0, 90.0, 92.0]
2025-09-16 11:50:33,718 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 3 seconds)
2025-09-16 11:52:28,554 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:52:30,077 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 636.51819 ± 149.855
2025-09-16 11:52:30,077 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [728.04175, 535.9325, 536.06757, 894.0773, 435.63705, 695.10144, 632.42346, 865.9536, 463.96704, 577.9795]
2025-09-16 11:52:30,077 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 102.0, 104.0, 169.0, 91.0, 129.0, 120.0, 170.0, 86.0, 108.0]
2025-09-16 11:52:30,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 13 seconds)
2025-09-16 11:54:25,864 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:54:27,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 779.49829 ± 311.343
2025-09-16 11:54:27,888 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [535.28284, 518.60504, 1037.715, 560.78125, 740.1767, 563.5983, 1531.9487, 1042.1315, 616.3078, 648.4361]
2025-09-16 11:54:27,888 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 113.0, 200.0, 106.0, 140.0, 121.0, 304.0, 202.0, 117.0, 121.0]
2025-09-16 11:54:27,888 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (779.50) for latency 6
2025-09-16 11:54:27,895 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 7 seconds)
2025-09-16 11:56:22,388 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:56:24,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 721.76770 ± 120.122
2025-09-16 11:56:24,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [643.8984, 524.87964, 867.46313, 801.10846, 556.56757, 737.57336, 913.9457, 669.3242, 713.0121, 789.90485]
2025-09-16 11:56:24,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 114.0, 165.0, 153.0, 104.0, 139.0, 170.0, 128.0, 134.0, 165.0]
2025-09-16 11:56:24,174 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 41 minutes, 23 seconds)
2025-09-16 11:58:19,592 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:58:21,366 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 677.09253 ± 95.764
2025-09-16 11:58:21,366 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [683.2734, 514.49176, 610.68616, 694.82733, 675.96515, 645.90424, 902.05426, 757.58984, 635.04456, 651.0889]
2025-09-16 11:58:21,366 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 99.0, 113.0, 133.0, 138.0, 122.0, 186.0, 141.0, 134.0, 137.0]
2025-09-16 11:58:21,373 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 39 minutes, 36 seconds)
2025-09-16 12:00:15,993 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:00:17,291 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 556.44861 ± 113.391
2025-09-16 12:00:17,291 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [412.01834, 502.76755, 516.6056, 545.2456, 779.95337, 411.9542, 525.0451, 622.27136, 530.085, 718.5396]
2025-09-16 12:00:17,291 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 94.0, 111.0, 103.0, 148.0, 76.0, 98.0, 119.0, 99.0, 133.0]
2025-09-16 12:00:17,306 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 37 minutes, 34 seconds)
2025-09-16 12:02:13,565 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:02:15,455 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 766.14587 ± 271.017
2025-09-16 12:02:15,455 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [829.31177, 563.64014, 630.3849, 1046.0546, 1232.4547, 1166.7206, 590.97943, 442.95404, 508.78976, 650.1688]
2025-09-16 12:02:15,455 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 105.0, 135.0, 196.0, 249.0, 245.0, 111.0, 84.0, 94.0, 123.0]
2025-09-16 12:02:15,463 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 6 seconds)
2025-09-16 12:04:10,237 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:04:11,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 618.72955 ± 74.001
2025-09-16 12:04:11,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [621.5067, 726.7833, 464.52625, 604.1045, 562.1001, 711.5563, 624.0566, 649.3619, 555.9926, 667.30707]
2025-09-16 12:04:11,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 139.0, 94.0, 113.0, 105.0, 134.0, 117.0, 120.0, 111.0, 136.0]
2025-09-16 12:04:11,773 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes, 45 seconds)
2025-09-16 12:06:07,227 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:06:09,334 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 804.22156 ± 346.809
2025-09-16 12:06:09,334 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [501.31738, 1365.152, 617.3439, 892.2469, 1070.2952, 535.6229, 476.62405, 1431.9875, 625.86224, 525.7639]
2025-09-16 12:06:09,334 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 269.0, 128.0, 171.0, 222.0, 113.0, 105.0, 295.0, 136.0, 99.0]
2025-09-16 12:06:09,334 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (804.22) for latency 6
2025-09-16 12:06:09,368 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 9 seconds)
2025-09-16 12:08:03,898 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:08:05,495 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 673.23151 ± 95.143
2025-09-16 12:08:05,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [583.929, 634.3382, 556.8471, 678.04175, 710.4415, 715.7546, 658.7567, 919.40894, 616.272, 658.52496]
2025-09-16 12:08:05,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 121.0, 109.0, 128.0, 151.0, 137.0, 123.0, 172.0, 117.0, 127.0]
2025-09-16 12:08:05,506 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 29 minutes, 55 seconds)
2025-09-16 12:10:00,767 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:10:02,333 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 655.87915 ± 154.544
2025-09-16 12:10:02,333 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [885.66327, 580.6944, 636.86444, 552.28143, 501.51947, 586.7904, 759.2761, 534.6315, 544.4426, 976.6285]
2025-09-16 12:10:02,333 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 108.0, 119.0, 103.0, 94.0, 111.0, 143.0, 100.0, 103.0, 186.0]
2025-09-16 12:10:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes, 12 seconds)
2025-09-16 12:11:59,958 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:12:01,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 737.50513 ± 167.034
2025-09-16 12:12:01,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [538.3408, 1079.7802, 953.89105, 664.81067, 668.65155, 787.94794, 564.3269, 810.74854, 734.66675, 571.8873]
2025-09-16 12:12:01,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 207.0, 182.0, 127.0, 127.0, 152.0, 107.0, 155.0, 139.0, 108.0]
2025-09-16 12:12:01,788 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 26 minutes, 34 seconds)
2025-09-16 12:13:56,366 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:13:58,992 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1049.02502 ± 338.810
2025-09-16 12:13:58,992 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [629.29663, 805.4086, 1043.3043, 1875.8424, 1061.2346, 829.61584, 738.0915, 1221.2087, 1278.0547, 1008.1926]
2025-09-16 12:13:58,992 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 153.0, 209.0, 366.0, 199.0, 158.0, 140.0, 245.0, 262.0, 190.0]
2025-09-16 12:13:58,992 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1049.03) for latency 6
2025-09-16 12:13:58,998 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 minutes, 50 seconds)
2025-09-16 12:15:55,174 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:15:57,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 952.54315 ± 247.645
2025-09-16 12:15:57,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [696.83374, 1078.7148, 754.37585, 794.0812, 891.35767, 777.8749, 1287.6184, 1337.7339, 659.5754, 1247.2661]
2025-09-16 12:15:57,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 204.0, 144.0, 148.0, 168.0, 143.0, 242.0, 261.0, 120.0, 233.0]
2025-09-16 12:15:57,504 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 23 minutes, 6 seconds)
2025-09-16 12:17:55,283 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:17:57,687 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 931.68866 ± 187.045
2025-09-16 12:17:57,687 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [874.4856, 859.53094, 825.14136, 1123.0061, 1065.7821, 755.7741, 875.46075, 1373.9226, 760.828, 802.955]
2025-09-16 12:17:57,687 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 165.0, 160.0, 215.0, 206.0, 146.0, 187.0, 266.0, 150.0, 149.0]
2025-09-16 12:17:57,713 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 7 seconds)
2025-09-16 12:19:56,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:19:58,620 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 858.58728 ± 122.941
2025-09-16 12:19:58,620 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [671.1531, 829.2106, 741.9879, 999.9036, 1054.4954, 1011.3215, 834.8217, 904.61, 793.7776, 744.5922]
2025-09-16 12:19:58,620 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 155.0, 141.0, 204.0, 204.0, 203.0, 157.0, 170.0, 157.0, 142.0]
2025-09-16 12:19:58,634 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 21 minutes, 7 seconds)
2025-09-16 12:21:58,749 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:22:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1046.29663 ± 282.558
2025-09-16 12:22:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1127.6147, 869.1198, 1127.0566, 1222.9633, 1338.8291, 774.159, 680.7392, 1000.37054, 1608.7942, 713.31934]
2025-09-16 12:22:01,540 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 166.0, 218.0, 235.0, 261.0, 166.0, 130.0, 189.0, 312.0, 133.0]
2025-09-16 12:22:01,545 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 56 seconds)
2025-09-16 12:24:00,439 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:24:03,210 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1032.37585 ± 283.924
2025-09-16 12:24:03,210 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [897.9673, 791.357, 953.0112, 923.1164, 1125.2906, 886.1973, 896.49316, 977.47, 1027.9312, 1844.9246]
2025-09-16 12:24:03,210 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 154.0, 206.0, 175.0, 229.0, 191.0, 174.0, 202.0, 201.0, 392.0]
2025-09-16 12:24:03,214 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 58 seconds)
2025-09-16 12:26:05,758 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:26:08,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1207.34216 ± 543.696
2025-09-16 12:26:08,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1118.519, 2344.7764, 801.2311, 1079.1847, 1155.4708, 880.65857, 630.502, 606.54236, 1476.0626, 1980.4745]
2025-09-16 12:26:08,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [234.0, 459.0, 154.0, 220.0, 222.0, 181.0, 127.0, 120.0, 287.0, 404.0]
2025-09-16 12:26:08,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1207.34) for latency 6
2025-09-16 12:26:08,991 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 18 minutes, 36 seconds)
2025-09-16 12:28:07,249 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:28:10,310 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1164.05859 ± 524.077
2025-09-16 12:28:10,310 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1048.4045, 750.6045, 2181.3567, 653.64746, 1018.3583, 1279.9928, 611.8842, 1207.0404, 824.98987, 2064.3066]
2025-09-16 12:28:10,310 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 140.0, 442.0, 124.0, 194.0, 251.0, 115.0, 254.0, 154.0, 398.0]
2025-09-16 12:28:10,319 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 48 seconds)
2025-09-16 12:30:12,108 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:30:18,022 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2129.08008 ± 1025.530
2025-09-16 12:30:18,022 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2443.457, 1925.4617, 2121.659, 1377.9775, 3037.9028, 4689.0205, 980.72076, 1771.9484, 1224.535, 1718.1173]
2025-09-16 12:30:18,022 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [481.0, 378.0, 427.0, 262.0, 596.0, 915.0, 193.0, 361.0, 236.0, 324.0]
2025-09-16 12:30:18,022 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2129.08) for latency 6
2025-09-16 12:30:18,029 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 16 minutes, 16 seconds)
2025-09-16 12:32:11,897 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:32:15,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1424.57690 ± 1190.941
2025-09-16 12:32:15,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [722.58124, 692.6743, 4841.7046, 868.513, 1103.4498, 912.74023, 1829.6566, 890.212, 841.306, 1542.9303]
2025-09-16 12:32:15,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 133.0, 971.0, 163.0, 234.0, 176.0, 358.0, 183.0, 174.0, 302.0]
2025-09-16 12:32:15,834 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 13 minutes, 5 seconds)
2025-09-16 12:34:15,512 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:34:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1266.11145 ± 900.658
2025-09-16 12:34:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [938.4483, 664.8515, 575.2946, 2462.4673, 994.55176, 756.5508, 821.8155, 1440.7635, 573.7418, 3432.6287]
2025-09-16 12:34:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 137.0, 125.0, 481.0, 213.0, 151.0, 171.0, 284.0, 124.0, 665.0]
2025-09-16 12:34:18,928 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 11 minutes, 21 seconds)
2025-09-16 12:36:14,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:36:25,176 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3857.08057 ± 1495.760
2025-09-16 12:36:25,176 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3131.8276, 5009.1514, 1198.5634, 5016.849, 1869.4453, 2226.974, 5038.3784, 5016.1704, 5016.1943, 5047.2485]
2025-09-16 12:36:25,176 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [612.0, 1000.0, 237.0, 1000.0, 385.0, 448.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:36:25,176 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3857.08) for latency 6
2025-09-16 12:36:25,183 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 9 minutes, 24 seconds)
2025-09-16 12:38:30,780 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:38:43,309 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4372.07422 ± 1144.021
2025-09-16 12:38:43,309 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3420.6274, 1453.621, 5065.576, 5111.925, 5148.012, 5140.1587, 5132.853, 4209.834, 5150.1562, 3887.976]
2025-09-16 12:38:43,309 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [667.0, 291.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 825.0, 1000.0, 758.0]
2025-09-16 12:38:43,309 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4372.07) for latency 6
2025-09-16 12:38:43,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 10 minutes, 49 seconds)
2025-09-16 12:40:38,078 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:40:43,234 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1795.41333 ± 1234.079
2025-09-16 12:40:43,234 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5027.202, 1170.0657, 858.7048, 613.03156, 1972.4647, 2359.8755, 554.6603, 2121.7283, 1829.4944, 1446.9072]
2025-09-16 12:40:43,234 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 241.0, 166.0, 132.0, 410.0, 473.0, 113.0, 440.0, 382.0, 291.0]
2025-09-16 12:40:43,240 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 7 minutes, 7 seconds)
2025-09-16 12:42:39,259 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:42:47,874 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2928.43652 ± 1168.175
2025-09-16 12:42:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1184.672, 2597.8516, 1958.2496, 5026.466, 2317.574, 3395.1875, 3052.063, 2158.6594, 2690.8584, 4902.7827]
2025-09-16 12:42:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 523.0, 398.0, 1000.0, 472.0, 721.0, 624.0, 443.0, 556.0, 1000.0]
2025-09-16 12:42:47,885 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 6 minutes, 24 seconds)
2025-09-16 12:44:47,455 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:44:57,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3386.16455 ± 1517.452
2025-09-16 12:44:57,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5036.0566, 2219.321, 5040.1562, 2911.3042, 3938.5215, 4996.8193, 4979.857, 1508.5061, 2226.6387, 1004.46466]
2025-09-16 12:44:57,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 438.0, 1000.0, 578.0, 791.0, 1000.0, 1000.0, 308.0, 454.0, 205.0]
2025-09-16 12:44:57,065 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 5 minutes, 30 seconds)
2025-09-16 12:46:56,409 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:47:03,309 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2584.33398 ± 2078.074
2025-09-16 12:47:03,309 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1123.3003, 5232.7695, 977.39105, 942.618, 1080.8916, 4837.4272, 5189.757, 473.78986, 774.0642, 5211.3296]
2025-09-16 12:47:03,309 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [219.0, 1000.0, 185.0, 191.0, 197.0, 921.0, 1000.0, 96.0, 142.0, 1000.0]
2025-09-16 12:47:03,320 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 3 minutes, 22 seconds)
2025-09-16 12:48:55,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:49:09,582 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5092.00586 ± 37.350
2025-09-16 12:49:09,582 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5091.693, 5151.1006, 5020.137, 5097.0024, 5088.196, 5053.4, 5137.79, 5090.468, 5066.4604, 5123.8096]
2025-09-16 12:49:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:49:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5092.01) for latency 6
2025-09-16 12:49:09,587 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 58 minutes, 59 seconds)
2025-09-16 12:51:07,193 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:51:20,721 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4841.70898 ± 942.681
2025-09-16 12:51:20,721 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5219.1313, 5144.5996, 5182.959, 5136.7803, 2015.9667, 5091.152, 5133.2275, 5119.984, 5212.4834, 5160.8076]
2025-09-16 12:51:20,721 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 401.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:51:20,733 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 58 minutes, 59 seconds)
2025-09-16 12:53:26,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:53:35,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3540.63037 ± 1661.891
2025-09-16 12:53:35,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1444.9072, 5155.4067, 5150.8076, 1517.0736, 5215.9355, 1892.1687, 1387.2903, 4080.8477, 5229.439, 4332.427]
2025-09-16 12:53:35,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [278.0, 1000.0, 1000.0, 307.0, 1000.0, 368.0, 264.0, 773.0, 1000.0, 859.0]
2025-09-16 12:53:35,839 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 58 minutes, 47 seconds)
2025-09-16 12:55:23,874 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:55:38,514 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5080.72168 ± 158.453
2025-09-16 12:55:38,514 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5143.942, 5151.5195, 4610.625, 5125.578, 5072.62, 5161.8315, 5126.6934, 5154.777, 5130.0776, 5129.5547]
2025-09-16 12:55:38,514 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 926.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:55:38,522 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 55 minutes, 27 seconds)
2025-09-16 12:57:34,994 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:57:40,567 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2163.13232 ± 1130.289
2025-09-16 12:57:40,567 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3148.0435, 2219.6475, 1231.5958, 1669.6958, 1588.0247, 1092.0702, 5108.4233, 2242.8687, 1800.9635, 1529.9926]
2025-09-16 12:57:40,567 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [599.0, 424.0, 236.0, 319.0, 308.0, 224.0, 972.0, 428.0, 340.0, 295.0]
2025-09-16 12:57:40,573 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 52 minutes, 34 seconds)
2025-09-16 12:59:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:59:44,942 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2477.52295 ± 1491.145
2025-09-16 12:59:44,942 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1445.95, 1243.9796, 1743.5067, 5140.163, 1173.7133, 1431.5586, 2541.0056, 4053.1628, 4760.6484, 1241.544]
2025-09-16 12:59:44,942 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [306.0, 244.0, 342.0, 1000.0, 228.0, 277.0, 509.0, 786.0, 1000.0, 241.0]
2025-09-16 12:59:44,967 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 50 minutes, 7 seconds)
2025-09-16 13:01:46,848 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:01:58,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4054.13208 ± 1637.013
2025-09-16 13:01:58,196 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1504.2506, 5018.567, 5154.734, 4995.3525, 5081.103, 876.339, 2465.9788, 5156.0024, 5115.4062, 5173.586]
2025-09-16 13:01:58,196 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [311.0, 1000.0, 1000.0, 1000.0, 1000.0, 164.0, 475.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:01:58,204 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 48 minutes, 22 seconds)
2025-09-16 13:03:53,845 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:04:08,169 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5195.87598 ± 24.481
2025-09-16 13:04:08,169 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5181.741, 5207.0635, 5189.7734, 5169.185, 5253.6514, 5195.8193, 5190.2925, 5207.144, 5204.8486, 5159.2407]
2025-09-16 13:04:08,169 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:04:08,169 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5195.88) for latency 6
2025-09-16 13:04:08,179 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 45 minutes, 23 seconds)
2025-09-16 13:06:08,593 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:06:22,830 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5050.55762 ± 107.075
2025-09-16 13:06:22,830 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4988.9253, 5173.223, 5161.8574, 5097.2075, 4875.501, 4907.496, 5066.9004, 5129.215, 5161.4106, 4943.841]
2025-09-16 13:06:22,830 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 943.0, 961.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:06:22,838 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 45 minutes, 14 seconds)
2025-09-16 13:08:16,807 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:08:31,318 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5244.61768 ± 63.826
2025-09-16 13:08:31,318 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5262.0884, 5278.955, 5292.807, 5303.7354, 5246.611, 5312.233, 5150.5586, 5294.823, 5137.9688, 5166.3936]
2025-09-16 13:08:31,318 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:08:31,318 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5244.62) for latency 6
2025-09-16 13:08:31,352 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 44 minutes, 7 seconds)
2025-09-16 13:10:22,966 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:10:37,460 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5143.37549 ± 111.186
2025-09-16 13:10:37,461 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5146.7285, 5214.9014, 5257.294, 5135.9355, 5147.496, 5150.661, 5159.0435, 5250.715, 4837.5537, 5133.425]
2025-09-16 13:10:37,461 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:10:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 42 minutes, 13 seconds)
2025-09-16 13:12:33,602 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:12:46,821 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4794.09424 ± 691.822
2025-09-16 13:12:46,821 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3520.5583, 5220.365, 5199.545, 5173.6675, 5181.5493, 5105.162, 4805.8965, 3343.5803, 5219.3315, 5171.285]
2025-09-16 13:12:46,821 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [666.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 635.0, 1000.0, 1000.0]
2025-09-16 13:12:46,828 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 39 minutes, 27 seconds)
2025-09-16 13:14:43,760 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:14:57,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4699.26416 ± 1313.833
2025-09-16 13:14:57,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5190.1743, 5139.2583, 5131.9053, 758.85095, 5112.001, 5101.1045, 5111.3135, 5113.782, 5137.847, 5196.4004]
2025-09-16 13:14:57,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 150.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:14:57,046 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 37 minutes, 19 seconds)
2025-09-16 13:16:57,026 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:17:09,796 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4796.87646 ± 1412.905
2025-09-16 13:17:09,796 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5218.253, 5280.6694, 5272.912, 5288.8306, 5288.124, 5245.3267, 5261.27, 5259.452, 558.6791, 5295.248]
2025-09-16 13:17:09,796 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 114.0, 1000.0]
2025-09-16 13:17:09,802 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 34 minutes, 53 seconds)
2025-09-16 13:19:05,649 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:19:19,456 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5269.38770 ± 113.522
2025-09-16 13:19:19,456 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5186.241, 5365.3027, 5189.83, 5377.611, 5400.5283, 5315.7163, 5139.9263, 5066.5156, 5400.3384, 5251.8633]
2025-09-16 13:19:19,456 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:19:19,456 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5269.39) for latency 6
2025-09-16 13:19:19,464 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 32 minutes, 53 seconds)
2025-09-16 13:21:16,282 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:21:29,416 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4775.30957 ± 1211.887
2025-09-16 13:21:29,416 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5192.534, 5184.1, 5200.385, 5142.641, 1140.0737, 5192.1646, 5186.4434, 5177.285, 5146.279, 5191.195]
2025-09-16 13:21:29,416 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 219.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:21:29,449 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 31 minutes, 16 seconds)
2025-09-16 13:23:23,376 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:23:34,454 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3926.09961 ± 1792.419
2025-09-16 13:23:34,454 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4980.7275, 5110.972, 1641.1099, 5116.9116, 5108.7827, 1262.0065, 718.3819, 5106.19, 5124.3066, 5091.605]
2025-09-16 13:23:34,454 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 344.0, 1000.0, 1000.0, 255.0, 145.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:23:34,462 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 28 minutes, 30 seconds)
2025-09-16 13:25:32,482 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:25:46,479 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5256.01123 ± 18.446
2025-09-16 13:25:46,479 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5276.502, 5248.642, 5260.6445, 5270.786, 5266.5728, 5261.542, 5206.839, 5246.947, 5261.1367, 5260.4995]
2025-09-16 13:25:46,479 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:25:46,488 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 26 minutes, 35 seconds)
2025-09-16 13:27:41,992 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:27:56,312 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5178.33643 ± 11.829
2025-09-16 13:27:56,312 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5162.095, 5182.498, 5184.4956, 5187.6675, 5170.3555, 5166.563, 5194.775, 5161.446, 5193.602, 5179.8623]
2025-09-16 13:27:56,312 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:27:56,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 24 minutes, 2 seconds)
2025-09-16 13:29:57,269 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:30:09,583 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4457.97363 ± 1423.104
2025-09-16 13:30:09,584 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [957.58405, 5145.2236, 5123.594, 5145.7456, 5175.2246, 5141.919, 5133.181, 5141.9688, 2418.5308, 5196.7646]
2025-09-16 13:30:09,584 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 485.0, 1000.0]
2025-09-16 13:30:09,589 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 22 minutes, 20 seconds)
2025-09-16 13:32:09,539 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:32:23,038 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5170.42334 ± 520.068
2025-09-16 13:32:23,038 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5322.1626, 5375.4116, 5345.7163, 5360.507, 3611.4438, 5365.6484, 5346.1514, 5347.0264, 5301.901, 5328.2656]
2025-09-16 13:32:23,038 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 678.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:32:23,060 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 20 minutes, 36 seconds)
2025-09-16 13:34:19,211 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:34:32,286 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4837.11914 ± 1287.326
2025-09-16 13:34:32,286 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1007.3822, 5410.119, 5295.5874, 4785.652, 5242.68, 5336.344, 5332.228, 5320.064, 5273.126, 5368.0117]
2025-09-16 13:34:32,286 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [200.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:34:32,316 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 18 minutes, 56 seconds)
2025-09-16 13:36:23,364 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:36:36,374 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4817.93457 ± 1268.322
2025-09-16 13:36:36,374 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5238.9697, 5224.973, 5247.9834, 5242.5303, 1013.38354, 5289.1465, 5243.1836, 5235.4, 5213.9854, 5229.7876]
2025-09-16 13:36:36,374 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 202.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:36:36,417 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 15 minutes, 49 seconds)
2025-09-16 13:38:38,469 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:38:52,637 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5250.46094 ± 13.371
2025-09-16 13:38:52,637 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5257.9644, 5253.7817, 5259.251, 5214.374, 5249.357, 5246.371, 5245.797, 5255.3267, 5256.333, 5266.056]
2025-09-16 13:38:52,637 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:38:52,656 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 14 minutes, 23 seconds)
2025-09-16 13:40:40,635 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:40:55,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5318.77588 ± 11.892
2025-09-16 13:40:55,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5323.556, 5309.044, 5316.1797, 5323.783, 5336.642, 5290.803, 5316.0, 5319.965, 5320.756, 5331.0303]
2025-09-16 13:40:55,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:40:55,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5318.78) for latency 6
2025-09-16 13:40:55,018 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 10 minutes, 59 seconds)
2025-09-16 13:42:56,835 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:43:10,956 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5236.11084 ± 32.117
2025-09-16 13:43:10,956 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5235.9487, 5253.4653, 5172.827, 5290.0674, 5219.1665, 5253.514, 5275.233, 5232.0767, 5211.3604, 5217.4546]
2025-09-16 13:43:10,956 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:43:10,964 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 9 minutes, 6 seconds)
2025-09-16 13:45:07,555 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:45:16,580 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3450.69067 ± 2288.871
2025-09-16 13:45:16,580 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5325.358, 659.0606, 625.41705, 5342.983, 5330.49, 643.6612, 5317.434, 661.7818, 5285.409, 5315.3105]
2025-09-16 13:45:16,580 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 124.0, 120.0, 1000.0, 1000.0, 122.0, 1000.0, 128.0, 1000.0, 1000.0]
2025-09-16 13:45:16,589 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 6 minutes, 34 seconds)
2025-09-16 13:47:14,758 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:47:28,068 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5117.15137 ± 885.890
2025-09-16 13:47:28,068 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5399.896, 2459.8, 5443.936, 5424.0684, 5396.226, 5409.216, 5406.6787, 5399.7036, 5422.8984, 5409.0923]
2025-09-16 13:47:28,068 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 451.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:47:28,078 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 5 minutes, 9 seconds)
2025-09-16 13:49:24,560 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:49:37,440 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5026.98584 ± 875.876
2025-09-16 13:49:37,440 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5382.5312, 5382.954, 2440.001, 5337.433, 5401.7285, 5375.571, 4909.0386, 5172.695, 5459.1143, 5408.7935]
2025-09-16 13:49:37,440 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 453.0, 1000.0, 1000.0, 1000.0, 906.0, 954.0, 1000.0, 1000.0]
2025-09-16 13:49:37,467 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 2 minutes, 19 seconds)
2025-09-16 13:51:36,343 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:51:50,618 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5312.36914 ± 17.681
2025-09-16 13:51:50,618 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5312.359, 5349.7046, 5308.015, 5299.2046, 5327.3457, 5326.054, 5307.8364, 5289.2676, 5288.5884, 5315.311]
2025-09-16 13:51:50,618 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:51:50,625 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 11 seconds)
2025-09-16 13:53:38,925 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:53:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4967.89697 ± 1230.792
2025-09-16 13:53:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5371.782, 5386.112, 5382.0117, 1275.7383, 5397.224, 5353.5757, 5395.462, 5382.037, 5358.484, 5376.5386]
2025-09-16 13:53:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 235.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:53:51,980 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 41 seconds)
2025-09-16 13:55:53,494 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:56:06,701 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4952.30176 ± 1177.279
2025-09-16 13:56:06,701 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5343.443, 5347.87, 5365.2886, 1420.557, 5333.6553, 5342.4546, 5350.103, 5342.9, 5333.1187, 5343.63]
2025-09-16 13:56:06,701 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 280.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:56:06,715 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 56 minutes, 20 seconds)
2025-09-16 13:58:03,293 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:58:17,582 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5282.29736 ± 10.453
2025-09-16 13:58:17,583 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5300.0293, 5279.1084, 5290.423, 5260.6973, 5285.7847, 5288.438, 5280.2944, 5285.179, 5283.802, 5269.22]
2025-09-16 13:58:17,583 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:58:17,618 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes, 7 seconds)
2025-09-16 14:00:08,238 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:00:21,490 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4838.21631 ± 1377.157
2025-09-16 14:00:21,490 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5300.631, 5282.1514, 5292.176, 5303.3677, 5302.9546, 5293.3667, 706.90045, 5298.063, 5324.0493, 5278.5024]
2025-09-16 14:00:21,490 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 134.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:21,498 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 51 minutes, 31 seconds)
2025-09-16 14:02:25,016 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:02:38,019 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4797.20020 ± 1375.393
2025-09-16 14:02:38,019 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5272.012, 5253.608, 671.10956, 5247.072, 5263.284, 5262.172, 5258.906, 5237.839, 5256.9263, 5249.0767]
2025-09-16 14:02:38,019 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 127.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:02:38,029 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 38 seconds)
2025-09-16 14:04:32,830 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:04:40,914 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3113.95605 ± 2269.615
2025-09-16 14:04:40,914 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5313.5923, 5339.7983, 5350.945, 383.78806, 358.95337, 2161.736, 5332.1504, 5338.5894, 781.89795, 778.1103]
2025-09-16 14:04:40,914 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 75.0, 69.0, 402.0, 1000.0, 1000.0, 150.0, 152.0]
2025-09-16 14:04:40,925 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 35 seconds)
2025-09-16 14:06:37,378 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:06:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5376.36426 ± 11.610
2025-09-16 14:06:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5394.752, 5380.38, 5382.0034, 5370.6577, 5369.985, 5363.509, 5371.661, 5393.2437, 5380.964, 5356.4834]
2025-09-16 14:06:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:06:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5376.36) for latency 6
2025-09-16 14:06:51,660 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 45 minutes, 8 seconds)
2025-09-16 14:08:48,368 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:09:02,709 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5333.39453 ± 9.130
2025-09-16 14:09:02,709 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5320.669, 5341.8374, 5340.723, 5324.8096, 5328.26, 5326.9497, 5335.1265, 5350.5264, 5325.2095, 5339.838]
2025-09-16 14:09:02,709 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:09:02,717 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes)
2025-09-16 14:10:59,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:11:14,501 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5251.06738 ± 17.320
2025-09-16 14:11:14,501 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5281.6714, 5230.5493, 5243.0015, 5266.5674, 5260.4727, 5235.1753, 5256.4937, 5222.8057, 5249.968, 5263.967]
2025-09-16 14:11:14,501 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:11:14,513 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 21 seconds)
2025-09-16 14:13:06,624 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:13:21,303 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5266.83838 ± 7.127
2025-09-16 14:13:21,303 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5282.5474, 5266.375, 5274.4795, 5262.5107, 5263.1323, 5267.4717, 5266.1606, 5254.8374, 5268.698, 5262.169]
2025-09-16 14:13:21,303 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:13:21,313 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 35 seconds)
2025-09-16 14:15:22,944 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:15:37,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5280.76123 ± 23.070
2025-09-16 14:15:37,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5303.604, 5312.135, 5277.483, 5274.0654, 5288.661, 5291.7837, 5278.7437, 5277.2046, 5282.805, 5221.1294]
2025-09-16 14:15:37,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:15:37,624 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 12 seconds)
2025-09-16 14:17:34,388 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:17:49,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5207.09961 ± 50.778
2025-09-16 14:17:49,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5242.248, 5129.024, 5227.3945, 5253.188, 5126.9395, 5244.653, 5139.065, 5237.2915, 5214.182, 5257.009]
2025-09-16 14:17:49,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:17:49,092 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 3 seconds)
2025-09-16 14:19:45,685 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:20:00,054 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5331.68799 ± 11.458
2025-09-16 14:20:00,055 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5324.48, 5326.966, 5363.702, 5325.3296, 5324.257, 5335.19, 5329.741, 5329.455, 5335.253, 5322.5044]
2025-09-16 14:20:00,055 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:20:00,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 52 seconds)
2025-09-16 14:21:57,299 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:22:07,702 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3914.52075 ± 2178.341
2025-09-16 14:22:07,702 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5223.851, 5336.588, 5372.432, 509.28738, 5342.4243, 5368.3765, 5360.96, 749.0017, 508.683, 5373.602]
2025-09-16 14:22:07,702 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 99.0, 1000.0, 1000.0, 1000.0, 141.0, 96.0, 1000.0]
2025-09-16 14:22:07,729 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 29 seconds)
2025-09-16 14:23:58,814 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:24:12,223 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4944.91504 ± 1084.239
2025-09-16 14:24:12,223 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5309.1025, 5271.0806, 1692.4465, 5315.9233, 5296.675, 5308.61, 5318.14, 5319.086, 5312.987, 5305.097]
2025-09-16 14:24:12,223 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 319.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:24:12,244 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 12 seconds)
2025-09-16 14:26:18,210 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:26:32,907 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5238.09277 ± 14.181
2025-09-16 14:26:32,907 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5227.992, 5242.9595, 5252.903, 5250.6675, 5248.5312, 5240.362, 5243.491, 5244.9146, 5224.793, 5204.3115]
2025-09-16 14:26:32,907 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:26:32,917 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 12 seconds)
2025-09-16 14:28:29,422 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:28:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5244.81299 ± 24.183
2025-09-16 14:28:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5224.588, 5228.0493, 5218.53, 5244.1387, 5297.5034, 5222.2744, 5233.018, 5248.697, 5256.7607, 5274.572]
2025-09-16 14:28:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:28:43,790 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes)
2025-09-16 14:30:40,645 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:30:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4400.28076 ± 1806.869
2025-09-16 14:30:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [772.11395, 5309.9688, 5243.1353, 5260.0522, 5305.3022, 5301.288, 5402.612, 802.7615, 5290.667, 5314.9077]
2025-09-16 14:30:52,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 152.0, 1000.0, 1000.0]
2025-09-16 14:30:52,557 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 44 seconds)
2025-09-16 14:32:39,648 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:32:52,258 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4690.11621 ± 1415.254
2025-09-16 14:32:52,259 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5305.8394, 4400.6567, 554.2198, 5326.9536, 5327.972, 5304.6475, 4659.5034, 5338.0605, 5364.548, 5318.763]
2025-09-16 14:32:52,259 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 819.0, 102.0, 1000.0, 1000.0, 1000.0, 868.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:32:52,265 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 20 seconds)
2025-09-16 14:34:48,585 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:34:59,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3986.64893 ± 2061.595
2025-09-16 14:34:59,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5339.5254, 1300.7098, 5333.9023, 5343.6396, 5334.259, 5321.6753, 535.287, 5332.0366, 712.36804, 5313.0864]
2025-09-16 14:34:59,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 244.0, 1000.0, 1000.0, 1000.0, 1000.0, 99.0, 1000.0, 140.0, 1000.0]
2025-09-16 14:34:59,254 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 15 seconds)
2025-09-16 14:36:59,678 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:37:11,934 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4433.10352 ± 1786.003
2025-09-16 14:37:11,934 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5339.0664, 5312.456, 5342.493, 987.7113, 738.1338, 5334.514, 5334.641, 5304.2705, 5311.1196, 5326.624]
2025-09-16 14:37:11,934 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 210.0, 139.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:37:11,952 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 54 seconds)
2025-09-16 14:39:08,344 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:39:22,668 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5390.08545 ± 10.856
2025-09-16 14:39:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5392.8086, 5391.107, 5364.22, 5401.329, 5388.1255, 5397.109, 5376.717, 5394.471, 5400.1255, 5394.8438]
2025-09-16 14:39:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:39:22,669 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5390.09) for latency 6
2025-09-16 14:39:22,680 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 46 seconds)
2025-09-16 14:41:19,639 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:41:34,050 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5353.62207 ± 10.296
2025-09-16 14:41:34,050 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5354.13, 5352.253, 5370.4634, 5347.765, 5342.851, 5352.5996, 5333.264, 5365.584, 5355.937, 5361.375]
2025-09-16 14:41:34,050 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:34,063 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 41 seconds)
2025-09-16 14:43:30,451 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:43:44,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5416.97168 ± 4.560
2025-09-16 14:43:44,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5420.9956, 5416.986, 5414.54, 5419.9478, 5420.167, 5417.0225, 5414.0684, 5416.6997, 5423.2344, 5406.06]
2025-09-16 14:43:44,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:43:44,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5416.97) for latency 6
2025-09-16 14:43:44,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 42 seconds)
2025-09-16 14:45:41,216 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:45:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5331.60059 ± 6.108
2025-09-16 14:45:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5324.633, 5332.8726, 5330.48, 5335.9204, 5324.2017, 5334.117, 5340.21, 5336.898, 5320.6406, 5336.0317]
2025-09-16 14:45:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:45:55,760 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 33 seconds)
2025-09-16 14:47:52,224 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:48:06,353 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5392.34473 ± 11.960
2025-09-16 14:48:06,353 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5393.235, 5364.3345, 5386.157, 5390.1885, 5390.665, 5407.8228, 5389.35, 5390.6255, 5405.4707, 5405.5933]
2025-09-16 14:48:06,353 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:48:06,361 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 21 seconds)
2025-09-16 14:50:05,412 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:50:20,007 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5255.48242 ± 7.703
2025-09-16 14:50:20,007 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5263.6797, 5256.3306, 5250.8525, 5255.6323, 5240.2695, 5265.784, 5245.647, 5260.57, 5261.9526, 5254.107]
2025-09-16 14:50:20,007 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:50:20,015 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 11 seconds)
2025-09-16 14:52:16,469 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:52:30,666 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5416.51270 ± 11.676
2025-09-16 14:52:30,666 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5429.872, 5422.345, 5423.129, 5423.6514, 5402.952, 5419.0645, 5417.4585, 5387.8086, 5415.959, 5422.8813]
2025-09-16 14:52:30,666 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:52:30,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1251 [DEBUG]: Training session finished
