2025-09-16 13:35:52,744 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.000-delay_18
2025-09-16 13:35:52,744 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.000-delay_18
2025-09-16 13:35:52,744 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x154f955f8c50>}
2025-09-16 13:35:52,744 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 13:35:52,751 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 13:35:52,769 baseline-bpql-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=682, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 13:35:52,769 baseline-bpql-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 13:35:54,684 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 13:35:54,685 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 13:37:47,518 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:37:48,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 354.14606 ± 17.389
2025-09-16 13:37:48,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [355.7478, 331.7846, 371.96808, 351.15762, 330.47858, 348.55698, 381.9744, 379.82135, 344.04294, 345.9283]
2025-09-16 13:37:48,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 62.0, 70.0, 66.0, 62.0, 65.0, 72.0, 72.0, 64.0, 65.0]
2025-09-16 13:37:48,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (354.15) for latency 18
2025-09-16 13:37:48,524 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 7 minutes, 50 seconds)
2025-09-16 13:39:49,560 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:39:50,944 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 452.88849 ± 62.744
2025-09-16 13:39:50,944 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [492.3628, 439.52454, 404.6891, 517.34424, 537.646, 380.1827, 400.0342, 372.58032, 548.5655, 435.9558]
2025-09-16 13:39:50,944 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 86.0, 77.0, 110.0, 109.0, 73.0, 77.0, 70.0, 110.0, 85.0]
2025-09-16 13:39:50,944 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (452.89) for latency 18
2025-09-16 13:39:50,948 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 12 minutes, 56 seconds)
2025-09-16 13:41:52,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:41:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 415.95782 ± 29.971
2025-09-16 13:41:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [464.86942, 434.62332, 398.4094, 436.82138, 389.0229, 420.49164, 399.6342, 389.11957, 456.60547, 369.98093]
2025-09-16 13:41:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 85.0, 78.0, 89.0, 75.0, 78.0, 76.0, 75.0, 89.0, 71.0]
2025-09-16 13:41:53,300 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 13 minutes, 15 seconds)
2025-09-16 13:43:54,439 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:43:55,565 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 410.08139 ± 71.853
2025-09-16 13:43:55,565 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [556.2546, 309.06897, 440.8441, 416.48807, 349.87247, 435.56638, 473.08542, 324.7323, 357.30048, 437.601]
2025-09-16 13:43:55,565 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 61.0, 84.0, 81.0, 66.0, 83.0, 103.0, 63.0, 67.0, 83.0]
2025-09-16 13:43:55,573 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 12 minutes, 21 seconds)
2025-09-16 13:45:58,162 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:45:59,636 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 458.71851 ± 90.577
2025-09-16 13:45:59,636 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [423.44913, 446.95483, 596.8464, 540.13477, 291.84143, 517.81946, 559.2991, 361.0728, 453.77875, 395.98837]
2025-09-16 13:45:59,636 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 87.0, 123.0, 109.0, 57.0, 102.0, 118.0, 80.0, 94.0, 82.0]
2025-09-16 13:45:59,636 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (458.72) for latency 18
2025-09-16 13:45:59,642 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 11 minutes, 34 seconds)
2025-09-16 13:48:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:48:02,296 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 472.90155 ± 98.781
2025-09-16 13:48:02,297 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [360.43866, 593.4229, 663.2714, 532.06714, 511.1756, 457.6264, 350.5514, 449.71698, 451.93405, 358.81094]
2025-09-16 13:48:02,297 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 125.0, 131.0, 116.0, 108.0, 99.0, 74.0, 95.0, 99.0, 80.0]
2025-09-16 13:48:02,297 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (472.90) for latency 18
2025-09-16 13:48:02,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 12 minutes, 18 seconds)
2025-09-16 13:50:04,904 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:50:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 526.16278 ± 98.335
2025-09-16 13:50:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [565.779, 662.32007, 671.3134, 348.41605, 581.42004, 557.18463, 441.02316, 504.9122, 416.47382, 512.78564]
2025-09-16 13:50:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 130.0, 144.0, 71.0, 112.0, 117.0, 97.0, 95.0, 92.0, 97.0]
2025-09-16 13:50:06,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (526.16) for latency 18
2025-09-16 13:50:06,430 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 10 minutes, 47 seconds)
2025-09-16 13:52:07,885 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:52:09,244 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 491.19873 ± 90.020
2025-09-16 13:52:09,244 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [616.222, 440.0108, 387.87384, 552.7367, 637.9117, 492.08276, 494.24265, 351.97647, 526.29504, 412.6358]
2025-09-16 13:52:09,244 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 88.0, 78.0, 106.0, 124.0, 98.0, 95.0, 72.0, 106.0, 82.0]
2025-09-16 13:52:09,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 8 minutes, 53 seconds)
2025-09-16 13:54:12,220 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:54:13,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 475.21674 ± 105.387
2025-09-16 13:54:13,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [490.2762, 393.30356, 552.72046, 730.4523, 377.27676, 379.7445, 543.39624, 473.2328, 406.66092, 405.1037]
2025-09-16 13:54:13,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 82.0, 114.0, 142.0, 79.0, 79.0, 113.0, 100.0, 86.0, 84.0]
2025-09-16 13:54:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 7 minutes, 28 seconds)
2025-09-16 13:56:16,016 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:56:17,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 554.78918 ± 103.875
2025-09-16 13:56:17,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [570.29535, 572.4527, 740.1245, 378.9443, 446.59128, 583.2358, 689.8569, 483.58456, 487.1389, 595.66754]
2025-09-16 13:56:17,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 115.0, 149.0, 74.0, 85.0, 110.0, 142.0, 95.0, 104.0, 127.0]
2025-09-16 13:56:17,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (554.79) for latency 18
2025-09-16 13:56:17,820 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 5 minutes, 27 seconds)
2025-09-16 13:58:20,710 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 13:58:22,080 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 504.61099 ± 75.642
2025-09-16 13:58:22,081 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [541.3072, 480.975, 508.74667, 501.62933, 386.7197, 493.6126, 542.33856, 562.8158, 379.29703, 648.66797]
2025-09-16 13:58:22,081 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 92.0, 97.0, 103.0, 75.0, 94.0, 103.0, 107.0, 74.0, 126.0]
2025-09-16 13:58:22,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 3 minutes, 52 seconds)
2025-09-16 14:00:23,350 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:00:24,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 521.94733 ± 72.998
2025-09-16 14:00:24,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [470.5774, 448.76404, 511.1631, 510.43307, 457.99142, 462.71994, 602.24585, 617.13324, 475.76703, 662.6776]
2025-09-16 14:00:24,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 84.0, 97.0, 106.0, 86.0, 101.0, 125.0, 120.0, 90.0, 130.0]
2025-09-16 14:00:24,841 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 1 minute, 24 seconds)
2025-09-16 14:02:26,077 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:02:27,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 541.08679 ± 111.577
2025-09-16 14:02:27,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [687.3599, 560.5693, 496.93768, 434.3104, 446.0508, 776.81946, 587.34625, 549.19714, 452.44427, 419.83295]
2025-09-16 14:02:27,617 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 119.0, 96.0, 82.0, 88.0, 154.0, 112.0, 103.0, 98.0, 81.0]
2025-09-16 14:02:27,625 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 59 minutes, 19 seconds)
2025-09-16 14:04:30,377 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:04:31,902 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 532.15442 ± 75.977
2025-09-16 14:04:31,902 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [433.25476, 550.22534, 684.4084, 492.6502, 532.47015, 540.8354, 411.53198, 604.0496, 575.1114, 497.00708]
2025-09-16 14:04:31,902 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 109.0, 133.0, 93.0, 114.0, 102.0, 87.0, 127.0, 107.0, 98.0]
2025-09-16 14:04:31,911 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 57 minutes, 14 seconds)
2025-09-16 14:06:35,427 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:06:36,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 550.52368 ± 76.193
2025-09-16 14:06:36,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [558.89557, 482.42114, 594.6871, 584.0271, 448.0636, 714.69653, 588.79443, 500.63852, 573.7558, 459.25726]
2025-09-16 14:06:36,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 92.0, 116.0, 115.0, 96.0, 137.0, 115.0, 98.0, 112.0, 87.0]
2025-09-16 14:06:36,978 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 55 minutes, 25 seconds)
2025-09-16 14:08:40,132 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:08:41,885 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 582.80078 ± 163.555
2025-09-16 14:08:41,885 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [495.75302, 423.42813, 489.8721, 703.85114, 605.8453, 503.68695, 899.40845, 494.20163, 387.71274, 824.24786]
2025-09-16 14:08:41,885 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 89.0, 92.0, 134.0, 116.0, 110.0, 189.0, 111.0, 74.0, 166.0]
2025-09-16 14:08:41,885 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (582.80) for latency 18
2025-09-16 14:08:41,897 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 53 minutes, 32 seconds)
2025-09-16 14:10:45,344 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:10:46,813 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 497.78387 ± 74.994
2025-09-16 14:10:46,813 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [344.32626, 485.1089, 449.42297, 444.50052, 549.1801, 632.16626, 514.87115, 561.9105, 531.7014, 464.6503]
2025-09-16 14:10:46,813 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 99.0, 95.0, 92.0, 111.0, 122.0, 104.0, 119.0, 111.0, 97.0]
2025-09-16 14:10:46,820 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 52 minutes, 4 seconds)
2025-09-16 14:12:49,418 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:12:51,253 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 573.68500 ± 187.163
2025-09-16 14:12:51,253 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [449.09738, 1080.4352, 472.76523, 644.1895, 531.0972, 433.27707, 510.5451, 656.48865, 399.7688, 559.1861]
2025-09-16 14:12:51,253 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 227.0, 104.0, 121.0, 101.0, 82.0, 97.0, 126.0, 76.0, 106.0]
2025-09-16 14:12:51,262 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 50 minutes, 27 seconds)
2025-09-16 14:14:51,934 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:14:53,545 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 568.19324 ± 98.813
2025-09-16 14:14:53,546 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [751.87537, 577.09186, 408.04333, 443.53372, 517.79706, 679.9326, 618.3577, 513.37994, 612.9431, 558.9775]
2025-09-16 14:14:53,546 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 121.0, 89.0, 84.0, 97.0, 131.0, 120.0, 110.0, 117.0, 106.0]
2025-09-16 14:14:53,552 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 47 minutes, 50 seconds)
2025-09-16 14:16:58,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:17:00,125 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 589.55847 ± 98.031
2025-09-16 14:17:00,125 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [505.79932, 636.7432, 687.4838, 576.0052, 390.44193, 594.9637, 486.55093, 662.2736, 732.37665, 622.9465]
2025-09-16 14:17:00,125 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 137.0, 147.0, 111.0, 80.0, 113.0, 108.0, 138.0, 136.0, 115.0]
2025-09-16 14:17:00,125 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (589.56) for latency 18
2025-09-16 14:17:00,134 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 46 minutes, 10 seconds)
2025-09-16 14:19:01,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:19:03,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 655.40009 ± 79.034
2025-09-16 14:19:03,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [687.3911, 761.6505, 757.53644, 624.9661, 658.40674, 560.04517, 703.48206, 542.70996, 547.432, 710.38055]
2025-09-16 14:19:03,819 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 143.0, 150.0, 133.0, 124.0, 118.0, 149.0, 104.0, 112.0, 139.0]
2025-09-16 14:19:03,819 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (655.40) for latency 18
2025-09-16 14:19:03,825 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 43 minutes, 46 seconds)
2025-09-16 14:21:06,764 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:21:08,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 613.06091 ± 124.892
2025-09-16 14:21:08,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [848.64185, 616.4223, 519.94965, 800.64374, 552.2814, 725.32166, 524.34515, 474.50027, 513.94666, 554.55634]
2025-09-16 14:21:08,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 134.0, 97.0, 151.0, 107.0, 152.0, 114.0, 88.0, 96.0, 104.0]
2025-09-16 14:21:08,504 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 41 minutes, 38 seconds)
2025-09-16 14:23:11,222 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:23:13,139 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 600.39740 ± 108.220
2025-09-16 14:23:13,139 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [606.5904, 645.261, 469.7831, 509.82016, 604.2619, 472.00186, 477.4622, 741.69275, 733.56616, 743.53467]
2025-09-16 14:23:13,139 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 130.0, 89.0, 100.0, 111.0, 102.0, 103.0, 143.0, 139.0, 153.0]
2025-09-16 14:23:13,147 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 39 minutes, 37 seconds)
2025-09-16 14:25:15,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:25:17,365 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 524.02441 ± 78.695
2025-09-16 14:25:17,365 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [503.95132, 636.43915, 596.9237, 554.29944, 428.9934, 591.9411, 534.10394, 426.3505, 393.87845, 573.363]
2025-09-16 14:25:17,365 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 133.0, 112.0, 104.0, 92.0, 113.0, 115.0, 79.0, 77.0, 106.0]
2025-09-16 14:25:17,379 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 38 minutes, 2 seconds)
2025-09-16 14:27:20,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:27:22,073 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 639.09595 ± 154.523
2025-09-16 14:27:22,073 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [730.03845, 434.9054, 819.8407, 513.1018, 588.5159, 904.9424, 566.0863, 536.9331, 808.1336, 488.46188]
2025-09-16 14:27:22,073 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 81.0, 166.0, 100.0, 123.0, 171.0, 118.0, 113.0, 173.0, 106.0]
2025-09-16 14:27:22,080 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 35 minutes, 29 seconds)
2025-09-16 14:29:24,711 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:29:26,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 685.82434 ± 154.932
2025-09-16 14:29:26,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [452.9672, 587.9686, 892.2307, 773.36206, 608.60583, 968.5735, 556.68134, 598.6789, 623.96674, 795.2085]
2025-09-16 14:29:26,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 126.0, 168.0, 148.0, 123.0, 190.0, 105.0, 123.0, 139.0, 155.0]
2025-09-16 14:29:26,887 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (685.82) for latency 18
2025-09-16 14:29:26,892 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 33 minutes, 41 seconds)
2025-09-16 14:31:26,807 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:31:28,837 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 710.46747 ± 113.724
2025-09-16 14:31:28,838 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [810.8924, 574.0922, 742.7719, 803.5048, 827.11487, 761.6469, 816.309, 546.383, 702.63666, 519.3235]
2025-09-16 14:31:28,838 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 119.0, 153.0, 163.0, 159.0, 147.0, 159.0, 113.0, 138.0, 98.0]
2025-09-16 14:31:28,838 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (710.47) for latency 18
2025-09-16 14:31:28,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 30 minutes, 56 seconds)
2025-09-16 14:33:29,050 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:33:30,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 642.55005 ± 138.752
2025-09-16 14:33:30,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [558.3009, 586.7686, 436.58456, 494.7692, 738.5219, 727.0309, 551.0115, 910.82635, 788.3825, 633.304]
2025-09-16 14:33:30,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 128.0, 82.0, 93.0, 142.0, 138.0, 119.0, 177.0, 152.0, 118.0]
2025-09-16 14:33:30,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 28 minutes, 14 seconds)
2025-09-16 14:35:32,953 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:35:35,025 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 710.75104 ± 151.904
2025-09-16 14:35:35,025 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1002.40607, 701.64343, 824.879, 537.00476, 817.2802, 621.63214, 538.06683, 595.88794, 594.76587, 873.94403]
2025-09-16 14:35:35,025 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 152.0, 159.0, 101.0, 162.0, 138.0, 114.0, 114.0, 123.0, 188.0]
2025-09-16 14:35:35,025 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (710.75) for latency 18
2025-09-16 14:35:35,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 26 minutes, 10 seconds)
2025-09-16 14:37:36,207 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:37:38,388 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 760.19537 ± 153.242
2025-09-16 14:37:38,388 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [667.9183, 745.5893, 865.04, 863.56726, 551.61273, 615.69025, 1048.732, 751.1266, 578.5321, 914.14484]
2025-09-16 14:37:38,388 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 153.0, 160.0, 167.0, 118.0, 126.0, 217.0, 157.0, 119.0, 174.0]
2025-09-16 14:37:38,388 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (760.20) for latency 18
2025-09-16 14:37:38,394 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 23 minutes, 48 seconds)
2025-09-16 14:39:37,720 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:39:40,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 823.36066 ± 235.251
2025-09-16 14:39:40,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [561.38477, 689.7469, 672.96594, 665.6339, 728.7622, 1174.4474, 721.14594, 1349.1982, 833.4887, 836.8326]
2025-09-16 14:39:40,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 131.0, 136.0, 135.0, 139.0, 235.0, 133.0, 266.0, 157.0, 161.0]
2025-09-16 14:39:40,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (823.36) for latency 18
2025-09-16 14:39:40,050 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 21 minutes, 1 second)
2025-09-16 14:41:41,315 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:41:43,592 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 806.63660 ± 82.208
2025-09-16 14:41:43,592 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [900.5673, 768.3812, 727.99243, 854.93115, 797.9564, 740.14294, 900.51746, 896.4407, 835.756, 643.6805]
2025-09-16 14:41:43,592 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 145.0, 139.0, 174.0, 151.0, 158.0, 174.0, 171.0, 156.0, 137.0]
2025-09-16 14:41:43,600 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 19 minutes, 20 seconds)
2025-09-16 14:43:46,160 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:43:48,400 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 719.23102 ± 195.921
2025-09-16 14:43:48,400 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [881.5301, 650.7216, 433.27014, 557.15924, 1042.4125, 514.17456, 709.36523, 719.8908, 657.37866, 1026.4069]
2025-09-16 14:43:48,400 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 118.0, 80.0, 113.0, 219.0, 106.0, 148.0, 148.0, 125.0, 192.0]
2025-09-16 14:43:48,406 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 17 minutes, 55 seconds)
2025-09-16 14:45:48,473 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:45:50,601 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 772.70795 ± 171.286
2025-09-16 14:45:50,601 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1179.1029, 828.1606, 639.12024, 907.64734, 804.4513, 608.5465, 746.2978, 728.1856, 753.93024, 531.6372]
2025-09-16 14:45:50,601 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 163.0, 122.0, 181.0, 166.0, 112.0, 135.0, 133.0, 140.0, 99.0]
2025-09-16 14:45:50,612 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 15 minutes, 25 seconds)
2025-09-16 14:47:50,836 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:47:52,909 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 767.83606 ± 143.019
2025-09-16 14:47:52,909 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [983.2057, 675.53174, 708.9909, 941.9587, 624.2019, 951.7904, 671.87103, 870.1521, 648.7227, 601.9357]
2025-09-16 14:47:52,909 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 124.0, 131.0, 182.0, 121.0, 177.0, 125.0, 162.0, 124.0, 111.0]
2025-09-16 14:47:52,916 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 13 minutes, 8 seconds)
2025-09-16 14:49:55,728 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:49:57,938 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 823.94385 ± 146.844
2025-09-16 14:49:57,938 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [997.95636, 906.45074, 708.0793, 963.6312, 1057.0045, 744.7174, 695.8677, 677.86035, 612.6597, 875.21124]
2025-09-16 14:49:57,938 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 170.0, 140.0, 175.0, 206.0, 137.0, 131.0, 124.0, 120.0, 168.0]
2025-09-16 14:49:57,938 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (823.94) for latency 18
2025-09-16 14:49:57,946 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 11 minutes, 49 seconds)
2025-09-16 14:51:57,688 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:52:00,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 886.57440 ± 165.321
2025-09-16 14:52:00,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [866.69867, 1218.2177, 1059.9188, 615.33954, 848.9941, 1030.4954, 830.8614, 714.3062, 831.7814, 849.1308]
2025-09-16 14:52:00,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 238.0, 216.0, 117.0, 164.0, 207.0, 163.0, 134.0, 161.0, 159.0]
2025-09-16 14:52:00,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (886.57) for latency 18
2025-09-16 14:52:00,199 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 9 minutes, 29 seconds)
2025-09-16 14:54:00,903 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:54:03,245 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 797.99060 ± 184.020
2025-09-16 14:54:03,245 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1129.2635, 627.75037, 505.30148, 972.9597, 919.7469, 911.88837, 654.1309, 891.22156, 697.5401, 670.1029]
2025-09-16 14:54:03,245 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [242.0, 118.0, 111.0, 194.0, 166.0, 185.0, 150.0, 176.0, 151.0, 141.0]
2025-09-16 14:54:03,251 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 7 minutes, 4 seconds)
2025-09-16 14:56:04,957 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:56:06,989 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 722.58740 ± 154.861
2025-09-16 14:56:06,989 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [936.7119, 745.93567, 572.9406, 734.42346, 770.5662, 633.4004, 982.507, 419.63263, 690.64124, 739.1145]
2025-09-16 14:56:06,989 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 147.0, 114.0, 150.0, 140.0, 120.0, 195.0, 90.0, 142.0, 135.0]
2025-09-16 14:56:06,997 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 5 minutes, 19 seconds)
2025-09-16 14:58:09,910 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:58:13,077 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1008.21124 ± 118.871
2025-09-16 14:58:13,077 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [834.9654, 896.28064, 900.43744, 1038.3065, 1036.2793, 1181.0024, 933.9558, 943.2427, 1174.5176, 1143.124]
2025-09-16 14:58:13,077 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 178.0, 173.0, 216.0, 208.0, 237.0, 185.0, 178.0, 239.0, 228.0]
2025-09-16 14:58:13,078 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1008.21) for latency 18
2025-09-16 14:58:13,086 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 4 minutes, 2 seconds)
2025-09-16 15:00:13,153 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:00:15,862 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 880.28284 ± 246.436
2025-09-16 15:00:15,862 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [813.48474, 1259.1777, 628.4832, 796.7654, 1081.3992, 1074.8793, 713.98785, 734.89404, 1210.3994, 489.35706]
2025-09-16 15:00:15,862 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 247.0, 122.0, 155.0, 197.0, 229.0, 134.0, 143.0, 233.0, 106.0]
2025-09-16 15:00:15,870 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 1 minute, 31 seconds)
2025-09-16 15:02:15,593 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:02:17,875 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 779.47815 ± 218.423
2025-09-16 15:02:17,876 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1068.9003, 614.29083, 1030.81, 1115.5358, 826.4327, 507.97302, 558.9028, 680.1127, 839.98944, 551.83453]
2025-09-16 15:02:17,876 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [211.0, 128.0, 214.0, 221.0, 172.0, 102.0, 110.0, 143.0, 167.0, 112.0]
2025-09-16 15:02:17,883 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 59 minutes, 25 seconds)
2025-09-16 15:04:19,358 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:04:22,010 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 947.13409 ± 188.687
2025-09-16 15:04:22,010 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [773.4795, 857.836, 892.3599, 1217.6255, 607.61096, 799.7093, 1223.9053, 970.8944, 1052.6292, 1075.2909]
2025-09-16 15:04:22,010 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 157.0, 177.0, 260.0, 111.0, 154.0, 244.0, 192.0, 208.0, 200.0]
2025-09-16 15:04:22,017 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 57 minutes, 33 seconds)
2025-09-16 15:06:23,274 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:06:25,943 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 876.78467 ± 154.863
2025-09-16 15:06:25,943 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [870.2066, 959.86816, 869.7885, 792.6478, 927.07056, 802.51697, 829.47894, 1278.6598, 673.8105, 763.79895]
2025-09-16 15:06:25,943 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 182.0, 164.0, 146.0, 181.0, 156.0, 156.0, 248.0, 126.0, 143.0]
2025-09-16 15:06:25,954 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 55 minutes, 32 seconds)
2025-09-16 15:08:28,764 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:08:31,657 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1011.23376 ± 241.701
2025-09-16 15:08:31,658 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1078.5715, 902.6436, 856.6761, 1117.4603, 853.6139, 853.45416, 1647.1667, 1133.7194, 811.2399, 857.79205]
2025-09-16 15:08:31,658 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 188.0, 175.0, 214.0, 170.0, 168.0, 324.0, 232.0, 150.0, 166.0]
2025-09-16 15:08:31,658 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1011.23) for latency 18
2025-09-16 15:08:31,670 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 53 minutes, 24 seconds)
2025-09-16 15:10:32,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:10:35,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1130.52466 ± 338.927
2025-09-16 15:10:35,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1744.2219, 1139.4594, 845.8017, 864.4453, 875.0683, 1012.8238, 1795.2893, 1200.9662, 917.70746, 909.4639]
2025-09-16 15:10:35,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [345.0, 246.0, 182.0, 171.0, 186.0, 215.0, 356.0, 242.0, 187.0, 163.0]
2025-09-16 15:10:35,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1130.52) for latency 18
2025-09-16 15:10:35,908 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 51 minutes, 36 seconds)
2025-09-16 15:12:36,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:12:38,933 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 936.30127 ± 257.023
2025-09-16 15:12:38,933 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [499.86478, 793.48193, 1093.6537, 635.3264, 1332.4607, 1106.3375, 1208.691, 1113.1295, 801.93695, 778.13025]
2025-09-16 15:12:38,933 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 149.0, 210.0, 120.0, 264.0, 238.0, 247.0, 225.0, 168.0, 170.0]
2025-09-16 15:12:38,942 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 43 seconds)
2025-09-16 15:14:39,675 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:14:43,048 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1157.54919 ± 313.787
2025-09-16 15:14:43,048 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [706.4077, 1542.5381, 1007.81476, 1143.6693, 1210.92, 1090.9469, 1098.5255, 1054.7825, 1863.4346, 856.45355]
2025-09-16 15:14:43,048 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 302.0, 213.0, 227.0, 242.0, 201.0, 234.0, 224.0, 373.0, 161.0]
2025-09-16 15:14:43,048 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1157.55) for latency 18
2025-09-16 15:14:43,060 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 47 minutes, 38 seconds)
2025-09-16 15:16:47,106 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:16:49,989 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1010.61572 ± 251.388
2025-09-16 15:16:49,990 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [933.34717, 645.5563, 898.9995, 1222.2368, 1581.8207, 1019.97845, 1108.4958, 1085.645, 736.54205, 873.535]
2025-09-16 15:16:49,990 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 138.0, 171.0, 256.0, 307.0, 193.0, 237.0, 211.0, 145.0, 161.0]
2025-09-16 15:16:49,998 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 46 minutes, 5 seconds)
2025-09-16 15:18:48,729 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:18:51,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 924.18683 ± 314.057
2025-09-16 15:18:51,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [650.8304, 652.3337, 998.4779, 758.3292, 943.24054, 853.46014, 960.88654, 928.5025, 705.5264, 1790.2809]
2025-09-16 15:18:51,776 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 135.0, 210.0, 166.0, 182.0, 179.0, 206.0, 184.0, 135.0, 360.0]
2025-09-16 15:18:51,790 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 43 minutes, 21 seconds)
2025-09-16 15:20:56,117 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:20:58,784 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 865.47351 ± 283.590
2025-09-16 15:20:58,784 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [813.6442, 692.9226, 749.95404, 514.19934, 1545.0706, 893.737, 1021.86914, 672.7157, 640.176, 1110.4462]
2025-09-16 15:20:58,784 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 133.0, 146.0, 102.0, 300.0, 168.0, 190.0, 125.0, 119.0, 235.0]
2025-09-16 15:20:58,790 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 41 minutes, 44 seconds)
2025-09-16 15:22:59,762 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:23:02,976 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1004.98328 ± 199.466
2025-09-16 15:23:02,976 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [681.6648, 1069.9106, 1073.8528, 1294.2769, 651.41156, 1070.7186, 983.1295, 1181.8228, 1161.2548, 881.791]
2025-09-16 15:23:02,976 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 211.0, 228.0, 278.0, 124.0, 221.0, 186.0, 231.0, 236.0, 170.0]
2025-09-16 15:23:02,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 39 minutes, 50 seconds)
2025-09-16 15:25:04,064 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:25:07,217 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1120.58887 ± 346.965
2025-09-16 15:25:07,217 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1081.4498, 707.92236, 1133.4585, 773.0655, 1056.2695, 1337.7347, 1070.0878, 939.55707, 1078.7167, 2027.6271]
2025-09-16 15:25:07,217 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 150.0, 220.0, 141.0, 204.0, 270.0, 211.0, 173.0, 202.0, 401.0]
2025-09-16 15:25:07,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 37 minutes, 47 seconds)
2025-09-16 15:27:10,279 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:27:13,257 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1037.78979 ± 310.826
2025-09-16 15:27:13,258 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1186.0316, 809.85803, 1279.1746, 999.38763, 1707.8258, 538.5211, 1106.2189, 680.5383, 1066.1345, 1004.2069]
2025-09-16 15:27:13,258 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 160.0, 246.0, 202.0, 355.0, 112.0, 210.0, 137.0, 220.0, 207.0]
2025-09-16 15:27:13,267 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 35 minutes, 34 seconds)
2025-09-16 15:29:17,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:29:20,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1240.83887 ± 206.354
2025-09-16 15:29:20,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1411.1196, 1071.3582, 1270.8268, 1268.5217, 977.83295, 1178.2034, 1217.6758, 1068.4127, 1751.5032, 1192.9344]
2025-09-16 15:29:20,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [280.0, 210.0, 242.0, 241.0, 187.0, 214.0, 233.0, 206.0, 336.0, 226.0]
2025-09-16 15:29:20,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1240.84) for latency 18
2025-09-16 15:29:20,466 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 34 minutes, 18 seconds)
2025-09-16 15:31:23,102 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:31:26,075 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1021.35461 ± 276.268
2025-09-16 15:31:26,075 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1135.2883, 1122.3228, 481.72754, 1105.2639, 925.41675, 1399.0216, 929.9157, 1383.9583, 640.0905, 1090.5397]
2025-09-16 15:31:26,075 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 225.0, 98.0, 233.0, 189.0, 273.0, 193.0, 266.0, 132.0, 218.0]
2025-09-16 15:31:26,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 32 minutes)
2025-09-16 15:33:27,383 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:33:29,878 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 872.60974 ± 192.049
2025-09-16 15:33:29,879 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [960.6833, 923.97455, 720.9305, 946.0226, 823.1052, 1021.1983, 736.58093, 790.56885, 1274.7482, 528.28485]
2025-09-16 15:33:29,879 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 176.0, 142.0, 183.0, 158.0, 199.0, 155.0, 170.0, 269.0, 109.0]
2025-09-16 15:33:29,888 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 51 seconds)
2025-09-16 15:35:31,408 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:35:35,168 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1265.25745 ± 207.047
2025-09-16 15:35:35,168 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1347.7069, 1549.2693, 1092.099, 1109.365, 1182.495, 1248.2976, 1132.7382, 1185.9021, 1732.768, 1071.9331]
2025-09-16 15:35:35,168 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [258.0, 286.0, 201.0, 207.0, 231.0, 239.0, 202.0, 239.0, 322.0, 192.0]
2025-09-16 15:35:35,168 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1265.26) for latency 18
2025-09-16 15:35:35,178 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 27 minutes, 54 seconds)
2025-09-16 15:37:38,533 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:37:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1284.86865 ± 345.892
2025-09-16 15:37:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1222.5352, 842.37067, 1648.403, 1802.5984, 1573.7625, 1058.0172, 796.8182, 1498.8353, 916.21625, 1489.1296]
2025-09-16 15:37:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [246.0, 157.0, 323.0, 339.0, 289.0, 205.0, 169.0, 286.0, 186.0, 302.0]
2025-09-16 15:37:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1284.87) for latency 18
2025-09-16 15:37:42,162 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 25 minutes, 56 seconds)
2025-09-16 15:39:45,782 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:39:48,971 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1146.24194 ± 335.405
2025-09-16 15:39:48,971 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [663.0805, 1483.1166, 868.91797, 1378.7394, 1648.8466, 777.0207, 1548.1259, 1002.6287, 878.8001, 1213.1433]
2025-09-16 15:39:48,971 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 293.0, 164.0, 255.0, 321.0, 153.0, 284.0, 187.0, 175.0, 242.0]
2025-09-16 15:39:48,979 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 23 minutes, 48 seconds)
2025-09-16 15:41:48,895 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:41:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1416.37231 ± 743.561
2025-09-16 15:41:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1228.7543, 568.77856, 888.7954, 1663.0833, 1453.018, 1030.6265, 1130.6921, 3264.7031, 842.0681, 2093.2034]
2025-09-16 15:41:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [242.0, 118.0, 185.0, 334.0, 295.0, 211.0, 212.0, 651.0, 181.0, 398.0]
2025-09-16 15:41:53,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1416.37) for latency 18
2025-09-16 15:41:53,037 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 21 minutes, 30 seconds)
2025-09-16 15:43:54,113 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:43:58,331 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1417.74292 ± 441.959
2025-09-16 15:43:58,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1909.3898, 826.07227, 2058.9456, 737.52734, 1287.1099, 1391.9502, 963.15924, 1645.9491, 1484.9596, 1872.3661]
2025-09-16 15:43:58,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [393.0, 173.0, 414.0, 154.0, 265.0, 277.0, 203.0, 342.0, 301.0, 360.0]
2025-09-16 15:43:58,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1417.74) for latency 18
2025-09-16 15:43:58,342 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 19 minutes, 36 seconds)
2025-09-16 15:46:02,411 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:46:06,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1263.51562 ± 508.235
2025-09-16 15:46:06,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [991.52545, 860.9928, 843.3913, 964.1641, 1165.3423, 984.1632, 2637.0508, 1538.4777, 1433.3352, 1216.714]
2025-09-16 15:46:06,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [206.0, 177.0, 166.0, 182.0, 220.0, 201.0, 526.0, 301.0, 290.0, 237.0]
2025-09-16 15:46:06,124 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 17 minutes, 49 seconds)
2025-09-16 15:48:08,124 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:48:11,919 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1363.53442 ± 366.603
2025-09-16 15:48:11,920 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1259.5632, 1677.3491, 1063.8727, 1984.2075, 735.0593, 1585.569, 1276.2423, 1022.5552, 1785.0228, 1245.9031]
2025-09-16 15:48:11,920 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [239.0, 329.0, 221.0, 377.0, 138.0, 302.0, 260.0, 189.0, 328.0, 225.0]
2025-09-16 15:48:11,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 15 minutes, 34 seconds)
2025-09-16 15:50:14,383 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:50:17,451 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1001.86682 ± 183.227
2025-09-16 15:50:17,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1108.7979, 1276.901, 923.6401, 785.9406, 836.9866, 1108.906, 1156.9554, 747.70233, 854.46045, 1218.3779]
2025-09-16 15:50:17,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 243.0, 196.0, 152.0, 156.0, 222.0, 224.0, 145.0, 178.0, 222.0]
2025-09-16 15:50:17,461 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 13 minutes, 19 seconds)
2025-09-16 15:52:18,172 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:52:21,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1209.24597 ± 284.126
2025-09-16 15:52:21,577 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1313.653, 1778.7603, 1546.8021, 930.7248, 1309.4733, 1042.5193, 1176.039, 1255.9811, 805.26154, 933.2452]
2025-09-16 15:52:21,577 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [246.0, 345.0, 315.0, 178.0, 261.0, 195.0, 229.0, 241.0, 160.0, 178.0]
2025-09-16 15:52:21,586 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 11 minutes, 14 seconds)
2025-09-16 15:54:24,138 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:54:28,162 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1514.35840 ± 196.921
2025-09-16 15:54:28,162 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1803.3776, 1109.3126, 1381.0728, 1531.138, 1720.5948, 1667.3193, 1410.1023, 1491.9335, 1364.6422, 1664.091]
2025-09-16 15:54:28,162 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [322.0, 199.0, 254.0, 289.0, 305.0, 313.0, 274.0, 272.0, 252.0, 312.0]
2025-09-16 15:54:28,162 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1514.36) for latency 18
2025-09-16 15:54:28,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 9 minutes, 16 seconds)
2025-09-16 15:56:28,999 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:56:34,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1733.24866 ± 412.510
2025-09-16 15:56:34,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1988.9861, 1496.3004, 2164.8862, 1798.8635, 1744.8785, 1231.0619, 1918.5687, 2277.943, 1865.7587, 845.2393]
2025-09-16 15:56:34,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [381.0, 279.0, 402.0, 319.0, 347.0, 224.0, 363.0, 428.0, 337.0, 156.0]
2025-09-16 15:56:34,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1733.25) for latency 18
2025-09-16 15:56:34,177 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 6 minutes, 59 seconds)
2025-09-16 15:58:38,208 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:58:43,024 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1533.26575 ± 404.681
2025-09-16 15:58:43,024 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1598.7593, 1277.4187, 1319.5885, 1115.2302, 2083.5598, 1212.7856, 953.85315, 1624.5428, 1988.8868, 2158.0322]
2025-09-16 15:58:43,024 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [309.0, 241.0, 260.0, 237.0, 406.0, 235.0, 188.0, 333.0, 395.0, 400.0]
2025-09-16 15:58:43,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 5 minutes, 12 seconds)
2025-09-16 16:00:42,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:00:47,686 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1657.28784 ± 879.860
2025-09-16 16:00:47,686 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1542.9343, 979.0816, 928.54553, 3400.7202, 1709.223, 1261.3901, 969.5296, 1538.667, 976.71814, 3266.0693]
2025-09-16 16:00:47,686 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [307.0, 198.0, 201.0, 670.0, 340.0, 237.0, 191.0, 307.0, 197.0, 629.0]
2025-09-16 16:00:47,693 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 3 minutes, 1 second)
2025-09-16 16:02:52,542 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:02:57,162 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1498.77161 ± 469.419
2025-09-16 16:02:57,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1208.1556, 1579.7369, 2396.6309, 1482.7126, 1262.6772, 833.8002, 1339.0214, 1133.7024, 1448.2417, 2303.0361]
2025-09-16 16:02:57,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 288.0, 482.0, 304.0, 237.0, 159.0, 249.0, 224.0, 280.0, 434.0]
2025-09-16 16:02:57,172 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 1 minute, 26 seconds)
2025-09-16 16:04:57,742 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:05:03,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1726.82654 ± 875.587
2025-09-16 16:05:03,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1097.3293, 2269.2769, 808.25305, 2346.6938, 1719.5476, 1565.2871, 1275.9731, 923.94324, 1358.3923, 3903.5693]
2025-09-16 16:05:03,507 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 432.0, 169.0, 493.0, 347.0, 314.0, 261.0, 203.0, 256.0, 769.0]
2025-09-16 16:05:03,513 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 59 minutes, 17 seconds)
2025-09-16 16:07:08,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:07:13,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1454.42822 ± 333.829
2025-09-16 16:07:13,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2075.5623, 802.075, 1692.6321, 1225.0013, 1656.8113, 1299.7177, 1668.0417, 1437.7202, 1172.7825, 1513.9382]
2025-09-16 16:07:13,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [420.0, 172.0, 337.0, 240.0, 339.0, 266.0, 318.0, 297.0, 219.0, 309.0]
2025-09-16 16:07:13,243 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 30 seconds)
2025-09-16 16:09:13,180 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:09:19,022 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2062.20776 ± 852.542
2025-09-16 16:09:19,023 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1678.1775, 3723.0688, 3121.9363, 3106.508, 1717.7789, 1385.1268, 1415.3904, 1154.9673, 1714.2302, 1604.8944]
2025-09-16 16:09:19,023 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [313.0, 712.0, 580.0, 582.0, 338.0, 278.0, 266.0, 226.0, 314.0, 307.0]
2025-09-16 16:09:19,023 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2062.21) for latency 18
2025-09-16 16:09:19,033 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 55 minutes, 7 seconds)
2025-09-16 16:11:23,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:11:29,725 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2062.19580 ± 705.587
2025-09-16 16:11:29,725 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1895.7162, 1324.5675, 2531.4475, 3205.637, 1611.8706, 1867.17, 3269.3555, 2253.5867, 1526.1971, 1136.4077]
2025-09-16 16:11:29,725 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [352.0, 243.0, 485.0, 581.0, 294.0, 334.0, 641.0, 432.0, 279.0, 235.0]
2025-09-16 16:11:29,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 53 minutes, 30 seconds)
2025-09-16 16:13:32,568 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:13:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2318.70068 ± 1414.235
2025-09-16 16:13:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1599.7487, 4557.0947, 5360.046, 1299.596, 1553.6987, 1411.2607, 3044.315, 1512.6652, 1432.604, 1415.9801]
2025-09-16 16:13:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [300.0, 872.0, 998.0, 240.0, 306.0, 256.0, 565.0, 287.0, 277.0, 263.0]
2025-09-16 16:13:38,946 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2318.70) for latency 18
2025-09-16 16:13:38,954 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 51 minutes, 20 seconds)
2025-09-16 16:15:41,054 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:15:50,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3179.58765 ± 1400.296
2025-09-16 16:15:50,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3754.701, 4124.9395, 4908.6353, 1647.9083, 4991.459, 2938.7568, 1720.9911, 4636.9307, 1374.9359, 1696.6208]
2025-09-16 16:15:50,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [710.0, 767.0, 939.0, 303.0, 949.0, 600.0, 314.0, 880.0, 272.0, 324.0]
2025-09-16 16:15:50,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3179.59) for latency 18
2025-09-16 16:15:50,411 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 35 seconds)
2025-09-16 16:17:53,025 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:17:58,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1797.10779 ± 562.997
2025-09-16 16:17:58,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2167.3914, 1610.8467, 1466.8083, 2404.531, 2460.2173, 2043.2888, 829.02844, 2013.5765, 838.09564, 2137.2954]
2025-09-16 16:17:58,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [439.0, 302.0, 268.0, 471.0, 494.0, 406.0, 155.0, 415.0, 158.0, 431.0]
2025-09-16 16:17:58,240 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 17 seconds)
2025-09-16 16:20:00,222 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:20:09,144 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2836.97925 ± 1235.645
2025-09-16 16:20:09,144 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2891.7112, 2240.187, 2389.0872, 1803.4578, 5146.6045, 1850.6764, 3438.4438, 2752.049, 1065.0211, 4792.555]
2025-09-16 16:20:09,144 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [562.0, 425.0, 468.0, 321.0, 1000.0, 342.0, 648.0, 514.0, 195.0, 926.0]
2025-09-16 16:20:09,153 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 45 minutes, 30 seconds)
2025-09-16 16:22:13,891 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:22:19,579 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1990.19434 ± 1121.045
2025-09-16 16:22:19,579 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [681.63104, 1653.7648, 1284.05, 1966.6649, 1467.4565, 2745.036, 1799.9014, 3374.7856, 593.9317, 4334.7217]
2025-09-16 16:22:19,579 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 335.0, 247.0, 377.0, 301.0, 520.0, 368.0, 632.0, 129.0, 831.0]
2025-09-16 16:22:19,592 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 19 seconds)
2025-09-16 16:24:20,597 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:24:25,868 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1914.05042 ± 1011.541
2025-09-16 16:24:25,868 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1643.7615, 1814.8103, 1095.6814, 4269.4575, 3315.1074, 1168.6469, 1631.4886, 1991.5784, 1199.7789, 1010.1932]
2025-09-16 16:24:25,868 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [291.0, 335.0, 199.0, 808.0, 635.0, 238.0, 307.0, 395.0, 230.0, 188.0]
2025-09-16 16:24:25,878 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 40 minutes, 58 seconds)
2025-09-16 16:26:28,249 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:26:38,241 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3385.58936 ± 1557.890
2025-09-16 16:26:38,241 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [858.23413, 4133.315, 4823.676, 5187.185, 2050.8875, 4663.933, 5256.8794, 1637.273, 1962.534, 3281.9768]
2025-09-16 16:26:38,241 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 773.0, 905.0, 1000.0, 399.0, 867.0, 1000.0, 332.0, 393.0, 651.0]
2025-09-16 16:26:38,241 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3385.59) for latency 18
2025-09-16 16:26:38,258 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 52 seconds)
2025-09-16 16:28:50,114 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:29:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3725.10596 ± 1327.453
2025-09-16 16:29:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2708.336, 5199.0283, 2273.0383, 2155.532, 5245.797, 4058.1682, 5177.327, 2890.7458, 5289.7905, 2253.2961]
2025-09-16 16:29:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [526.0, 1000.0, 416.0, 397.0, 1000.0, 777.0, 1000.0, 542.0, 1000.0, 432.0]
2025-09-16 16:29:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3725.11) for latency 18
2025-09-16 16:29:01,049 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 33 seconds)
2025-09-16 16:30:55,238 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:31:03,672 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2562.52759 ± 1602.491
2025-09-16 16:31:03,673 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2163.5151, 1499.6251, 650.43774, 5248.715, 4267.0254, 1814.769, 4897.747, 942.9584, 2934.4512, 1206.0306]
2025-09-16 16:31:03,673 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [437.0, 299.0, 142.0, 1000.0, 857.0, 343.0, 958.0, 175.0, 590.0, 236.0]
2025-09-16 16:31:03,682 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 54 seconds)
2025-09-16 16:33:14,059 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:33:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3237.24756 ± 1377.740
2025-09-16 16:33:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1028.4058, 3900.48, 3809.6777, 3826.8296, 5053.1196, 3521.9016, 1514.2186, 1511.401, 5142.9106, 3063.5342]
2025-09-16 16:33:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 774.0, 754.0, 768.0, 1000.0, 665.0, 280.0, 282.0, 1000.0, 618.0]
2025-09-16 16:33:24,566 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 14 seconds)
2025-09-16 16:35:30,153 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:35:40,803 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3663.82153 ± 1372.817
2025-09-16 16:35:40,803 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5127.8506, 4145.6416, 2162.536, 3806.2932, 1490.4521, 5171.402, 3389.5652, 4459.7876, 5230.2715, 1654.4161]
2025-09-16 16:35:40,803 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 778.0, 420.0, 722.0, 284.0, 1000.0, 658.0, 842.0, 1000.0, 328.0]
2025-09-16 16:35:40,813 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 29 seconds)
2025-09-16 16:37:35,505 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:37:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3317.77612 ± 1461.795
2025-09-16 16:37:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2912.3667, 2958.5776, 4940.4614, 4755.824, 3309.5085, 4492.835, 5114.2856, 2809.1797, 1275.4036, 609.3201]
2025-09-16 16:37:46,655 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [584.0, 610.0, 1000.0, 985.0, 659.0, 884.0, 1000.0, 563.0, 233.0, 115.0]
2025-09-16 16:37:46,662 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 57 seconds)
2025-09-16 16:39:49,192 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:40:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4559.89600 ± 1326.850
2025-09-16 16:40:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5258.6895, 5145.2227, 4743.47, 5166.129, 5235.933, 2446.945, 5246.89, 1488.0719, 5344.122, 5523.4883]
2025-09-16 16:40:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 918.0, 1000.0, 1000.0, 492.0, 1000.0, 290.0, 1000.0, 1000.0]
2025-09-16 16:40:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4559.90) for latency 18
2025-09-16 16:40:02,539 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 27 seconds)
2025-09-16 16:42:02,730 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:42:16,417 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4300.72461 ± 1306.732
2025-09-16 16:42:16,417 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5318.2744, 2752.8047, 1827.3284, 5157.6206, 2505.8235, 5293.4277, 4483.328, 5210.699, 5217.9614, 5239.98]
2025-09-16 16:42:16,417 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 505.0, 337.0, 1000.0, 480.0, 1000.0, 833.0, 998.0, 1000.0, 986.0]
2025-09-16 16:42:16,429 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 40 seconds)
2025-09-16 16:44:30,592 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:44:41,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3777.61133 ± 1590.596
2025-09-16 16:44:41,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5217.3486, 2231.8242, 1393.9983, 5319.806, 5286.362, 3921.4814, 1111.35, 4681.3667, 5305.272, 3307.2998]
2025-09-16 16:44:41,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 433.0, 293.0, 1000.0, 1000.0, 705.0, 239.0, 880.0, 1000.0, 614.0]
2025-09-16 16:44:41,313 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 33 seconds)
2025-09-16 16:46:40,608 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:46:47,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2563.85669 ± 1125.794
2025-09-16 16:46:47,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2571.0728, 3559.4414, 1401.028, 2955.8257, 3091.2056, 1853.4221, 5159.434, 2105.7827, 1639.9573, 1301.398]
2025-09-16 16:46:47,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [478.0, 690.0, 268.0, 571.0, 579.0, 361.0, 1000.0, 386.0, 321.0, 265.0]
2025-09-16 16:46:47,942 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes)
2025-09-16 16:48:46,945 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:49:01,404 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 5207.47998 ± 704.157
2025-09-16 16:49:01,404 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5474.9814, 5528.882, 5394.952, 5482.252, 5534.005, 5461.9, 5233.9375, 3108.9485, 5437.3228, 5417.62]
2025-09-16 16:49:01,404 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 545.0, 1000.0, 1000.0]
2025-09-16 16:49:01,404 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (5207.48) for latency 18
2025-09-16 16:49:01,413 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 59 seconds)
2025-09-16 16:51:14,064 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:51:25,379 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4006.86597 ± 1586.724
2025-09-16 16:51:25,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5358.324, 1728.6586, 4915.199, 5399.112, 3086.9841, 2223.5881, 1499.5037, 5473.2266, 5441.282, 4942.7817]
2025-09-16 16:51:25,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 321.0, 945.0, 1000.0, 586.0, 396.0, 288.0, 1000.0, 1000.0, 919.0]
2025-09-16 16:51:25,391 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 55 seconds)
2025-09-16 16:53:19,974 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:53:30,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3513.86450 ± 1318.291
2025-09-16 16:53:30,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5345.8955, 4469.597, 2852.5393, 5344.5957, 4272.7524, 1368.8926, 2518.744, 2813.9429, 4143.441, 2008.2472]
2025-09-16 16:53:30,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 858.0, 543.0, 1000.0, 823.0, 264.0, 474.0, 526.0, 811.0, 394.0]
2025-09-16 16:53:30,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 28 seconds)
2025-09-16 16:55:32,535 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:55:45,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4310.21240 ± 1299.361
2025-09-16 16:55:45,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3541.281, 5201.999, 5059.562, 1560.9532, 2251.678, 5126.5933, 5228.811, 5138.536, 5022.252, 4970.459]
2025-09-16 16:55:45,818 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [667.0, 1000.0, 1000.0, 291.0, 432.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 16:55:45,826 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 4 seconds)
2025-09-16 16:57:46,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:57:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3450.96338 ± 1477.261
2025-09-16 16:57:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1301.0204, 5113.271, 2033.5736, 3190.1804, 1631.1641, 3607.1968, 2410.0183, 4925.6377, 5126.8525, 5170.7188]
2025-09-16 16:57:57,865 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [249.0, 1000.0, 424.0, 654.0, 335.0, 720.0, 489.0, 1000.0, 1000.0, 1000.0]
2025-09-16 16:57:57,873 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 55 seconds)
2025-09-16 17:00:10,595 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:00:24,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4740.25684 ± 1197.144
2025-09-16 17:00:24,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1215.0892, 5249.461, 5230.987, 5206.902, 5222.9453, 5209.387, 5218.9927, 5210.755, 4451.0415, 5187.0073]
2025-09-16 17:00:24,769 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 837.0, 1000.0]
2025-09-16 17:00:24,781 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 50 seconds)
2025-09-16 17:02:17,036 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:02:31,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4864.92725 ± 1362.767
2025-09-16 17:02:31,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5339.9663, 5373.6475, 5386.193, 5071.5405, 5329.2217, 785.07794, 5345.117, 5335.628, 5393.6963, 5289.187]
2025-09-16 17:02:31,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 971.0, 1000.0, 140.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 17:02:31,044 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 26 seconds)
2025-09-16 17:04:34,116 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:04:44,200 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3500.79492 ± 1099.664
2025-09-16 17:04:44,201 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3330.5415, 5234.3335, 2866.989, 4269.594, 2191.3413, 2772.178, 3285.8945, 1931.6405, 5289.482, 3835.9558]
2025-09-16 17:04:44,201 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [629.0, 1000.0, 526.0, 798.0, 416.0, 530.0, 619.0, 368.0, 1000.0, 736.0]
2025-09-16 17:04:44,210 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 14 seconds)
2025-09-16 17:06:54,543 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:07:07,509 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4156.91748 ± 1328.860
2025-09-16 17:07:07,509 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4559.1133, 4818.315, 5471.136, 5255.033, 5316.7334, 5376.5645, 1603.054, 3784.1667, 3107.08, 2277.979]
2025-09-16 17:07:07,509 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [844.0, 898.0, 1000.0, 1000.0, 1000.0, 1000.0, 285.0, 716.0, 570.0, 443.0]
2025-09-16 17:07:07,522 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1251 [DEBUG]: Training session finished
