2025-09-16 14:53:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.000-delay_24
2025-09-16 14:53:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.000-delay_24
2025-09-16 14:53:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x14e6ca23ca90>}
2025-09-16 14:53:15,083 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:53:15,088 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:53:15,107 baseline-bpql-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:53:15,107 baseline-bpql-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:53:16,927 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:53:16,928 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:55:10,321 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 14:55:11,692 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 423.46387 ± 41.839
2025-09-16 14:55:11,692 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [444.23694, 477.33774, 438.97992, 395.26276, 364.74405, 415.09448, 453.28903, 352.24173, 410.8628, 482.5891]
2025-09-16 14:55:11,692 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 91.0, 83.0, 74.0, 68.0, 78.0, 86.0, 66.0, 77.0, 92.0]
2025-09-16 14:55:11,692 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (423.46) for latency 24
2025-09-16 14:55:11,710 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 9 minutes, 23 seconds)
2025-09-16 14:57:12,446 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 14:57:14,158 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 538.13049 ± 127.627
2025-09-16 14:57:14,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [578.3593, 529.7541, 596.53143, 481.3465, 867.59155, 525.3517, 348.14853, 457.73956, 490.58417, 505.89877]
2025-09-16 14:57:14,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 104.0, 120.0, 90.0, 170.0, 107.0, 73.0, 88.0, 96.0, 94.0]
2025-09-16 14:57:14,159 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (538.13) for latency 24
2025-09-16 14:57:14,163 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 13 minutes, 44 seconds)
2025-09-16 14:59:15,904 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 14:59:17,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 427.48325 ± 51.328
2025-09-16 14:59:17,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [385.99063, 383.44583, 391.77335, 398.61465, 498.36557, 416.13367, 489.16504, 409.31216, 522.36176, 379.67004]
2025-09-16 14:59:17,380 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 83.0, 86.0, 84.0, 106.0, 90.0, 106.0, 90.0, 117.0, 82.0]
2025-09-16 14:59:17,385 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 14 minutes, 14 seconds)
2025-09-16 15:01:19,432 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:01:20,730 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 432.90869 ± 33.417
2025-09-16 15:01:20,730 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [461.27924, 489.26447, 441.54413, 431.32144, 405.99316, 407.38797, 443.75372, 371.54962, 468.3976, 408.5956]
2025-09-16 15:01:20,730 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 91.0, 87.0, 81.0, 77.0, 80.0, 83.0, 71.0, 87.0, 80.0]
2025-09-16 15:01:20,733 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 13 minutes, 31 seconds)
2025-09-16 15:03:23,340 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:03:25,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 602.61505 ± 73.505
2025-09-16 15:03:25,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [620.3752, 570.31555, 668.6659, 743.36206, 552.305, 501.31766, 675.55536, 583.73505, 607.915, 502.60352]
2025-09-16 15:03:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 112.0, 124.0, 140.0, 110.0, 102.0, 126.0, 113.0, 119.0, 107.0]
2025-09-16 15:03:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (602.62) for latency 24
2025-09-16 15:03:25,241 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 12 minutes, 37 seconds)
2025-09-16 15:05:27,067 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:05:28,556 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 489.36517 ± 60.953
2025-09-16 15:05:28,556 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [577.50836, 537.5862, 449.0132, 535.7624, 411.01028, 529.8593, 372.1766, 463.888, 494.4675, 522.3803]
2025-09-16 15:05:28,556 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 102.0, 83.0, 103.0, 77.0, 100.0, 69.0, 86.0, 98.0, 98.0]
2025-09-16 15:05:28,560 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 13 minutes, 16 seconds)
2025-09-16 15:07:30,898 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:07:32,542 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 513.58966 ± 73.697
2025-09-16 15:07:32,542 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [441.36816, 410.65695, 476.95563, 587.29846, 497.70807, 477.78622, 651.2005, 601.1225, 534.4186, 457.38116]
2025-09-16 15:07:32,542 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 78.0, 91.0, 112.0, 106.0, 90.0, 129.0, 116.0, 113.0, 86.0]
2025-09-16 15:07:32,555 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 11 minutes, 42 seconds)
2025-09-16 15:09:34,750 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:09:36,478 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 548.72491 ± 79.661
2025-09-16 15:09:36,478 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [483.64728, 415.17017, 532.6816, 573.71423, 524.8215, 682.7611, 493.4003, 566.3171, 530.8177, 683.91864]
2025-09-16 15:09:36,478 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 79.0, 98.0, 107.0, 105.0, 143.0, 107.0, 120.0, 111.0, 131.0]
2025-09-16 15:09:36,483 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 9 minutes, 51 seconds)
2025-09-16 15:11:39,253 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:11:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 509.60529 ± 67.702
2025-09-16 15:11:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [528.5019, 427.04352, 590.3035, 438.1688, 460.32852, 534.4294, 650.043, 504.37323, 517.9287, 444.9321]
2025-09-16 15:11:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 80.0, 127.0, 92.0, 86.0, 114.0, 121.0, 109.0, 111.0, 82.0]
2025-09-16 15:11:40,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 8 minutes, 7 seconds)
2025-09-16 15:13:42,579 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:13:44,365 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 561.64618 ± 76.339
2025-09-16 15:13:44,365 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [536.5559, 610.1103, 578.892, 606.7089, 472.80322, 607.7247, 554.9411, 468.38446, 717.8105, 462.53085]
2025-09-16 15:13:44,365 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 125.0, 113.0, 128.0, 88.0, 117.0, 105.0, 86.0, 139.0, 95.0]
2025-09-16 15:13:44,369 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 5 minutes, 44 seconds)
2025-09-16 15:15:47,308 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:15:49,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 550.29681 ± 109.383
2025-09-16 15:15:49,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [402.09552, 480.4408, 615.9706, 496.3106, 577.9364, 735.1984, 409.97614, 467.36487, 643.13885, 674.5363]
2025-09-16 15:15:49,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 91.0, 116.0, 95.0, 114.0, 141.0, 77.0, 88.0, 119.0, 144.0]
2025-09-16 15:15:49,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 4 minutes, 4 seconds)
2025-09-16 15:17:51,963 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:17:53,764 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 551.65674 ± 107.464
2025-09-16 15:17:53,764 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [675.1866, 600.6396, 498.64456, 733.07526, 509.4876, 442.82208, 474.21634, 458.13492, 691.62964, 432.73056]
2025-09-16 15:17:53,764 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 125.0, 104.0, 154.0, 95.0, 90.0, 101.0, 97.0, 129.0, 88.0]
2025-09-16 15:17:53,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 2 minutes, 13 seconds)
2025-09-16 15:19:55,974 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:19:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 594.62585 ± 89.491
2025-09-16 15:19:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [633.1678, 507.1622, 500.38776, 542.31934, 648.3753, 570.7243, 450.92056, 650.64935, 722.68054, 719.8717]
2025-09-16 15:19:57,909 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 94.0, 101.0, 116.0, 123.0, 123.0, 95.0, 124.0, 136.0, 145.0]
2025-09-16 15:19:57,912 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 12 seconds)
2025-09-16 15:22:00,873 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:22:02,827 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 618.33838 ± 85.757
2025-09-16 15:22:02,827 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [592.8657, 620.46735, 575.6071, 676.8779, 537.1369, 514.751, 729.0102, 722.44055, 723.2488, 490.97852]
2025-09-16 15:22:02,827 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 132.0, 108.0, 129.0, 102.0, 109.0, 134.0, 143.0, 140.0, 103.0]
2025-09-16 15:22:02,827 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (618.34) for latency 24
2025-09-16 15:22:02,833 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 58 minutes, 16 seconds)
2025-09-16 15:24:06,024 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:24:07,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 569.22260 ± 44.193
2025-09-16 15:24:07,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [525.2314, 591.9971, 507.1083, 592.71857, 653.58484, 583.83545, 505.2082, 582.852, 552.61224, 597.0776]
2025-09-16 15:24:07,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 121.0, 104.0, 111.0, 121.0, 109.0, 107.0, 115.0, 107.0, 121.0]
2025-09-16 15:24:07,854 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 56 minutes, 39 seconds)
2025-09-16 15:26:09,308 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:26:11,368 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 637.09064 ± 103.802
2025-09-16 15:26:11,368 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [586.03656, 721.5331, 664.30054, 854.26025, 553.44727, 709.4239, 553.6744, 461.37048, 630.4837, 636.376]
2025-09-16 15:26:11,368 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 151.0, 128.0, 166.0, 105.0, 133.0, 118.0, 94.0, 124.0, 136.0]
2025-09-16 15:26:11,368 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (637.09) for latency 24
2025-09-16 15:26:11,375 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 54 minutes, 15 seconds)
2025-09-16 15:28:14,599 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:28:16,233 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 516.12823 ± 143.920
2025-09-16 15:28:16,233 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [439.59872, 454.00433, 485.94278, 717.42413, 489.13126, 837.9275, 564.0419, 429.26685, 384.1462, 359.79895]
2025-09-16 15:28:16,233 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 91.0, 93.0, 139.0, 96.0, 166.0, 115.0, 85.0, 80.0, 72.0]
2025-09-16 15:28:16,255 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 52 minutes, 13 seconds)
2025-09-16 15:30:18,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:30:20,254 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 545.02209 ± 95.764
2025-09-16 15:30:20,254 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [705.7009, 369.10773, 593.00665, 598.41876, 583.64905, 471.40527, 573.1871, 623.4475, 424.35382, 507.944]
2025-09-16 15:30:20,254 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 71.0, 111.0, 127.0, 121.0, 102.0, 122.0, 116.0, 87.0, 96.0]
2025-09-16 15:30:20,268 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 50 minutes, 6 seconds)
2025-09-16 15:32:22,640 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:32:24,837 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 694.96686 ± 103.433
2025-09-16 15:32:24,837 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [830.57556, 581.15985, 781.8914, 743.0021, 615.704, 811.49005, 805.5848, 566.93774, 577.5439, 635.7786]
2025-09-16 15:32:24,837 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 108.0, 153.0, 142.0, 115.0, 157.0, 168.0, 105.0, 110.0, 120.0]
2025-09-16 15:32:24,837 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (694.97) for latency 24
2025-09-16 15:32:24,843 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 47 minutes, 56 seconds)
2025-09-16 15:34:28,122 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:34:30,138 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 643.94568 ± 123.843
2025-09-16 15:34:30,138 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [931.1317, 500.13458, 672.6137, 689.85547, 638.3225, 603.2737, 599.34656, 484.8566, 754.26416, 565.65735]
2025-09-16 15:34:30,138 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 98.0, 131.0, 143.0, 118.0, 114.0, 113.0, 95.0, 144.0, 104.0]
2025-09-16 15:34:30,142 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 45 minutes, 56 seconds)
2025-09-16 15:36:30,997 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:36:33,098 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 638.49890 ± 95.504
2025-09-16 15:36:33,098 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [762.97614, 530.87036, 621.8546, 744.3962, 550.2354, 510.05792, 680.8218, 532.28076, 730.94543, 720.5508]
2025-09-16 15:36:33,099 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 114.0, 127.0, 140.0, 110.0, 107.0, 144.0, 115.0, 151.0, 137.0]
2025-09-16 15:36:33,109 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 43 minutes, 43 seconds)
2025-09-16 15:38:35,254 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:38:37,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 606.69037 ± 103.550
2025-09-16 15:38:37,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [529.0378, 602.1138, 719.0762, 495.1213, 627.3839, 440.71954, 694.72894, 712.8209, 742.4914, 503.40994]
2025-09-16 15:38:37,170 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 121.0, 151.0, 94.0, 115.0, 91.0, 132.0, 149.0, 137.0, 94.0]
2025-09-16 15:38:37,173 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 41 minutes, 26 seconds)
2025-09-16 15:40:40,904 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:40:43,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 635.34387 ± 150.872
2025-09-16 15:40:43,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [500.68008, 481.6544, 595.08344, 512.1805, 825.84296, 532.3632, 855.7238, 580.86365, 894.17145, 574.8756]
2025-09-16 15:40:43,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 92.0, 119.0, 108.0, 156.0, 102.0, 177.0, 118.0, 185.0, 116.0]
2025-09-16 15:40:43,015 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 39 minutes, 50 seconds)
2025-09-16 15:42:44,698 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:42:47,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 743.48181 ± 227.703
2025-09-16 15:42:47,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [519.9614, 773.6727, 801.2963, 697.9808, 731.1603, 580.2969, 751.42444, 1370.3656, 561.62665, 647.03326]
2025-09-16 15:42:47,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 150.0, 150.0, 149.0, 136.0, 116.0, 140.0, 281.0, 105.0, 119.0]
2025-09-16 15:42:47,035 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (743.48) for latency 24
2025-09-16 15:42:47,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 37 minutes, 37 seconds)
2025-09-16 15:44:49,076 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:44:51,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 662.91217 ± 165.099
2025-09-16 15:44:51,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [542.6419, 1074.326, 630.2361, 693.7839, 540.7911, 832.4924, 625.01135, 492.56332, 645.76575, 551.50977]
2025-09-16 15:44:51,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 210.0, 116.0, 127.0, 98.0, 162.0, 119.0, 91.0, 119.0, 102.0]
2025-09-16 15:44:51,089 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 35 minutes, 14 seconds)
2025-09-16 15:46:52,358 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:46:54,225 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 588.07727 ± 119.237
2025-09-16 15:46:54,225 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [483.47278, 419.4978, 576.23193, 682.4344, 867.677, 551.51776, 605.697, 517.54877, 521.36676, 655.3289]
2025-09-16 15:46:54,225 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 80.0, 122.0, 132.0, 166.0, 103.0, 116.0, 100.0, 100.0, 138.0]
2025-09-16 15:46:54,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 33 minutes, 12 seconds)
2025-09-16 15:48:57,015 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:48:59,222 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 701.14954 ± 96.247
2025-09-16 15:48:59,222 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [574.89624, 795.9369, 573.9311, 827.5064, 643.54346, 723.43555, 854.02454, 638.9517, 735.684, 643.586]
2025-09-16 15:48:59,222 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 148.0, 123.0, 161.0, 127.0, 135.0, 161.0, 131.0, 140.0, 120.0]
2025-09-16 15:48:59,233 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 31 minutes, 22 seconds)
2025-09-16 15:51:03,225 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:51:05,794 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 796.71332 ± 165.601
2025-09-16 15:51:05,794 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [801.6086, 935.40344, 652.47974, 659.90594, 895.4923, 644.16895, 725.70636, 876.8683, 609.9117, 1165.5879]
2025-09-16 15:51:05,794 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 179.0, 121.0, 125.0, 166.0, 133.0, 146.0, 169.0, 115.0, 242.0]
2025-09-16 15:51:05,794 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (796.71) for latency 24
2025-09-16 15:51:05,797 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 29 minutes, 28 seconds)
2025-09-16 15:53:06,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:53:09,537 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 836.32312 ± 236.498
2025-09-16 15:53:09,537 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [945.18677, 1121.2969, 1072.6393, 664.7483, 1025.2241, 671.26984, 418.89835, 803.84766, 553.4008, 1086.72]
2025-09-16 15:53:09,537 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 228.0, 201.0, 122.0, 187.0, 133.0, 80.0, 155.0, 104.0, 219.0]
2025-09-16 15:53:09,537 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (836.32) for latency 24
2025-09-16 15:53:09,540 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 27 minutes, 19 seconds)
2025-09-16 15:55:11,351 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:55:13,603 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 686.01917 ± 107.084
2025-09-16 15:55:13,603 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [716.7388, 676.29663, 577.49023, 515.1983, 723.09924, 660.76416, 702.875, 593.43835, 776.852, 917.43945]
2025-09-16 15:55:13,603 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 144.0, 122.0, 110.0, 140.0, 139.0, 147.0, 110.0, 141.0, 179.0]
2025-09-16 15:55:13,607 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 25 minutes, 15 seconds)
2025-09-16 15:57:17,011 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:57:19,026 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 645.62750 ± 206.703
2025-09-16 15:57:19,026 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [433.96713, 449.64392, 640.4426, 644.5356, 449.19962, 869.0263, 999.0753, 536.3329, 484.14566, 949.9058]
2025-09-16 15:57:19,026 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 86.0, 117.0, 125.0, 88.0, 178.0, 200.0, 101.0, 91.0, 175.0]
2025-09-16 15:57:19,032 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 23 minutes, 42 seconds)
2025-09-16 15:59:20,659 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:59:23,012 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 752.15765 ± 149.324
2025-09-16 15:59:23,012 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [865.6097, 964.86444, 553.548, 935.528, 540.4783, 878.38556, 691.3776, 772.16895, 589.4591, 730.1573]
2025-09-16 15:59:23,012 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 181.0, 102.0, 186.0, 116.0, 158.0, 128.0, 167.0, 117.0, 133.0]
2025-09-16 15:59:23,017 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 21 minutes, 23 seconds)
2025-09-16 16:01:24,503 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:01:26,977 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 790.15491 ± 206.443
2025-09-16 16:01:26,977 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [977.8209, 913.9846, 544.2008, 897.3295, 767.91034, 696.7178, 1167.5175, 469.9145, 586.5205, 879.6327]
2025-09-16 16:01:26,977 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 183.0, 102.0, 166.0, 140.0, 138.0, 226.0, 103.0, 110.0, 168.0]
2025-09-16 16:01:26,983 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 18 minutes, 43 seconds)
2025-09-16 16:03:27,840 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:03:30,094 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 727.30206 ± 70.228
2025-09-16 16:03:30,095 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [690.2656, 881.1751, 713.3657, 641.3203, 646.6388, 783.50214, 698.1784, 760.1628, 779.62555, 678.7865]
2025-09-16 16:03:30,095 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 163.0, 132.0, 121.0, 122.0, 165.0, 131.0, 148.0, 145.0, 124.0]
2025-09-16 16:03:30,099 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 16 minutes, 31 seconds)
2025-09-16 16:05:32,307 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:05:34,421 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 679.31165 ± 134.815
2025-09-16 16:05:34,421 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [710.73145, 777.1674, 514.72473, 964.482, 529.2423, 665.79395, 539.74, 716.3832, 589.7362, 785.1147]
2025-09-16 16:05:34,421 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 164.0, 98.0, 187.0, 107.0, 123.0, 100.0, 130.0, 107.0, 144.0]
2025-09-16 16:05:34,428 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 14 minutes, 30 seconds)
2025-09-16 16:07:36,036 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:07:38,629 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 848.52051 ± 208.037
2025-09-16 16:07:38,630 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1225.8601, 549.7745, 806.23505, 713.1433, 743.8666, 786.848, 639.9408, 1006.07086, 851.98065, 1161.4855]
2025-09-16 16:07:38,630 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 114.0, 143.0, 136.0, 144.0, 149.0, 120.0, 183.0, 162.0, 216.0]
2025-09-16 16:07:38,630 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (848.52) for latency 24
2025-09-16 16:07:38,659 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 12 minutes, 11 seconds)
2025-09-16 16:09:40,732 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:09:43,230 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 782.30792 ± 236.507
2025-09-16 16:09:43,231 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [580.7735, 529.9305, 788.8756, 1289.5062, 648.49695, 1036.4813, 555.25256, 850.7098, 600.0949, 942.95807]
2025-09-16 16:09:43,231 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 96.0, 146.0, 262.0, 135.0, 196.0, 115.0, 160.0, 121.0, 194.0]
2025-09-16 16:09:43,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 10 minutes, 14 seconds)
2025-09-16 16:11:44,774 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:11:47,277 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 781.50183 ± 200.200
2025-09-16 16:11:47,278 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1041.8994, 466.53107, 728.2075, 858.026, 828.5114, 628.5846, 969.31256, 633.55804, 1092.4852, 567.90173]
2025-09-16 16:11:47,278 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [198.0, 101.0, 147.0, 168.0, 156.0, 136.0, 181.0, 139.0, 206.0, 104.0]
2025-09-16 16:11:47,297 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 8 minutes, 11 seconds)
2025-09-16 16:13:49,256 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:13:52,003 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 876.00879 ± 184.726
2025-09-16 16:13:52,004 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [847.18066, 758.6368, 838.1617, 799.0298, 927.3789, 1089.2626, 685.8941, 1318.0651, 807.1762, 689.3022]
2025-09-16 16:13:52,004 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 162.0, 156.0, 165.0, 190.0, 206.0, 125.0, 255.0, 146.0, 142.0]
2025-09-16 16:13:52,004 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (876.01) for latency 24
2025-09-16 16:13:52,012 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 6 minutes, 27 seconds)
2025-09-16 16:15:53,150 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:15:55,923 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 828.02844 ± 167.026
2025-09-16 16:15:55,924 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [811.2097, 941.9972, 1202.9653, 632.7073, 821.46625, 748.6161, 1005.8468, 728.6964, 731.38007, 655.39923]
2025-09-16 16:15:55,924 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 179.0, 243.0, 134.0, 151.0, 158.0, 192.0, 158.0, 159.0, 140.0]
2025-09-16 16:15:55,931 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 4 minutes, 18 seconds)
2025-09-16 16:17:58,983 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:18:01,530 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 829.26257 ± 121.798
2025-09-16 16:18:01,530 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [782.1068, 736.51514, 971.4252, 677.23486, 984.8731, 1001.5245, 641.1993, 781.4989, 880.4344, 835.81323]
2025-09-16 16:18:01,530 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 145.0, 177.0, 132.0, 188.0, 181.0, 116.0, 146.0, 160.0, 177.0]
2025-09-16 16:18:01,553 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 2 minutes, 30 seconds)
2025-09-16 16:20:02,174 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:20:04,720 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 829.66882 ± 170.698
2025-09-16 16:20:04,720 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1261.3077, 660.19403, 954.8347, 702.5108, 779.3457, 848.86035, 787.5566, 721.3383, 902.1983, 678.5417]
2025-09-16 16:20:04,720 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 121.0, 187.0, 132.0, 145.0, 176.0, 164.0, 130.0, 165.0, 125.0]
2025-09-16 16:20:04,727 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 9 seconds)
2025-09-16 16:22:07,351 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:22:10,007 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 899.42773 ± 222.393
2025-09-16 16:22:10,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [633.2855, 876.13007, 1225.1782, 1034.482, 718.8888, 837.39105, 743.6776, 809.57227, 1357.7522, 757.9201]
2025-09-16 16:22:10,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 154.0, 224.0, 197.0, 125.0, 154.0, 132.0, 144.0, 245.0, 141.0]
2025-09-16 16:22:10,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (899.43) for latency 24
2025-09-16 16:22:10,012 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 58 minutes, 18 seconds)
2025-09-16 16:24:10,996 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:24:13,814 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 837.94519 ± 167.119
2025-09-16 16:24:13,814 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1122.9619, 855.3819, 615.8917, 1039.924, 913.1101, 963.81555, 606.1302, 671.20416, 773.02386, 818.0087]
2025-09-16 16:24:13,814 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [237.0, 180.0, 133.0, 214.0, 181.0, 183.0, 128.0, 141.0, 157.0, 165.0]
2025-09-16 16:24:13,820 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 56 minutes, 4 seconds)
2025-09-16 16:26:16,740 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:26:19,263 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 808.73358 ± 144.312
2025-09-16 16:26:19,263 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [862.6382, 724.3522, 829.6149, 713.9155, 653.29614, 624.2983, 1134.9142, 891.7495, 734.8183, 917.73804]
2025-09-16 16:26:19,263 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [165.0, 150.0, 167.0, 144.0, 119.0, 135.0, 207.0, 170.0, 140.0, 171.0]
2025-09-16 16:26:19,271 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 54 minutes, 16 seconds)
2025-09-16 16:28:20,189 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:28:23,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 900.23792 ± 203.774
2025-09-16 16:28:23,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1121.255, 679.3736, 921.93304, 774.6303, 795.50244, 907.3947, 843.126, 1354.0023, 626.3687, 978.792]
2025-09-16 16:28:23,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 141.0, 184.0, 147.0, 143.0, 167.0, 154.0, 259.0, 128.0, 202.0]
2025-09-16 16:28:23,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (900.24) for latency 24
2025-09-16 16:28:23,011 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 51 minutes, 51 seconds)
2025-09-16 16:30:26,523 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:30:29,557 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 932.00031 ± 298.707
2025-09-16 16:30:29,557 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [670.53613, 886.3283, 501.93692, 1372.0485, 748.2225, 835.10895, 1472.2483, 677.5057, 1097.586, 1058.4812]
2025-09-16 16:30:29,557 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 161.0, 109.0, 271.0, 152.0, 172.0, 268.0, 130.0, 213.0, 219.0]
2025-09-16 16:30:29,558 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (932.00) for latency 24
2025-09-16 16:30:29,594 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 50 minutes, 23 seconds)
2025-09-16 16:32:30,683 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:32:33,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 904.05878 ± 154.664
2025-09-16 16:32:33,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [868.1433, 1083.5725, 881.43835, 1010.72296, 1231.6124, 828.47766, 683.5362, 773.8539, 780.82916, 898.40094]
2025-09-16 16:32:33,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 200.0, 177.0, 218.0, 229.0, 153.0, 150.0, 147.0, 147.0, 179.0]
2025-09-16 16:32:33,556 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 48 minutes, 4 seconds)
2025-09-16 16:34:34,889 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:34:37,596 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 801.39398 ± 204.471
2025-09-16 16:34:37,597 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [648.92346, 824.4356, 584.97424, 644.5129, 645.45074, 1082.7148, 1219.0669, 846.365, 898.8999, 618.59674]
2025-09-16 16:34:37,597 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 171.0, 123.0, 121.0, 139.0, 223.0, 248.0, 157.0, 185.0, 132.0]
2025-09-16 16:34:37,601 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 46 minutes, 2 seconds)
2025-09-16 16:36:39,940 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:36:42,847 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 896.41357 ± 126.585
2025-09-16 16:36:42,847 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1037.4996, 867.74756, 927.05646, 934.8444, 1094.5769, 791.87885, 631.7669, 990.35876, 880.13715, 808.2692]
2025-09-16 16:36:42,847 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 170.0, 173.0, 200.0, 232.0, 150.0, 119.0, 185.0, 168.0, 155.0]
2025-09-16 16:36:42,855 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 43 minutes, 55 seconds)
2025-09-16 16:38:44,336 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:38:47,551 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1009.52771 ± 244.970
2025-09-16 16:38:47,551 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1058.0675, 1513.041, 665.78326, 716.132, 1170.7372, 875.526, 1064.5455, 1072.4364, 1188.368, 770.6408]
2025-09-16 16:38:47,552 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 289.0, 140.0, 156.0, 212.0, 167.0, 211.0, 200.0, 246.0, 138.0]
2025-09-16 16:38:47,552 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1009.53) for latency 24
2025-09-16 16:38:47,564 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 42 minutes)
2025-09-16 16:40:50,107 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:40:53,381 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1070.02734 ± 239.251
2025-09-16 16:40:53,381 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1266.688, 1462.9619, 1390.0302, 1104.3668, 1006.6453, 1044.7189, 861.3301, 1061.8739, 641.9653, 859.6935]
2025-09-16 16:40:53,381 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 282.0, 275.0, 198.0, 185.0, 186.0, 165.0, 199.0, 116.0, 161.0]
2025-09-16 16:40:53,381 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1070.03) for latency 24
2025-09-16 16:40:53,389 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 39 minutes, 48 seconds)
2025-09-16 16:42:55,970 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:42:58,891 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 933.56824 ± 278.262
2025-09-16 16:42:58,893 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1194.6825, 1117.3988, 1264.3943, 571.70264, 1092.3641, 496.38013, 815.06726, 720.82513, 1293.2747, 769.5929]
2025-09-16 16:42:58,893 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [219.0, 194.0, 230.0, 123.0, 207.0, 105.0, 159.0, 154.0, 252.0, 144.0]
2025-09-16 16:42:58,904 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 37 minutes, 58 seconds)
2025-09-16 16:45:00,081 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:45:02,766 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 858.88507 ± 200.071
2025-09-16 16:45:02,766 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [610.40424, 1053.7291, 725.79376, 603.9684, 1027.1868, 992.5424, 790.10364, 694.7619, 1233.0817, 857.27924]
2025-09-16 16:45:02,766 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 209.0, 149.0, 133.0, 208.0, 175.0, 139.0, 140.0, 225.0, 154.0]
2025-09-16 16:45:02,772 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 35 minutes, 51 seconds)
2025-09-16 16:47:04,801 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:47:08,341 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1083.63489 ± 231.304
2025-09-16 16:47:08,342 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [885.9629, 1151.6096, 1126.6268, 1361.1852, 1026.6869, 1036.9905, 876.5823, 746.30554, 1042.1335, 1582.266]
2025-09-16 16:47:08,342 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 235.0, 221.0, 253.0, 208.0, 188.0, 159.0, 150.0, 200.0, 298.0]
2025-09-16 16:47:08,342 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1083.63) for latency 24
2025-09-16 16:47:08,347 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 33 minutes, 49 seconds)
2025-09-16 16:49:09,663 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:49:12,797 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 962.36072 ± 234.950
2025-09-16 16:49:12,797 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [946.2846, 835.06177, 775.34766, 596.40656, 1001.83997, 1108.0507, 783.57135, 990.8527, 1507.9484, 1078.2441]
2025-09-16 16:49:12,798 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 169.0, 162.0, 126.0, 205.0, 219.0, 163.0, 180.0, 266.0, 211.0]
2025-09-16 16:49:12,803 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 31 minutes, 42 seconds)
2025-09-16 16:51:15,306 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:51:18,952 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1172.11670 ± 269.638
2025-09-16 16:51:18,952 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1701.7015, 1315.3772, 1149.5591, 1136.2031, 809.017, 1438.5323, 1299.519, 1020.08093, 751.5931, 1099.5841]
2025-09-16 16:51:18,952 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [314.0, 247.0, 203.0, 229.0, 152.0, 259.0, 267.0, 199.0, 133.0, 204.0]
2025-09-16 16:51:18,953 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1172.12) for latency 24
2025-09-16 16:51:18,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 40 seconds)
2025-09-16 16:53:20,845 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:53:24,499 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1166.79968 ± 411.271
2025-09-16 16:53:24,499 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [922.88025, 1136.6587, 713.39996, 1304.0311, 744.31476, 788.0907, 2125.8386, 1052.1156, 1473.2479, 1407.4188]
2025-09-16 16:53:24,499 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 221.0, 152.0, 243.0, 147.0, 143.0, 405.0, 194.0, 255.0, 288.0]
2025-09-16 16:53:24,505 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 27 minutes, 35 seconds)
2025-09-16 16:55:26,846 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:55:30,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1029.95081 ± 322.669
2025-09-16 16:55:30,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1251.9829, 1483.7228, 650.5137, 590.8144, 1162.7864, 1160.2051, 909.0955, 1528.097, 662.9106, 899.3803]
2025-09-16 16:55:30,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 292.0, 139.0, 128.0, 237.0, 232.0, 191.0, 299.0, 145.0, 174.0]
2025-09-16 16:55:30,335 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 25 minutes, 46 seconds)
2025-09-16 16:57:33,353 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:57:36,419 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 966.41162 ± 209.785
2025-09-16 16:57:36,419 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1173.5508, 745.0209, 1152.8832, 799.69556, 716.97015, 920.7444, 844.5812, 1307.5101, 1210.9099, 792.25037]
2025-09-16 16:57:36,419 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 144.0, 232.0, 140.0, 157.0, 190.0, 161.0, 251.0, 221.0, 149.0]
2025-09-16 16:57:36,462 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 23 minutes, 44 seconds)
2025-09-16 16:59:39,443 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:59:43,391 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1283.97742 ± 439.100
2025-09-16 16:59:43,391 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [987.6923, 2110.355, 933.281, 1670.856, 1271.1965, 1179.7024, 609.23047, 1109.9252, 1091.1501, 1876.3848]
2025-09-16 16:59:43,391 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 375.0, 192.0, 310.0, 229.0, 216.0, 125.0, 216.0, 193.0, 357.0]
2025-09-16 16:59:43,391 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1283.98) for latency 24
2025-09-16 16:59:43,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 21 minutes, 58 seconds)
2025-09-16 17:01:43,870 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:01:47,610 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1198.36243 ± 313.505
2025-09-16 17:01:47,610 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1507.827, 879.0428, 641.0897, 804.65735, 1073.6863, 1478.8658, 1202.9921, 1411.5149, 1430.8779, 1553.0701]
2025-09-16 17:01:47,610 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [288.0, 183.0, 116.0, 167.0, 199.0, 271.0, 223.0, 270.0, 277.0, 285.0]
2025-09-16 17:01:47,619 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 19 minutes, 37 seconds)
2025-09-16 17:03:48,834 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:03:52,754 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1196.16321 ± 308.545
2025-09-16 17:03:52,754 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1497.3907, 1166.2639, 1515.3016, 752.1045, 1633.7609, 851.5935, 783.7794, 1322.5862, 1040.5586, 1398.2917]
2025-09-16 17:03:52,754 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [286.0, 243.0, 286.0, 164.0, 327.0, 175.0, 157.0, 249.0, 203.0, 276.0]
2025-09-16 17:03:52,792 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 17 minutes, 29 seconds)
2025-09-16 17:05:55,514 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:05:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1075.84985 ± 232.699
2025-09-16 17:05:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1303.8835, 830.2855, 1320.374, 816.4984, 1509.4645, 839.21606, 914.14215, 1114.6578, 1176.6182, 933.35925]
2025-09-16 17:05:58,898 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 179.0, 223.0, 167.0, 289.0, 182.0, 189.0, 198.0, 210.0, 178.0]
2025-09-16 17:05:58,913 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 15 minutes, 25 seconds)
2025-09-16 17:08:02,462 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:08:06,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1186.98242 ± 344.639
2025-09-16 17:08:06,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1357.5052, 590.7751, 1408.5109, 1127.6862, 880.9464, 1720.1656, 1387.9926, 920.3278, 1595.7119, 880.2026]
2025-09-16 17:08:06,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [246.0, 119.0, 273.0, 225.0, 166.0, 345.0, 262.0, 186.0, 289.0, 189.0]
2025-09-16 17:08:06,259 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 13 minutes, 28 seconds)
2025-09-16 17:10:07,597 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:10:11,903 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1386.50513 ± 421.191
2025-09-16 17:10:11,903 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1052.2, 619.3392, 1538.0631, 1790.4354, 1007.27905, 2003.0687, 990.1159, 1649.4825, 1748.3088, 1466.7592]
2025-09-16 17:10:11,903 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [205.0, 126.0, 292.0, 344.0, 186.0, 385.0, 178.0, 303.0, 326.0, 269.0]
2025-09-16 17:10:11,903 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1386.51) for latency 24
2025-09-16 17:10:11,912 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 11 minutes, 13 seconds)
2025-09-16 17:12:14,385 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:12:19,274 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1507.63086 ± 568.660
2025-09-16 17:12:19,274 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1523.444, 2301.647, 2825.1282, 1538.7611, 1290.1779, 1291.8403, 932.9172, 1148.3658, 1115.0101, 1109.0181]
2025-09-16 17:12:19,274 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [271.0, 455.0, 575.0, 293.0, 258.0, 258.0, 199.0, 235.0, 199.0, 200.0]
2025-09-16 17:12:19,274 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1507.63) for latency 24
2025-09-16 17:12:19,282 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 9 minutes, 28 seconds)
2025-09-16 17:14:21,297 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:14:24,429 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1052.46851 ± 150.561
2025-09-16 17:14:24,429 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [950.38153, 1001.5642, 1351.8535, 1088.112, 1306.0568, 842.0818, 995.7068, 970.6997, 992.74756, 1025.4817]
2025-09-16 17:14:24,429 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 184.0, 246.0, 195.0, 229.0, 151.0, 189.0, 172.0, 177.0, 182.0]
2025-09-16 17:14:24,436 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 7 minutes, 22 seconds)
2025-09-16 17:16:27,512 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:16:32,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1469.04150 ± 590.010
2025-09-16 17:16:32,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1060.5747, 1068.9828, 1498.7728, 1431.9646, 1956.7223, 2979.4375, 1079.2183, 969.43646, 1629.5859, 1015.7199]
2025-09-16 17:16:32,229 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [214.0, 213.0, 294.0, 263.0, 347.0, 546.0, 194.0, 192.0, 326.0, 206.0]
2025-09-16 17:16:32,235 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 5 minutes, 26 seconds)
2025-09-16 17:18:35,908 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:18:40,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1368.73657 ± 410.369
2025-09-16 17:18:40,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [988.4112, 1424.0664, 1352.5457, 1796.5852, 1614.6605, 957.4477, 795.25903, 1631.8019, 993.3906, 2133.1968]
2025-09-16 17:18:40,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 271.0, 274.0, 330.0, 309.0, 176.0, 147.0, 329.0, 191.0, 390.0]
2025-09-16 17:18:40,222 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 3 minutes, 23 seconds)
2025-09-16 17:20:42,026 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:20:46,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1268.96448 ± 329.041
2025-09-16 17:20:46,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1880.8665, 1142.507, 1522.7544, 904.36145, 901.2461, 1651.1216, 1187.4365, 1190.587, 1449.8558, 858.9084]
2025-09-16 17:20:46,232 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [369.0, 232.0, 286.0, 191.0, 203.0, 331.0, 216.0, 257.0, 273.0, 182.0]
2025-09-16 17:20:46,241 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 1 minute, 19 seconds)
2025-09-16 17:22:47,827 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:22:52,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1558.72412 ± 349.147
2025-09-16 17:22:52,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1128.8673, 2123.515, 1324.8037, 1824.9232, 1639.4113, 1845.348, 1739.7928, 1181.9818, 1755.9708, 1022.62665]
2025-09-16 17:22:52,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 398.0, 232.0, 353.0, 317.0, 324.0, 317.0, 243.0, 349.0, 173.0]
2025-09-16 17:22:52,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1558.72) for latency 24
2025-09-16 17:22:52,687 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 59 minutes, 7 seconds)
2025-09-16 17:24:55,484 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:24:59,437 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1258.20801 ± 405.090
2025-09-16 17:24:59,437 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [785.3462, 879.5654, 1436.9778, 1162.7578, 1078.6962, 1122.321, 1397.6978, 2256.9543, 1517.0663, 944.69684]
2025-09-16 17:24:59,437 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 170.0, 280.0, 224.0, 209.0, 227.0, 245.0, 416.0, 295.0, 162.0]
2025-09-16 17:24:59,446 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 9 seconds)
2025-09-16 17:27:03,640 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:27:08,147 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1504.29126 ± 711.503
2025-09-16 17:27:08,147 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1346.3939, 1145.0847, 1634.3744, 843.84503, 1267.2574, 1057.2731, 3003.8083, 1398.6747, 2643.3022, 702.89954]
2025-09-16 17:27:08,147 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [239.0, 204.0, 310.0, 159.0, 226.0, 182.0, 561.0, 261.0, 470.0, 138.0]
2025-09-16 17:27:08,161 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 55 minutes, 6 seconds)
2025-09-16 17:29:11,743 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:29:16,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1356.38892 ± 411.154
2025-09-16 17:29:16,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1212.4468, 1316.0696, 985.4885, 2062.2534, 1419.1875, 1579.5029, 1003.12384, 861.357, 1050.3684, 2074.091]
2025-09-16 17:29:16,009 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [236.0, 266.0, 185.0, 368.0, 256.0, 274.0, 216.0, 173.0, 224.0, 380.0]
2025-09-16 17:29:16,019 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 52 minutes, 58 seconds)
2025-09-16 17:31:17,928 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:31:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1559.31372 ± 351.310
2025-09-16 17:31:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1194.5732, 2316.538, 1621.2557, 1718.6063, 1468.1173, 1507.6489, 982.9916, 1281.4607, 1702.2314, 1799.714]
2025-09-16 17:31:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 409.0, 329.0, 314.0, 273.0, 257.0, 198.0, 239.0, 301.0, 333.0]
2025-09-16 17:31:22,657 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1559.31) for latency 24
2025-09-16 17:31:22,666 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 50 minutes, 54 seconds)
2025-09-16 17:33:25,059 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:33:31,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2049.14380 ± 871.058
2025-09-16 17:33:31,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1927.7185, 2247.1372, 1257.8732, 1031.7834, 2778.0574, 1183.9744, 2433.6433, 3572.4653, 999.0081, 3059.7776]
2025-09-16 17:33:31,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [348.0, 406.0, 224.0, 186.0, 507.0, 230.0, 454.0, 671.0, 194.0, 548.0]
2025-09-16 17:33:31,236 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2049.14) for latency 24
2025-09-16 17:33:31,247 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 48 minutes, 57 seconds)
2025-09-16 17:35:35,517 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:35:39,918 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1442.76868 ± 474.513
2025-09-16 17:35:39,919 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2011.602, 949.5131, 1151.3983, 1035.7148, 1728.1772, 1144.971, 1266.4413, 901.3847, 2230.1538, 2008.329]
2025-09-16 17:35:39,919 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [361.0, 187.0, 194.0, 203.0, 335.0, 228.0, 221.0, 158.0, 412.0, 370.0]
2025-09-16 17:35:39,927 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 46 minutes, 58 seconds)
2025-09-16 17:37:39,765 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:37:43,852 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1225.39978 ± 409.748
2025-09-16 17:37:43,852 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [804.19116, 1348.6967, 2023.6282, 1093.0499, 739.61176, 1041.9587, 1084.2645, 1723.2384, 1580.2329, 815.127]
2025-09-16 17:37:43,852 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 272.0, 390.0, 203.0, 160.0, 219.0, 197.0, 327.0, 310.0, 177.0]
2025-09-16 17:37:43,874 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 44 minutes, 29 seconds)
2025-09-16 17:39:47,944 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:39:51,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1258.23511 ± 428.067
2025-09-16 17:39:51,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [738.7025, 736.12494, 1844.7244, 908.34406, 1514.2421, 1820.2876, 885.14685, 1524.0292, 1642.8495, 967.90015]
2025-09-16 17:39:51,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 155.0, 321.0, 173.0, 285.0, 333.0, 176.0, 282.0, 299.0, 203.0]
2025-09-16 17:39:51,842 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 42 minutes, 23 seconds)
2025-09-16 17:41:55,906 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:42:02,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2005.62280 ± 602.540
2025-09-16 17:42:02,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2217.6157, 2373.5518, 1848.4694, 2349.7976, 1244.4244, 1687.6947, 3167.0083, 1037.0773, 2473.0674, 1657.5223]
2025-09-16 17:42:02,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [443.0, 446.0, 364.0, 426.0, 242.0, 327.0, 575.0, 200.0, 450.0, 290.0]
2025-09-16 17:42:02,206 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 40 minutes, 30 seconds)
2025-09-16 17:44:10,949 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:44:16,464 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1774.59692 ± 619.014
2025-09-16 17:44:16,464 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1303.1584, 2782.6172, 2218.6409, 1134.1067, 2527.513, 1373.8523, 1139.7001, 2484.888, 1223.0403, 1558.4529]
2025-09-16 17:44:16,464 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [260.0, 526.0, 407.0, 206.0, 475.0, 250.0, 230.0, 479.0, 247.0, 288.0]
2025-09-16 17:44:16,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 42 seconds)
2025-09-16 17:46:12,265 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:46:16,758 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1423.01819 ± 251.991
2025-09-16 17:46:16,758 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1016.0136, 1328.0848, 1411.9541, 1391.8895, 1609.7042, 1468.9884, 1494.4867, 1300.5176, 2014.4258, 1194.1165]
2025-09-16 17:46:16,758 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 232.0, 267.0, 276.0, 282.0, 265.0, 282.0, 272.0, 372.0, 245.0]
2025-09-16 17:46:16,766 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 5 seconds)
2025-09-16 17:48:23,684 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:48:28,450 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1495.24377 ± 373.348
2025-09-16 17:48:28,451 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1863.5732, 2035.6823, 1091.6882, 1127.8751, 1384.0581, 1121.4745, 2143.6912, 1439.9694, 1548.917, 1195.5085]
2025-09-16 17:48:28,451 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [352.0, 369.0, 216.0, 220.0, 261.0, 203.0, 420.0, 282.0, 307.0, 222.0]
2025-09-16 17:48:28,459 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 22 seconds)
2025-09-16 17:50:28,487 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:50:35,193 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2244.40967 ± 954.665
2025-09-16 17:50:35,193 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1060.485, 3712.246, 1821.6547, 3950.48, 2033.1802, 1070.722, 1627.9834, 2950.9949, 1853.5022, 2362.8481]
2025-09-16 17:50:35,193 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 684.0, 314.0, 712.0, 362.0, 187.0, 296.0, 510.0, 331.0, 410.0]
2025-09-16 17:50:35,193 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2244.41) for latency 24
2025-09-16 17:50:35,212 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 10 seconds)
2025-09-16 17:52:36,608 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:52:42,523 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1857.41479 ± 1038.597
2025-09-16 17:52:42,523 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1323.6775, 1662.7299, 974.3472, 1461.4174, 879.043, 1637.4944, 1413.6929, 1517.5553, 3497.1462, 4207.044]
2025-09-16 17:52:42,523 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [268.0, 310.0, 190.0, 265.0, 187.0, 313.0, 261.0, 287.0, 661.0, 801.0]
2025-09-16 17:52:42,532 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 29 minutes, 52 seconds)
2025-09-16 17:54:50,893 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:54:54,574 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1165.62427 ± 411.246
2025-09-16 17:54:54,574 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1985.0981, 854.88995, 742.73834, 752.4099, 1088.6672, 1449.5015, 824.22705, 1752.83, 1008.5208, 1197.3613]
2025-09-16 17:54:54,574 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [350.0, 179.0, 141.0, 168.0, 193.0, 251.0, 161.0, 347.0, 196.0, 244.0]
2025-09-16 17:54:54,590 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 27 minutes, 39 seconds)
2025-09-16 17:56:54,194 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:57:01,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2296.07397 ± 995.681
2025-09-16 17:57:01,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3023.0408, 1749.977, 1707.7686, 2581.3076, 4256.4478, 1656.4675, 3623.7368, 969.4263, 1408.7997, 1983.7689]
2025-09-16 17:57:01,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [589.0, 316.0, 303.0, 489.0, 842.0, 328.0, 684.0, 186.0, 256.0, 382.0]
2025-09-16 17:57:01,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2296.07) for latency 24
2025-09-16 17:57:01,566 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 47 seconds)
2025-09-16 17:59:07,312 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:59:12,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1607.41187 ± 1260.491
2025-09-16 17:59:12,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5340.4854, 1290.946, 952.88007, 1126.1039, 1662.0208, 1380.203, 973.2497, 1148.807, 1030.0564, 1169.3667]
2025-09-16 17:59:12,650 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 250.0, 184.0, 224.0, 342.0, 284.0, 193.0, 233.0, 192.0, 240.0]
2025-09-16 17:59:12,663 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 37 seconds)
2025-09-16 18:01:13,387 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:01:18,114 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1502.85681 ± 1068.163
2025-09-16 18:01:18,114 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1098.6254, 591.80774, 1080.8848, 956.0913, 1528.1255, 4541.8545, 889.9384, 1737.9554, 992.4151, 1610.8706]
2025-09-16 18:01:18,114 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 123.0, 207.0, 184.0, 270.0, 817.0, 191.0, 333.0, 182.0, 306.0]
2025-09-16 18:01:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 25 seconds)
2025-09-16 18:03:19,947 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:03:26,898 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2213.56787 ± 1321.049
2025-09-16 18:03:26,898 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2392.5312, 1609.6499, 1241.3044, 2433.0266, 1112.7295, 963.31177, 5541.248, 2861.6514, 2910.2139, 1070.0099]
2025-09-16 18:03:26,898 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [469.0, 282.0, 258.0, 468.0, 233.0, 190.0, 1000.0, 508.0, 507.0, 219.0]
2025-09-16 18:03:26,912 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 19 seconds)
2025-09-16 18:05:31,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:05:35,411 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1234.21753 ± 652.186
2025-09-16 18:05:35,411 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1382.1687, 753.5591, 1814.5043, 720.6284, 1118.839, 692.9539, 875.94543, 1325.2135, 768.71985, 2889.644]
2025-09-16 18:05:35,411 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [262.0, 146.0, 370.0, 136.0, 224.0, 132.0, 167.0, 253.0, 148.0, 540.0]
2025-09-16 18:05:35,424 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 5 seconds)
2025-09-16 18:07:39,958 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:07:45,397 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1902.58521 ± 492.430
2025-09-16 18:07:45,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2283.6782, 2348.5776, 2114.0796, 2318.477, 1094.9545, 2271.1433, 1463.0399, 1092.2476, 1705.5098, 2334.1443]
2025-09-16 18:07:45,398 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [387.0, 393.0, 370.0, 418.0, 204.0, 392.0, 270.0, 189.0, 305.0, 406.0]
2025-09-16 18:07:45,406 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 1 second)
2025-09-16 18:09:49,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:09:53,994 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1386.31543 ± 357.830
2025-09-16 18:09:53,994 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1802.4841, 1110.0166, 1354.3339, 862.22253, 903.7121, 1853.8014, 1290.9976, 1738.5134, 1194.7935, 1752.2794]
2025-09-16 18:09:53,994 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [329.0, 216.0, 240.0, 160.0, 170.0, 367.0, 267.0, 313.0, 255.0, 342.0]
2025-09-16 18:09:54,001 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 49 seconds)
2025-09-16 18:11:52,571 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:11:59,073 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2089.74951 ± 1350.534
2025-09-16 18:11:59,073 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2580.213, 5467.3447, 1046.1906, 833.306, 1873.093, 1809.5859, 3366.9297, 1763.6293, 1165.9064, 991.29675]
2025-09-16 18:11:59,073 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [488.0, 1000.0, 206.0, 161.0, 347.0, 357.0, 610.0, 309.0, 207.0, 182.0]
2025-09-16 18:11:59,110 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 40 seconds)
2025-09-16 18:14:02,249 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:14:09,890 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2523.47754 ± 982.732
2025-09-16 18:14:09,890 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3854.9104, 2265.9668, 2971.8699, 1145.4318, 1475.9836, 3469.0403, 3513.2917, 1318.8916, 1809.2006, 3410.1868]
2025-09-16 18:14:09,890 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [701.0, 404.0, 561.0, 204.0, 263.0, 612.0, 626.0, 243.0, 334.0, 633.0]
2025-09-16 18:14:09,890 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2523.48) for latency 24
2025-09-16 18:14:09,901 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 34 seconds)
2025-09-16 18:16:14,138 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:16:22,185 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2661.56128 ± 982.643
2025-09-16 18:16:22,185 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2052.0715, 2300.1858, 3512.784, 2601.0647, 4928.8945, 1525.5953, 3327.1423, 2031.0804, 1590.7235, 2746.0732]
2025-09-16 18:16:22,185 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [352.0, 401.0, 644.0, 483.0, 901.0, 274.0, 601.0, 343.0, 303.0, 472.0]
2025-09-16 18:16:22,185 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2661.56) for latency 24
2025-09-16 18:16:22,197 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 28 seconds)
2025-09-16 18:18:26,111 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:18:33,481 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2460.99146 ± 1031.009
2025-09-16 18:18:33,482 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2140.6719, 3301.5706, 3646.4104, 1863.2974, 4405.528, 1420.6466, 2918.321, 2428.2363, 1154.4971, 1330.7349]
2025-09-16 18:18:33,482 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [380.0, 584.0, 617.0, 318.0, 796.0, 256.0, 531.0, 447.0, 200.0, 268.0]
2025-09-16 18:18:33,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 19 seconds)
2025-09-16 18:20:44,428 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:20:53,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2694.20972 ± 1604.257
2025-09-16 18:20:53,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1717.184, 1714.4281, 1324.6084, 4292.897, 1802.4689, 5432.9287, 5496.5728, 1342.1555, 1608.2673, 2210.5889]
2025-09-16 18:20:53,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [347.0, 322.0, 238.0, 777.0, 319.0, 1000.0, 1000.0, 246.0, 315.0, 388.0]
2025-09-16 18:20:53,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2694.21) for latency 24
2025-09-16 18:20:53,019 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 11 seconds)
2025-09-16 18:22:49,521 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:23:00,931 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3639.31567 ± 1295.487
2025-09-16 18:23:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1882.9695, 4776.2964, 5632.197, 4948.085, 4338.3164, 2124.8696, 2699.7505, 2481.6638, 2835.45, 4673.5586]
2025-09-16 18:23:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [332.0, 852.0, 1000.0, 922.0, 775.0, 391.0, 499.0, 467.0, 561.0, 849.0]
2025-09-16 18:23:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3639.32) for latency 24
2025-09-16 18:23:00,942 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1251 [DEBUG]: Training session finished
