2025-09-16 14:42:02,968 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.000-delay_21
2025-09-16 14:42:02,968 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.000-delay_21
2025-09-16 14:42:02,969 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x1493e9420790>}
2025-09-16 14:42:02,969 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:42:02,974 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:42:02,993 baseline-bpql-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=733, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:42:02,993 baseline-bpql-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
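The input widths in the two printouts decompose cleanly under two assumptions that the log does not state. First, the critic (NNLayerConcat2, concatenating along dim -1) receives one observation and one action. Second, the policy receives the observation plus the 21 most recent actions, where 21 is the delay in the logdir. Under those assumptions the observation size comes out as 376, which is the observation size of Gymnasium's Humanoid-v4. The log does not name the environment version. A minimal arithmetic check:

```python
# Sizes read off the pi / q network printouts in this log.
ACTION_DIM = 17   # out_features of mu_head and log_std_head
POLICY_IN = 733   # in_features of the first Linear in pi's common_head
CRITIC_IN = 393   # in_features of the first Linear in q's Sequential
DELAY = 21        # from the logdir and args.trainer_eval_latencies

# Assumption: the critic sees [observation, action] concatenated.
obs_dim = CRITIC_IN - ACTION_DIM

# Assumption: the policy sees the observation plus the DELAY most recent
# actions, all flattened into one vector.
augmented_dim = obs_dim + DELAY * ACTION_DIM

print(obs_dim, augmented_dim)
assert augmented_dim == POLICY_IN, "assumed input layout does not match the logged widths"
```

If the assertion failed, at least one of the two assumptions about what each network consumes would be wrong.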
2025-09-16 14:42:04,753 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:42:04,754 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:43:55,616 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:43:56,810 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 386.60275 ± 23.618
2025-09-16 14:43:56,810 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [419.12442, 416.06433, 361.54617, 425.1383, 381.75464, 361.57242, 361.257, 386.1793, 381.72574, 371.66525]
2025-09-16 14:43:56,810 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 79.0, 70.0, 83.0, 76.0, 69.0, 69.0, 73.0, 73.0, 72.0]
2025-09-16 14:43:56,810 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (386.60) for latency 21
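Each Total Reward line reports the mean and spread of the ten per-episode returns listed on the All rewards line that follows it. Recomputing iteration 1 suggests that the spread is the population standard deviation, which divides by n. That statistic gives about 23.618, matching the log. The sample standard deviation, which divides by n - 1, gives about 24.9. The following is a minimal check using Python's statistics module. The trainer's own implementation is not shown in the log. If it computes in float32, the last printed digit of the mean may differ slightly.

```python
import statistics

# Per-episode returns logged for iteration 1 ("All rewards").
rewards = [419.12442, 416.06433, 361.54617, 425.1383, 381.75464,
           361.57242, 361.257, 386.1793, 381.72574, 371.66525]

mean = statistics.fmean(rewards)
pop_std = statistics.pstdev(rewards)    # population std: divides by n
sample_std = statistics.stdev(rewards)  # sample std: divides by n - 1

print(f"mean {mean:.5f}, population std {pop_std:.3f}, sample std {sample_std:.3f}")
```

Only the population figure agrees with the logged 23.618.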
2025-09-16 14:43:56,816 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 4 minutes, 54 seconds)
2025-09-16 14:45:58,948 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:46:00,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 468.37680 ± 85.692
2025-09-16 14:46:00,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [698.93475, 423.26935, 382.49814, 492.5937, 476.6894, 414.0442, 406.94934, 429.8943, 449.80637, 509.0889]
2025-09-16 14:46:00,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 78.0, 71.0, 100.0, 91.0, 77.0, 76.0, 81.0, 84.0, 95.0]
2025-09-16 14:46:00,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (468.38) for latency 21
2025-09-16 14:46:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 12 minutes, 22 seconds)
2025-09-16 14:48:02,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:48:03,345 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 433.72760 ± 111.665
2025-09-16 14:48:03,345 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [443.2766, 756.629, 429.6697, 413.355, 401.59045, 392.11777, 421.57504, 340.64236, 369.60995, 368.80994]
2025-09-16 14:48:03,345 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 154.0, 84.0, 82.0, 76.0, 79.0, 80.0, 67.0, 71.0, 71.0]
2025-09-16 14:48:03,348 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 13 minutes, 14 seconds)
2025-09-16 14:50:05,155 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:50:06,494 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 450.82684 ± 86.112
2025-09-16 14:50:06,494 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [371.33005, 561.1995, 394.40747, 400.21417, 514.94434, 385.0403, 343.89615, 440.4147, 622.1644, 474.65765]
2025-09-16 14:50:06,494 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 106.0, 74.0, 86.0, 100.0, 73.0, 66.0, 84.0, 128.0, 91.0]
2025-09-16 14:50:06,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 12 minutes, 41 seconds)
2025-09-16 14:52:07,809 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:52:09,045 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 419.68231 ± 56.153
2025-09-16 14:52:09,046 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [390.33615, 416.92026, 423.85666, 357.21576, 552.9576, 404.3446, 441.6271, 462.30154, 340.41556, 406.8479]
2025-09-16 14:52:09,046 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 78.0, 80.0, 69.0, 109.0, 85.0, 85.0, 87.0, 66.0, 84.0]
2025-09-16 14:52:09,052 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 11 minutes, 21 seconds)
2025-09-16 14:54:09,965 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:54:11,251 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 442.47095 ± 49.148
2025-09-16 14:54:11,251 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [421.40213, 484.26895, 384.26318, 538.6051, 363.76044, 480.1027, 446.66324, 464.49893, 432.0056, 409.139]
2025-09-16 14:54:11,251 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 90.0, 82.0, 99.0, 68.0, 101.0, 83.0, 85.0, 82.0, 77.0]
2025-09-16 14:54:11,285 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 12 minutes, 32 seconds)
2025-09-16 14:56:10,407 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:56:11,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 490.35297 ± 61.681
2025-09-16 14:56:11,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [421.98947, 504.3261, 488.76947, 580.8464, 532.2477, 519.6819, 555.03503, 388.57922, 508.33142, 403.72305]
2025-09-16 14:56:11,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 94.0, 91.0, 111.0, 115.0, 100.0, 103.0, 74.0, 101.0, 76.0]
2025-09-16 14:56:11,811 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (490.35) for latency 21
2025-09-16 14:56:11,828 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 9 minutes, 34 seconds)
2025-09-16 14:58:11,993 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:58:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 503.54340 ± 105.858
2025-09-16 14:58:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [454.6874, 500.56335, 402.96603, 642.03656, 488.9532, 410.12524, 510.6283, 745.5542, 488.55484, 391.3652]
2025-09-16 14:58:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 94.0, 75.0, 126.0, 98.0, 78.0, 111.0, 148.0, 102.0, 73.0]
2025-09-16 14:58:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (503.54) for latency 21
2025-09-16 14:58:13,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 7 minutes, 6 seconds)
2025-09-16 15:00:12,994 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:00:14,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 506.98306 ± 97.323
2025-09-16 15:00:14,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [593.6464, 402.1702, 631.6984, 465.67172, 486.34555, 534.73096, 684.89246, 481.30228, 383.89987, 405.47275]
2025-09-16 15:00:14,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 75.0, 118.0, 88.0, 93.0, 104.0, 132.0, 90.0, 73.0, 75.0]
2025-09-16 15:00:14,452 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (506.98) for latency 21
2025-09-16 15:00:14,473 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 4 minutes, 25 seconds)
2025-09-16 15:02:14,407 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:02:15,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 512.96021 ± 66.414
2025-09-16 15:02:15,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [642.7616, 434.02036, 536.6561, 492.46567, 448.8423, 456.65005, 594.91394, 512.1988, 451.06036, 560.03253]
2025-09-16 15:02:15,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 80.0, 106.0, 96.0, 97.0, 96.0, 123.0, 112.0, 83.0, 113.0]
2025-09-16 15:02:15,930 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (512.96) for latency 21
2025-09-16 15:02:15,939 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 2 minutes, 3 seconds)
2025-09-16 15:04:15,861 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:04:17,412 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 524.63239 ± 66.155
2025-09-16 15:04:17,412 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [536.1392, 553.2038, 479.0852, 496.00037, 597.579, 612.3324, 418.8052, 619.3416, 473.1587, 460.6784]
2025-09-16 15:04:17,412 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 106.0, 91.0, 99.0, 114.0, 113.0, 90.0, 134.0, 89.0, 86.0]
2025-09-16 15:04:17,412 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (524.63) for latency 21
2025-09-16 15:04:17,417 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 59 minutes, 49 seconds)
2025-09-16 15:06:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:06:18,331 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 563.56409 ± 100.576
2025-09-16 15:06:18,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [556.44135, 742.59875, 430.2198, 456.36197, 470.53577, 529.74347, 677.4644, 518.9054, 686.02814, 567.34125]
2025-09-16 15:06:18,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 136.0, 80.0, 98.0, 97.0, 98.0, 128.0, 112.0, 148.0, 106.0]
2025-09-16 15:06:18,332 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (563.56) for latency 21
2025-09-16 15:06:18,337 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 57 minutes, 54 seconds)
2025-09-16 15:08:18,202 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:08:19,921 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 563.26355 ± 110.602
2025-09-16 15:08:19,921 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [657.71906, 529.3261, 479.34927, 457.47928, 733.7273, 565.5498, 655.06305, 501.04138, 684.78033, 368.60025]
2025-09-16 15:08:19,921 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 108.0, 90.0, 85.0, 140.0, 119.0, 135.0, 107.0, 145.0, 78.0]
2025-09-16 15:08:19,929 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 55 minutes, 52 seconds)
2025-09-16 15:10:20,433 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:10:22,126 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 582.76385 ± 105.191
2025-09-16 15:10:22,127 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [579.89764, 713.39594, 657.7951, 341.4646, 648.9153, 532.1831, 573.8699, 471.307, 651.0146, 657.7954]
2025-09-16 15:10:22,127 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 149.0, 124.0, 68.0, 123.0, 101.0, 109.0, 88.0, 124.0, 123.0]
2025-09-16 15:10:22,127 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (582.76) for latency 21
2025-09-16 15:10:22,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 54 minutes, 12 seconds)
2025-09-16 15:12:21,581 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:12:23,290 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 580.55713 ± 92.170
2025-09-16 15:12:23,290 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [490.83435, 481.6727, 632.69684, 410.89984, 531.8089, 710.21783, 624.80786, 620.36316, 689.4077, 612.8616]
2025-09-16 15:12:23,290 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 90.0, 118.0, 87.0, 102.0, 137.0, 116.0, 119.0, 142.0, 132.0]
2025-09-16 15:12:23,297 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 52 minutes, 5 seconds)
2025-09-16 15:14:22,951 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:14:24,656 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 572.98724 ± 115.170
2025-09-16 15:14:24,656 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [377.4286, 625.5158, 595.2548, 458.25995, 459.4904, 512.1576, 573.43774, 769.3945, 670.8983, 688.0346]
2025-09-16 15:14:24,656 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 119.0, 112.0, 97.0, 96.0, 110.0, 109.0, 143.0, 130.0, 130.0]
2025-09-16 15:14:24,660 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 50 minutes, 1 second)
2025-09-16 15:16:24,964 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:16:26,653 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 570.76813 ± 75.939
2025-09-16 15:16:26,653 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [651.39246, 675.94995, 661.02295, 582.486, 436.06915, 580.6216, 573.0766, 516.31256, 562.6614, 468.08853]
2025-09-16 15:16:26,653 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 128.0, 127.0, 108.0, 81.0, 115.0, 108.0, 110.0, 120.0, 89.0]
2025-09-16 15:16:26,662 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 48 minutes, 18 seconds)
2025-09-16 15:18:27,422 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:18:28,839 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 497.11295 ± 131.383
2025-09-16 15:18:28,839 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [411.47888, 855.0215, 433.12408, 397.45935, 594.2194, 422.68124, 447.2189, 508.496, 425.58917, 475.8408]
2025-09-16 15:18:28,839 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 164.0, 83.0, 78.0, 111.0, 82.0, 89.0, 94.0, 80.0, 89.0]
2025-09-16 15:18:28,852 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 46 minutes, 26 seconds)
2025-09-16 15:20:29,642 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:20:31,571 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 632.44733 ± 183.852
2025-09-16 15:20:31,571 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [460.47754, 664.67365, 652.1046, 538.52936, 467.6695, 537.31555, 1105.3702, 782.40204, 606.7392, 509.1915]
2025-09-16 15:20:31,571 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 124.0, 125.0, 103.0, 98.0, 101.0, 227.0, 155.0, 113.0, 109.0]
2025-09-16 15:20:31,572 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (632.45) for latency 21
2025-09-16 15:20:31,577 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 44 minutes, 32 seconds)
2025-09-16 15:22:31,031 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:22:32,832 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 580.40564 ± 190.158
2025-09-16 15:22:32,833 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [618.5823, 435.9354, 610.8608, 383.36914, 703.1003, 535.82056, 396.92877, 437.54755, 1056.1687, 625.7429]
2025-09-16 15:22:32,833 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 83.0, 117.0, 72.0, 147.0, 101.0, 83.0, 85.0, 205.0, 128.0]
2025-09-16 15:22:32,839 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 42 minutes, 32 seconds)
2025-09-16 15:24:33,234 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:24:34,858 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 554.09766 ± 76.302
2025-09-16 15:24:34,859 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [583.0762, 593.1214, 563.0434, 600.0357, 538.58954, 402.40582, 649.32227, 513.67456, 648.0001, 449.7078]
2025-09-16 15:24:34,859 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 113.0, 102.0, 113.0, 99.0, 83.0, 130.0, 111.0, 125.0, 97.0]
2025-09-16 15:24:34,865 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 40 minutes, 41 seconds)
2025-09-16 15:26:37,600 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:26:39,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 524.56177 ± 113.669
2025-09-16 15:26:39,190 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [453.15216, 496.15927, 506.2855, 356.95932, 353.62155, 645.0746, 558.51385, 734.47156, 599.5101, 541.8697]
2025-09-16 15:26:39,191 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 106.0, 105.0, 77.0, 71.0, 121.0, 116.0, 136.0, 125.0, 100.0]
2025-09-16 15:26:39,198 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 39 minutes, 15 seconds)
2025-09-16 15:28:42,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:28:43,896 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 622.73749 ± 144.869
2025-09-16 15:28:43,896 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [644.45575, 711.0536, 718.10406, 557.497, 469.86978, 425.67917, 912.26447, 623.39844, 442.9052, 722.1475]
2025-09-16 15:28:43,896 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 133.0, 136.0, 104.0, 89.0, 92.0, 177.0, 118.0, 96.0, 139.0]
2025-09-16 15:28:43,902 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 37 minutes, 51 seconds)
2025-09-16 15:30:45,239 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:30:47,108 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 640.56409 ± 158.963
2025-09-16 15:30:47,108 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [903.35126, 708.0529, 622.52673, 448.8279, 834.31, 599.67926, 763.42773, 659.58923, 409.39624, 456.47925]
2025-09-16 15:30:47,108 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 150.0, 119.0, 83.0, 162.0, 113.0, 142.0, 127.0, 89.0, 86.0]
2025-09-16 15:30:47,108 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (640.56) for latency 21
2025-09-16 15:30:47,113 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 35 minutes, 56 seconds)
2025-09-16 15:32:49,933 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:32:52,042 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 726.67950 ± 212.555
2025-09-16 15:32:52,042 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [626.60065, 570.4328, 1038.0692, 799.66956, 505.78513, 532.31573, 559.49664, 774.71857, 1170.3596, 689.3479]
2025-09-16 15:32:52,042 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 113.0, 197.0, 154.0, 95.0, 97.0, 120.0, 139.0, 230.0, 129.0]
2025-09-16 15:32:52,042 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (726.68) for latency 21
2025-09-16 15:32:52,046 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 34 minutes, 48 seconds)
2025-09-16 15:34:54,793 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:34:56,680 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 653.34583 ± 142.164
2025-09-16 15:34:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [742.4941, 457.87073, 582.2855, 643.55023, 555.3556, 508.22644, 889.139, 697.1087, 888.8169, 568.61115]
2025-09-16 15:34:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 86.0, 107.0, 123.0, 102.0, 95.0, 170.0, 128.0, 172.0, 122.0]
2025-09-16 15:34:56,691 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 33 minutes, 23 seconds)
2025-09-16 15:36:55,573 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:36:57,430 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 599.31036 ± 147.642
2025-09-16 15:36:57,431 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [486.76474, 586.0466, 553.86365, 461.51215, 653.6135, 472.1433, 735.76746, 601.8149, 962.4733, 479.10397]
2025-09-16 15:36:57,431 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 107.0, 116.0, 101.0, 140.0, 99.0, 138.0, 111.0, 200.0, 106.0]
2025-09-16 15:36:57,439 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 30 minutes, 26 seconds)
2025-09-16 15:38:58,548 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:39:00,738 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 705.44940 ± 143.884
2025-09-16 15:39:00,738 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [678.75964, 688.28345, 898.1489, 806.73944, 853.5118, 487.2068, 843.43243, 447.47617, 646.42065, 704.5149]
2025-09-16 15:39:00,738 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 141.0, 172.0, 169.0, 175.0, 106.0, 168.0, 96.0, 120.0, 140.0]
2025-09-16 15:39:00,744 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 28 minutes, 2 seconds)
2025-09-16 15:41:00,853 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:41:03,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 722.45642 ± 185.705
2025-09-16 15:41:03,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [626.42194, 817.29596, 705.5942, 985.78516, 1017.13806, 869.2478, 454.16943, 566.93, 685.6031, 496.3784]
2025-09-16 15:41:03,030 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 175.0, 139.0, 193.0, 202.0, 170.0, 96.0, 105.0, 132.0, 104.0]
2025-09-16 15:41:03,036 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 25 minutes, 46 seconds)
2025-09-16 15:43:02,149 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:43:04,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 727.46472 ± 135.612
2025-09-16 15:43:04,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [811.5625, 823.75146, 947.13947, 830.0423, 649.3678, 490.53857, 576.3775, 590.7278, 776.0987, 779.0413]
2025-09-16 15:43:04,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 163.0, 181.0, 159.0, 134.0, 105.0, 109.0, 116.0, 160.0, 142.0]
2025-09-16 15:43:04,326 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (727.46) for latency 21
2025-09-16 15:43:04,331 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 22 minutes, 51 seconds)
2025-09-16 15:45:05,362 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:45:07,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 643.49463 ± 90.945
2025-09-16 15:45:07,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [751.15247, 606.8274, 474.2535, 679.8652, 662.70605, 820.80927, 651.95917, 606.65106, 568.8639, 611.8582]
2025-09-16 15:45:07,301 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 114.0, 100.0, 151.0, 123.0, 154.0, 134.0, 126.0, 124.0, 119.0]
2025-09-16 15:45:07,306 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 20 minutes, 26 seconds)
2025-09-16 15:47:07,206 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:47:09,188 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 686.57037 ± 213.235
2025-09-16 15:47:09,188 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [437.31168, 808.9201, 528.39374, 1195.9827, 597.0, 759.46466, 835.789, 583.8022, 645.03174, 474.00775]
2025-09-16 15:47:09,188 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 151.0, 97.0, 249.0, 124.0, 140.0, 159.0, 109.0, 125.0, 87.0]
2025-09-16 15:47:09,195 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 18 minutes, 39 seconds)
2025-09-16 15:49:09,753 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:49:11,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 580.24982 ± 132.186
2025-09-16 15:49:11,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [537.51733, 892.64233, 423.511, 651.8035, 613.6615, 511.16724, 473.1037, 505.93808, 491.2094, 701.94385]
2025-09-16 15:49:11,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 167.0, 89.0, 128.0, 113.0, 97.0, 89.0, 105.0, 97.0, 143.0]
2025-09-16 15:49:11,496 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 24 seconds)
2025-09-16 15:51:12,311 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:51:14,325 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 694.32605 ± 169.452
2025-09-16 15:51:14,325 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [668.91675, 813.8145, 666.1897, 815.2098, 1103.0299, 597.18475, 596.5353, 642.58093, 459.70593, 580.09326]
2025-09-16 15:51:14,325 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 163.0, 121.0, 156.0, 208.0, 112.0, 111.0, 120.0, 99.0, 111.0]
2025-09-16 15:51:14,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 14 minutes, 29 seconds)
2025-09-16 15:53:14,179 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:53:16,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 735.46857 ± 221.154
2025-09-16 15:53:16,303 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [592.5164, 845.5839, 486.47034, 513.68646, 798.24396, 762.57904, 1274.981, 838.11914, 701.7696, 540.73596]
2025-09-16 15:53:16,303 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 156.0, 91.0, 95.0, 157.0, 146.0, 243.0, 179.0, 130.0, 103.0]
2025-09-16 15:53:16,303 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (735.47) for latency 21
2025-09-16 15:53:16,308 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 12 minutes, 35 seconds)
2025-09-16 15:55:15,605 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:55:17,858 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 756.42786 ± 396.282
2025-09-16 15:55:17,858 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [522.167, 544.3854, 571.9493, 641.1735, 604.3647, 613.60815, 791.20483, 1924.4347, 699.6834, 651.30725]
2025-09-16 15:55:17,858 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 103.0, 106.0, 138.0, 111.0, 115.0, 146.0, 375.0, 133.0, 125.0]
2025-09-16 15:55:17,858 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (756.43) for latency 21
2025-09-16 15:55:17,865 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 10 minutes, 15 seconds)
2025-09-16 15:57:18,080 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:57:20,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 706.88135 ± 92.572
2025-09-16 15:57:20,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [727.9636, 615.8869, 838.8298, 561.02515, 871.68805, 696.9228, 701.9886, 664.8723, 761.7039, 627.9323]
2025-09-16 15:57:20,118 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 110.0, 160.0, 104.0, 175.0, 132.0, 140.0, 119.0, 145.0, 123.0]
2025-09-16 15:57:20,123 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 8 minutes, 17 seconds)
2025-09-16 15:59:21,356 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:59:23,627 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 775.54163 ± 142.585
2025-09-16 15:59:23,627 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [699.00555, 633.02374, 719.92883, 1106.4534, 824.4233, 597.00684, 771.65234, 683.31415, 799.533, 921.0748]
2025-09-16 15:59:23,627 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 137.0, 134.0, 211.0, 161.0, 118.0, 144.0, 135.0, 160.0, 169.0]
2025-09-16 15:59:23,627 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (775.54) for latency 21
2025-09-16 15:59:23,635 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 6 minutes, 30 seconds)
2025-09-16 16:01:24,611 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:01:27,068 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 804.46924 ± 158.970
2025-09-16 16:01:27,069 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [612.2447, 682.211, 939.9423, 732.6013, 1067.5614, 565.89044, 810.84827, 997.53577, 735.8209, 900.0363]
2025-09-16 16:01:27,069 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 141.0, 192.0, 144.0, 216.0, 122.0, 166.0, 196.0, 156.0, 170.0]
2025-09-16 16:01:27,069 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (804.47) for latency 21
2025-09-16 16:01:27,078 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 4 minutes, 35 seconds)
2025-09-16 16:03:27,167 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:03:29,375 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 749.46747 ± 123.954
2025-09-16 16:03:29,375 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [564.2249, 661.43, 728.851, 853.59424, 906.4892, 684.6112, 750.95276, 975.7424, 761.1287, 607.6504]
2025-09-16 16:03:29,376 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 128.0, 152.0, 169.0, 189.0, 128.0, 137.0, 181.0, 139.0, 113.0]
2025-09-16 16:03:29,382 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 2 minutes, 36 seconds)
2025-09-16 16:05:29,382 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:05:31,593 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 770.51062 ± 112.187
2025-09-16 16:05:31,593 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [867.5934, 780.8547, 906.26013, 821.2779, 724.35034, 681.8613, 593.8314, 594.9283, 915.2379, 818.9107]
2025-09-16 16:05:31,593 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 146.0, 181.0, 154.0, 135.0, 133.0, 120.0, 110.0, 174.0, 156.0]
2025-09-16 16:05:31,598 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 42 seconds)
2025-09-16 16:07:31,719 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:07:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 698.16364 ± 127.202
2025-09-16 16:07:33,738 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [714.68024, 700.4805, 978.10626, 590.1244, 722.3533, 790.6257, 550.5531, 606.3472, 542.074, 786.29156]
2025-09-16 16:07:33,738 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 137.0, 189.0, 123.0, 139.0, 151.0, 99.0, 123.0, 101.0, 144.0]
2025-09-16 16:07:33,753 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes, 38 seconds)
2025-09-16 16:09:35,125 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:09:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 840.06573 ± 168.442
2025-09-16 16:09:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [977.86615, 627.48206, 954.9762, 901.71405, 1058.4246, 605.5394, 701.0956, 1074.0166, 680.82355, 818.71924]
2025-09-16 16:09:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [209.0, 135.0, 183.0, 181.0, 205.0, 128.0, 130.0, 218.0, 146.0, 154.0]
2025-09-16 16:09:37,716 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (840.07) for latency 21
2025-09-16 16:09:37,750 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 56 minutes, 40 seconds)
2025-09-16 16:11:34,466 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:11:36,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 731.14954 ± 60.076
2025-09-16 16:11:36,577 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [627.2881, 713.9327, 734.1574, 688.38513, 681.1657, 792.21606, 851.625, 704.6801, 768.32477, 749.72034]
2025-09-16 16:11:36,577 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 135.0, 136.0, 127.0, 124.0, 149.0, 169.0, 139.0, 157.0, 146.0]
2025-09-16 16:11:36,581 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 46 seconds)
2025-09-16 16:13:36,113 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:13:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 808.10535 ± 160.722
2025-09-16 16:13:38,539 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [661.90076, 872.5713, 648.579, 930.11346, 1113.1196, 882.334, 715.7847, 599.14343, 971.693, 685.8141]
2025-09-16 16:13:38,539 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 186.0, 128.0, 187.0, 220.0, 189.0, 142.0, 120.0, 197.0, 125.0]
2025-09-16 16:13:38,553 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 40 seconds)
2025-09-16 16:15:37,373 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:15:39,509 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 727.13232 ± 262.983
2025-09-16 16:15:39,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [772.01746, 492.64474, 668.91943, 595.4104, 1241.3718, 516.8643, 514.88806, 615.62177, 1216.6168, 636.9681]
2025-09-16 16:15:39,510 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 100.0, 141.0, 113.0, 239.0, 97.0, 111.0, 116.0, 236.0, 118.0]
2025-09-16 16:15:39,520 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 25 seconds)
2025-09-16 16:17:39,694 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:17:41,972 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 768.25031 ± 163.146
2025-09-16 16:17:41,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [949.20233, 1069.1465, 900.8573, 550.1627, 744.72577, 707.80786, 843.77, 593.32336, 747.03467, 576.4724]
2025-09-16 16:17:41,973 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 210.0, 167.0, 107.0, 158.0, 137.0, 161.0, 121.0, 144.0, 116.0]
2025-09-16 16:17:41,979 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 47 minutes, 27 seconds)
2025-09-16 16:19:40,591 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:19:42,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 700.11859 ± 119.469
2025-09-16 16:19:42,735 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [608.059, 611.3312, 598.34607, 741.9161, 815.10016, 946.1585, 816.1427, 549.96576, 677.0504, 637.1161]
2025-09-16 16:19:42,736 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 122.0, 127.0, 155.0, 164.0, 188.0, 161.0, 115.0, 139.0, 136.0]
2025-09-16 16:19:42,743 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 51 seconds)
2025-09-16 16:21:42,175 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:21:45,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 948.90686 ± 200.666
2025-09-16 16:21:45,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1120.8022, 1024.6881, 1447.9283, 849.2558, 932.4092, 768.24646, 893.50385, 924.05273, 722.189, 805.99225]
2025-09-16 16:21:45,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 187.0, 312.0, 162.0, 192.0, 160.0, 175.0, 176.0, 138.0, 153.0]
2025-09-16 16:21:45,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (948.91) for latency 21
2025-09-16 16:21:45,066 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 43 minutes, 26 seconds)
2025-09-16 16:23:43,880 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:23:46,439 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 861.31464 ± 219.446
2025-09-16 16:23:46,439 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [818.1729, 1188.8613, 869.03937, 684.0173, 584.81714, 526.3174, 883.2072, 1124.0521, 793.7529, 1140.9089]
2025-09-16 16:23:46,439 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 214.0, 195.0, 122.0, 120.0, 94.0, 188.0, 220.0, 159.0, 240.0]
2025-09-16 16:23:46,448 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 41 minutes, 18 seconds)
2025-09-16 16:25:45,968 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:25:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 949.56219 ± 282.937
2025-09-16 16:25:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [978.13257, 1061.2476, 1621.4319, 716.3015, 1279.725, 727.851, 744.744, 790.77875, 830.72034, 744.6895]
2025-09-16 16:25:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 233.0, 293.0, 135.0, 264.0, 148.0, 148.0, 144.0, 156.0, 147.0]
2025-09-16 16:25:48,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (949.56) for latency 21
2025-09-16 16:25:48,792 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 39 minutes, 30 seconds)
2025-09-16 16:27:49,128 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:27:52,002 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 955.69238 ± 245.890
2025-09-16 16:27:52,003 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [633.6541, 1119.092, 686.74396, 1115.3472, 1191.0868, 768.3519, 1158.3079, 1056.2794, 574.0012, 1254.06]
2025-09-16 16:27:52,003 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 221.0, 126.0, 222.0, 226.0, 154.0, 228.0, 208.0, 106.0, 264.0]
2025-09-16 16:27:52,003 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (955.69) for latency 21
2025-09-16 16:27:52,008 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 37 minutes, 36 seconds)
2025-09-16 16:29:50,765 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:29:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 890.63849 ± 316.413
2025-09-16 16:29:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1683.3751, 821.16394, 1026.632, 617.94635, 699.77637, 1093.8041, 567.9486, 602.8671, 824.5236, 968.34766]
2025-09-16 16:29:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [333.0, 174.0, 206.0, 127.0, 151.0, 200.0, 120.0, 131.0, 176.0, 189.0]
2025-09-16 16:29:53,552 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 35 minutes, 41 seconds)
2025-09-16 16:31:52,794 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:31:55,460 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 859.40576 ± 226.397
2025-09-16 16:31:55,461 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [526.80035, 933.98517, 1101.7686, 738.9767, 832.08484, 1177.0311, 1213.7739, 640.01245, 769.4362, 660.1884]
2025-09-16 16:31:55,461 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 184.0, 212.0, 157.0, 179.0, 224.0, 242.0, 126.0, 157.0, 141.0]
2025-09-16 16:31:55,481 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 33 minutes, 35 seconds)
2025-09-16 16:33:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:33:57,867 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 860.85712 ± 199.887
2025-09-16 16:33:57,867 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1176.0641, 1161.2495, 861.5586, 633.9355, 1051.9929, 749.8345, 875.9028, 708.8886, 570.1125, 819.033]
2025-09-16 16:33:57,867 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [229.0, 238.0, 175.0, 137.0, 211.0, 164.0, 184.0, 155.0, 123.0, 179.0]
2025-09-16 16:33:57,874 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 31 minutes, 42 seconds)
2025-09-16 16:35:57,076 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:35:59,689 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 892.90076 ± 157.979
2025-09-16 16:35:59,689 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [763.82996, 1080.1716, 628.6853, 982.45764, 636.1695, 961.9018, 868.3739, 1033.3658, 1062.1567, 911.8957]
2025-09-16 16:35:59,689 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 201.0, 130.0, 189.0, 126.0, 193.0, 168.0, 193.0, 201.0, 182.0]
2025-09-16 16:35:59,696 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 29 minutes, 35 seconds)
2025-09-16 16:37:59,277 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:38:02,443 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1069.27332 ± 272.744
2025-09-16 16:38:02,443 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [757.25885, 902.9806, 1067.5232, 1495.9594, 895.64325, 1004.78705, 971.9619, 1675.5614, 990.6185, 930.43933]
2025-09-16 16:38:02,443 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 167.0, 202.0, 296.0, 180.0, 191.0, 185.0, 319.0, 184.0, 196.0]
2025-09-16 16:38:02,443 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1069.27) for latency 21
2025-09-16 16:38:02,450 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 29 seconds)
2025-09-16 16:40:01,298 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:40:04,156 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 933.42712 ± 350.756
2025-09-16 16:40:04,156 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [543.38165, 1247.3828, 1115.1084, 410.1611, 647.5791, 1187.7129, 969.3791, 882.546, 1620.9222, 710.09827]
2025-09-16 16:40:04,156 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 248.0, 238.0, 85.0, 136.0, 237.0, 200.0, 170.0, 313.0, 140.0]
2025-09-16 16:40:04,161 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 25 minutes, 29 seconds)
2025-09-16 16:42:04,296 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:42:07,443 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1010.67218 ± 259.561
2025-09-16 16:42:07,443 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [910.8227, 1104.4253, 919.1353, 765.06494, 1272.7095, 742.87744, 977.6804, 1632.0701, 768.50085, 1013.43536]
2025-09-16 16:42:07,443 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 232.0, 192.0, 172.0, 249.0, 163.0, 201.0, 347.0, 134.0, 199.0]
2025-09-16 16:42:07,457 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 23 minutes, 38 seconds)
2025-09-16 16:44:07,497 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:44:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1413.02612 ± 239.398
2025-09-16 16:44:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1218.7706, 1618.2572, 1384.4025, 1897.288, 1359.6213, 1361.3756, 1235.1101, 1574.2064, 980.4921, 1500.7377]
2025-09-16 16:44:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 319.0, 289.0, 381.0, 252.0, 269.0, 260.0, 304.0, 197.0, 295.0]
2025-09-16 16:44:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1413.03) for latency 21
2025-09-16 16:44:11,787 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 21 minutes, 51 seconds)
2025-09-16 16:46:12,645 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:46:15,335 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 902.53284 ± 181.890
2025-09-16 16:46:15,335 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1095.048, 833.2366, 655.9473, 784.06433, 1045.4042, 1094.9418, 1185.3622, 644.1604, 882.3331, 804.83]
2025-09-16 16:46:15,335 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 157.0, 144.0, 161.0, 205.0, 200.0, 221.0, 123.0, 186.0, 142.0]
2025-09-16 16:46:15,341 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 20 minutes, 2 seconds)
2025-09-16 16:48:12,895 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:48:17,276 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1431.86255 ± 431.658
2025-09-16 16:48:17,276 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2444.4502, 1465.3157, 1182.8112, 1277.728, 1671.1239, 1456.3171, 1808.8934, 989.4355, 1038.8494, 983.7021]
2025-09-16 16:48:17,276 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [463.0, 292.0, 247.0, 245.0, 343.0, 275.0, 347.0, 189.0, 229.0, 203.0]
2025-09-16 16:48:17,276 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1431.86) for latency 21
2025-09-16 16:48:17,293 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 17 minutes, 52 seconds)
2025-09-16 16:50:18,426 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:50:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1226.17883 ± 229.287
2025-09-16 16:50:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1259.7579, 1664.9023, 1113.7678, 1431.823, 1166.2505, 964.76245, 1288.1453, 1153.8113, 818.71643, 1399.8507]
2025-09-16 16:50:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 354.0, 203.0, 270.0, 233.0, 222.0, 250.0, 205.0, 150.0, 271.0]
2025-09-16 16:50:22,176 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 16 minutes, 13 seconds)
2025-09-16 16:52:21,385 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:52:24,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1159.83154 ± 388.991
2025-09-16 16:52:24,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1483.9786, 885.396, 988.99365, 807.08734, 1663.3779, 1175.8596, 988.35126, 1970.7504, 859.778, 774.7423]
2025-09-16 16:52:24,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [285.0, 186.0, 201.0, 141.0, 310.0, 222.0, 192.0, 370.0, 178.0, 137.0]
2025-09-16 16:52:24,795 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 14 minutes, 4 seconds)
2025-09-16 16:54:24,092 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:54:27,936 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1266.08606 ± 458.191
2025-09-16 16:54:27,936 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1902.4066, 1818.2972, 965.4128, 921.6156, 1164.4536, 1466.7899, 1880.581, 1231.129, 608.11694, 702.0574]
2025-09-16 16:54:27,936 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [384.0, 357.0, 198.0, 185.0, 217.0, 293.0, 361.0, 232.0, 116.0, 130.0]
2025-09-16 16:54:27,944 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 11 minutes, 53 seconds)
2025-09-16 16:56:27,631 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:56:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1450.14563 ± 646.736
2025-09-16 16:56:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [826.17535, 2347.8667, 1881.9679, 2746.1335, 971.3221, 1366.1556, 1439.2235, 709.64056, 882.4563, 1330.5149]
2025-09-16 16:56:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 465.0, 368.0, 543.0, 199.0, 271.0, 293.0, 136.0, 189.0, 258.0]
2025-09-16 16:56:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1450.15) for latency 21
2025-09-16 16:56:32,145 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 9 minutes, 54 seconds)
2025-09-16 16:58:31,666 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:58:35,136 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1160.37671 ± 406.641
2025-09-16 16:58:35,136 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1124.376, 883.1379, 910.9819, 1560.4614, 773.62305, 1044.089, 2187.998, 954.1066, 1281.6938, 883.2994]
2025-09-16 16:58:35,136 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 164.0, 162.0, 276.0, 145.0, 216.0, 438.0, 190.0, 263.0, 179.0]
2025-09-16 16:58:35,144 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 7 minutes, 57 seconds)
2025-09-16 17:00:36,987 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:00:40,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1201.40588 ± 274.185
2025-09-16 17:00:40,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1180.116, 1585.9432, 1092.7903, 1271.873, 1360.6896, 721.91187, 1377.8098, 1036.1361, 824.0402, 1562.7487]
2025-09-16 17:00:40,576 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 306.0, 206.0, 236.0, 247.0, 161.0, 278.0, 210.0, 173.0, 298.0]
2025-09-16 17:00:40,598 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 5 minutes, 57 seconds)
2025-09-16 17:02:40,469 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:02:44,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1488.53918 ± 564.122
2025-09-16 17:02:44,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1122.0077, 2468.5525, 1730.4873, 2480.116, 891.0349, 1230.9333, 836.40247, 1678.8427, 1324.6226, 1122.3918]
2025-09-16 17:02:44,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 461.0, 323.0, 445.0, 193.0, 242.0, 184.0, 307.0, 256.0, 214.0]
2025-09-16 17:02:44,851 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1488.54) for latency 21
2025-09-16 17:02:44,858 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 4 minutes, 4 seconds)
2025-09-16 17:04:46,802 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:04:51,566 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1513.38000 ± 609.259
2025-09-16 17:04:51,566 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [734.64844, 870.3901, 1412.1742, 1946.0294, 1768.2776, 885.1845, 2846.0327, 1243.749, 1483.2335, 1944.0803]
2025-09-16 17:04:51,566 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 181.0, 267.0, 380.0, 351.0, 183.0, 559.0, 254.0, 312.0, 409.0]
2025-09-16 17:04:51,566 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1513.38) for latency 21
2025-09-16 17:04:51,574 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 2 minutes, 21 seconds)
2025-09-16 17:06:49,154 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:06:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1722.85510 ± 857.872
2025-09-16 17:06:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [746.34216, 1880.3937, 4038.8108, 2038.7637, 1273.6937, 1186.6924, 1330.6906, 1372.6492, 1407.9424, 1952.5707]
2025-09-16 17:06:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 372.0, 787.0, 414.0, 253.0, 221.0, 267.0, 255.0, 272.0, 366.0]
2025-09-16 17:06:54,633 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (1722.86) for latency 21
2025-09-16 17:06:54,654 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 10 seconds)
2025-09-16 17:08:53,405 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:08:57,628 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1460.02417 ± 868.919
2025-09-16 17:08:57,628 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [919.94135, 1190.8928, 1007.15204, 1713.5444, 3886.3796, 1311.7235, 924.6346, 1629.6931, 1384.763, 631.51715]
2025-09-16 17:08:57,628 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 235.0, 197.0, 311.0, 718.0, 257.0, 178.0, 280.0, 258.0, 135.0]
2025-09-16 17:08:57,634 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 58 minutes, 5 seconds)
2025-09-16 17:10:59,596 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:11:03,199 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1248.48901 ± 742.253
2025-09-16 17:11:03,199 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [760.5174, 1181.8907, 670.0461, 1155.1215, 955.6205, 3345.1047, 826.70447, 1263.7393, 810.0578, 1516.0884]
2025-09-16 17:11:03,200 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 246.0, 150.0, 214.0, 171.0, 593.0, 172.0, 231.0, 146.0, 270.0]
2025-09-16 17:11:03,206 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 56 minutes, 2 seconds)
2025-09-16 17:13:05,402 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:13:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1519.59155 ± 457.794
2025-09-16 17:13:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1018.2959, 1510.3235, 964.4336, 1553.3972, 1403.6138, 1489.6385, 1717.6456, 976.31665, 2166.1538, 2396.0972]
2025-09-16 17:13:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 310.0, 177.0, 314.0, 282.0, 308.0, 337.0, 207.0, 430.0, 451.0]
2025-09-16 17:13:10,094 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 54 minutes, 11 seconds)
2025-09-16 17:15:07,784 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:15:14,216 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2180.74268 ± 1034.025
2025-09-16 17:15:14,216 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1416.3152, 1164.6407, 1915.1448, 4846.7314, 1617.2899, 1647.7748, 1840.4445, 2869.6604, 2835.0933, 1654.3295]
2025-09-16 17:15:14,216 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [280.0, 209.0, 346.0, 871.0, 313.0, 303.0, 344.0, 574.0, 559.0, 312.0]
2025-09-16 17:15:14,216 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2180.74) for latency 21
2025-09-16 17:15:14,225 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 51 minutes, 53 seconds)
2025-09-16 17:17:16,040 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:17:22,203 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2054.28149 ± 628.645
2025-09-16 17:17:22,203 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2354.3965, 1750.2946, 1614.9739, 1374.6681, 1140.1879, 2412.49, 2427.8103, 3108.8716, 1520.9069, 2838.215]
2025-09-16 17:17:22,203 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [474.0, 333.0, 312.0, 245.0, 217.0, 451.0, 449.0, 594.0, 285.0, 535.0]
2025-09-16 17:17:22,213 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 50 minutes, 12 seconds)
2025-09-16 17:19:21,201 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:19:27,752 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2188.20850 ± 1072.302
2025-09-16 17:19:27,752 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1396.7828, 1616.9856, 3054.917, 1245.9514, 4298.312, 1692.3811, 3629.6748, 780.8738, 1830.3644, 2335.842]
2025-09-16 17:19:27,752 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [242.0, 314.0, 581.0, 250.0, 833.0, 309.0, 691.0, 142.0, 364.0, 462.0]
2025-09-16 17:19:27,752 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2188.21) for latency 21
2025-09-16 17:19:27,761 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 48 minutes, 18 seconds)
2025-09-16 17:21:29,753 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:21:34,101 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1499.18018 ± 389.423
2025-09-16 17:21:34,101 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1786.2794, 1520.9337, 1616.2676, 1510.9376, 1278.4688, 1005.72546, 1663.0133, 2091.0312, 695.8387, 1823.3065]
2025-09-16 17:21:34,101 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [339.0, 281.0, 309.0, 283.0, 238.0, 187.0, 339.0, 397.0, 131.0, 324.0]
2025-09-16 17:21:34,134 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 46 minutes, 16 seconds)
2025-09-16 17:23:41,938 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:23:46,824 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1771.79077 ± 527.263
2025-09-16 17:23:46,824 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1244.5874, 2624.4993, 1129.361, 2119.5312, 2082.2703, 1589.7377, 1999.1746, 2394.5295, 1552.5122, 981.7053]
2025-09-16 17:23:46,824 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [224.0, 498.0, 220.0, 371.0, 375.0, 278.0, 359.0, 420.0, 270.0, 174.0]
2025-09-16 17:23:46,849 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 44 minutes, 34 seconds)
2025-09-16 17:25:37,784 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:25:42,257 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1472.75610 ± 822.052
2025-09-16 17:25:42,257 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1827.3478, 1133.1602, 1276.218, 869.6607, 1251.1395, 1024.4656, 1200.9639, 1211.8928, 3835.2625, 1097.4503]
2025-09-16 17:25:42,257 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [322.0, 221.0, 246.0, 173.0, 237.0, 190.0, 236.0, 256.0, 753.0, 215.0]
2025-09-16 17:25:42,271 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 41 minutes, 52 seconds)
2025-09-16 17:27:44,470 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:27:49,424 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 1653.71936 ± 872.089
2025-09-16 17:27:49,424 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1664.4395, 979.5991, 1377.3201, 704.48724, 1845.8256, 2495.2659, 1291.8726, 3837.7764, 1343.3289, 997.2792]
2025-09-16 17:27:49,424 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [318.0, 175.0, 265.0, 148.0, 316.0, 492.0, 242.0, 737.0, 271.0, 205.0]
2025-09-16 17:27:49,432 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 43 seconds)
2025-09-16 17:29:49,562 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:29:57,329 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2484.18823 ± 1263.314
2025-09-16 17:29:57,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [4545.383, 2184.5344, 2119.4111, 3947.465, 1732.1476, 877.24133, 4468.1865, 1757.0582, 1297.7302, 1912.7261]
2025-09-16 17:29:57,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [881.0, 416.0, 433.0, 720.0, 323.0, 186.0, 904.0, 343.0, 251.0, 382.0]
2025-09-16 17:29:57,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2484.19) for latency 21
2025-09-16 17:29:57,338 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 37 minutes, 46 seconds)
2025-09-16 17:32:01,980 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:32:08,825 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2145.97290 ± 864.735
2025-09-16 17:32:08,826 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [3119.5415, 1589.6471, 3166.718, 1334.7152, 1118.6239, 2210.629, 1096.005, 3678.116, 2078.2246, 2067.5078]
2025-09-16 17:32:08,826 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [631.0, 334.0, 608.0, 264.0, 246.0, 451.0, 240.0, 738.0, 425.0, 412.0]
2025-09-16 17:32:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 35 minutes, 57 seconds)
2025-09-16 17:34:06,389 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:34:12,976 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2199.39478 ± 1323.168
2025-09-16 17:34:12,976 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2760.4478, 1266.5437, 5302.611, 925.12177, 3097.7944, 1401.4803, 2549.3513, 2722.9858, 1353.461, 614.1499]
2025-09-16 17:34:12,976 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [542.0, 249.0, 1000.0, 184.0, 586.0, 269.0, 478.0, 503.0, 271.0, 116.0]
2025-09-16 17:34:12,989 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 23 seconds)
2025-09-16 17:36:11,012 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:36:19,061 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2693.01074 ± 1029.311
2025-09-16 17:36:19,061 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2456.9182, 2028.7437, 2422.8416, 3458.3833, 2980.7493, 2275.104, 5302.9775, 2673.4417, 2024.3828, 1306.5653]
2025-09-16 17:36:19,061 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [473.0, 353.0, 438.0, 673.0, 580.0, 417.0, 1000.0, 514.0, 385.0, 248.0]
2025-09-16 17:36:19,061 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2693.01) for latency 21
2025-09-16 17:36:19,070 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 31 minutes, 50 seconds)
2025-09-16 17:38:19,965 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:38:26,625 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2293.19092 ± 1069.393
2025-09-16 17:38:26,625 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1818.3252, 1751.6918, 4807.5356, 2736.3545, 2269.6616, 997.2303, 953.5614, 3218.141, 2343.567, 2035.8401]
2025-09-16 17:38:26,625 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [326.0, 302.0, 884.0, 521.0, 416.0, 183.0, 177.0, 596.0, 431.0, 374.0]
2025-09-16 17:38:26,634 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 29 minutes, 44 seconds)
2025-09-16 17:40:27,016 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:40:35,888 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2978.59570 ± 1728.593
2025-09-16 17:40:35,889 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5459.48, 5431.108, 798.1198, 3471.5718, 1632.9828, 1441.8882, 2381.894, 1630.3258, 5368.7925, 2169.7935]
2025-09-16 17:40:35,889 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 164.0, 633.0, 327.0, 263.0, 433.0, 289.0, 1000.0, 434.0]
2025-09-16 17:40:35,889 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (2978.60) for latency 21
2025-09-16 17:40:35,918 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 27 minutes, 40 seconds)
2025-09-16 17:42:45,302 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:42:52,148 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2301.24756 ± 1224.155
2025-09-16 17:42:52,148 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2077.5776, 3470.5254, 1045.4353, 1274.916, 1233.8495, 2400.8948, 5064.644, 3129.4292, 2222.3257, 1092.8778]
2025-09-16 17:42:52,148 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [387.0, 653.0, 205.0, 239.0, 253.0, 442.0, 941.0, 619.0, 403.0, 227.0]
2025-09-16 17:42:52,158 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 43 seconds)
2025-09-16 17:44:45,428 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:44:54,891 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3043.58057 ± 1878.306
2025-09-16 17:44:54,891 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1733.6636, 1954.9874, 4959.6562, 5524.3325, 2018.5524, 5256.206, 5418.9893, 1829.7518, 726.23157, 1013.43555]
2025-09-16 17:44:54,891 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [325.0, 377.0, 966.0, 1000.0, 388.0, 1000.0, 1000.0, 363.0, 159.0, 215.0]
2025-09-16 17:44:54,891 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3043.58) for latency 21
2025-09-16 17:44:54,900 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 32 seconds)
2025-09-16 17:46:56,181 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:47:04,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2809.62817 ± 1748.537
2025-09-16 17:47:04,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1713.2812, 2011.665, 852.02637, 4893.853, 2914.009, 2762.7102, 856.4884, 1172.1377, 5451.2876, 5468.8223]
2025-09-16 17:47:04,491 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [325.0, 358.0, 146.0, 876.0, 600.0, 515.0, 150.0, 213.0, 1000.0, 1000.0]
2025-09-16 17:47:04,504 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 30 seconds)
2025-09-16 17:49:04,041 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:49:14,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3390.38623 ± 1053.301
2025-09-16 17:49:14,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1738.1512, 2282.6655, 3895.5916, 4944.0454, 2537.498, 4227.6724, 3994.8032, 4601.8887, 3389.7454, 2291.803]
2025-09-16 17:49:14,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [336.0, 421.0, 682.0, 905.0, 452.0, 800.0, 769.0, 827.0, 645.0, 406.0]
2025-09-16 17:49:14,085 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3390.39) for latency 21
2025-09-16 17:49:14,124 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 25 seconds)
2025-09-16 17:51:16,943 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:51:25,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2767.97705 ± 1356.878
2025-09-16 17:51:25,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1851.2601, 4781.3438, 1594.4988, 3525.7285, 692.8853, 4866.9775, 3127.191, 3632.1648, 2048.014, 1559.7041]
2025-09-16 17:51:25,250 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [333.0, 878.0, 286.0, 640.0, 146.0, 908.0, 590.0, 661.0, 389.0, 297.0]
2025-09-16 17:51:25,265 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 18 seconds)
2025-09-16 17:53:29,958 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:53:38,181 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2587.49561 ± 1387.975
2025-09-16 17:53:38,181 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1447.5125, 4647.733, 2002.4292, 4933.7104, 2469.7097, 1662.0754, 4258.3716, 2064.113, 1025.055, 1364.2471]
2025-09-16 17:53:38,181 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [300.0, 928.0, 398.0, 1000.0, 433.0, 349.0, 799.0, 432.0, 205.0, 294.0]
2025-09-16 17:53:38,211 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 4 seconds)
2025-09-16 17:55:32,039 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:55:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2946.00464 ± 1613.463
2025-09-16 17:55:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2487.5503, 3771.2, 5510.996, 1459.503, 842.88794, 2230.6985, 1615.1456, 4434.393, 1724.5669, 5383.106]
2025-09-16 17:55:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [483.0, 705.0, 1000.0, 278.0, 160.0, 432.0, 290.0, 828.0, 305.0, 1000.0]
2025-09-16 17:55:40,786 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 55 seconds)
2025-09-16 17:57:46,684 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:57:54,087 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 2497.97607 ± 1294.543
2025-09-16 17:57:54,087 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1530.5709, 1555.7949, 4801.4995, 1240.1956, 3446.1536, 2261.1113, 1862.4584, 1922.3136, 4827.001, 1532.6627]
2025-09-16 17:57:54,087 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [284.0, 301.0, 870.0, 248.0, 649.0, 455.0, 349.0, 355.0, 886.0, 271.0]
2025-09-16 17:57:54,094 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 49 seconds)
2025-09-16 17:59:49,441 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:00:01,484 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3978.63232 ± 1833.134
2025-09-16 18:00:01,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [953.9123, 5405.1855, 5345.9, 1544.7085, 1414.5192, 3589.6184, 5325.1533, 5424.8354, 5440.403, 5342.0913]
2025-09-16 18:00:01,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 1000.0, 1000.0, 294.0, 240.0, 671.0, 1000.0, 980.0, 1000.0, 1000.0]
2025-09-16 18:00:01,485 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (3978.63) for latency 21
2025-09-16 18:00:01,498 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 37 seconds)
2025-09-16 18:02:04,187 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:02:17,307 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4436.34717 ± 1304.423
2025-09-16 18:02:17,307 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5306.58, 5521.606, 2938.073, 5521.264, 5452.9326, 5626.8267, 3165.113, 2175.7712, 3219.1262, 5436.18]
2025-09-16 18:02:17,307 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 522.0, 1000.0, 1000.0, 1000.0, 597.0, 406.0, 560.0, 1000.0]
2025-09-16 18:02:17,307 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4436.35) for latency 21
2025-09-16 18:02:17,330 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 31 seconds)
2025-09-16 18:04:21,079 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:04:36,013 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4836.91504 ± 1019.752
2025-09-16 18:04:36,013 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [5263.903, 5272.3804, 5296.384, 5361.2163, 5341.5435, 2115.492, 5278.743, 3757.9885, 5259.4897, 5422.013]
2025-09-16 18:04:36,013 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 427.0, 1000.0, 740.0, 1000.0, 1000.0]
2025-09-16 18:04:36,013 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1226 [INFO]: New best (4836.92) for latency 21
2025-09-16 18:04:36,023 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 23 seconds)
2025-09-16 18:06:43,375 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:06:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 4294.99463 ± 1362.049
2025-09-16 18:06:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [2423.5396, 5598.6113, 4802.7295, 2702.3333, 5530.778, 4332.841, 4861.056, 5430.9756, 1825.0685, 5442.0107]
2025-09-16 18:06:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [446.0, 1000.0, 867.0, 530.0, 1000.0, 767.0, 886.0, 1000.0, 358.0, 1000.0]
2025-09-16 18:06:56,183 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 15 seconds)
2025-09-16 18:08:49,504 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:09:01,654 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1221 [DEBUG]: Total Reward: 3967.98047 ± 1729.224
2025-09-16 18:09:01,654 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1222 [DEBUG]: All rewards: [1697.0237, 5293.0117, 5456.24, 4370.1377, 1123.597, 4233.2856, 5507.55, 5297.93, 1407.566, 5293.4614]
2025-09-16 18:09:01,654 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1223 [DEBUG]: All trajectory lengths: [302.0, 1000.0, 1000.0, 821.0, 190.0, 796.0, 1000.0, 1000.0, 255.0, 1000.0]
2025-09-16 18:09:01,681 latency_env.delayed_mdp:training_loop(baseline-bpql-humanoid):1251 [DEBUG]: Training session finished
