2025-08-07 03:38:55,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:38:55,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:38:55,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14d65fa83c50>}
2025-08-07 03:38:55,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 03:38:55,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 03:38:55,816 baseline-bpql-noiseperc25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=920, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 03:38:55,816 baseline-bpql-noiseperc25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 03:38:57,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 03:38:57,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 03:40:47,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:47,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 116.67516 ± 19.004
2025-08-07 03:40:47,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.49251, 89.48214, 121.74944, 121.8429, 119.80023, 120.27582, 161.26826, 113.616646, 119.697464, 109.52609]
2025-08-07 03:40:47,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 24.0, 24.0, 23.0, 23.0, 31.0, 22.0, 24.0, 22.0]
2025-08-07 03:40:47,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (116.68) for latency ExtremeSparseL4U32
2025-08-07 03:40:47,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 1 minute, 38 seconds)
2025-08-07 03:42:43,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:44,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 147.98032 ± 80.376
2025-08-07 03:42:44,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [135.4604, 117.85658, 178.75058, 120.63857, 107.20499, 89.05599, 117.646545, 375.1918, 83.748764, 154.24895]
2025-08-07 03:42:44,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 23.0, 36.0, 24.0, 21.0, 18.0, 23.0, 75.0, 17.0, 30.0]
2025-08-07 03:42:44,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (147.98) for latency ExtremeSparseL4U32
2025-08-07 03:42:44,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 5 seconds)
2025-08-07 03:44:40,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:40,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 114.43016 ± 24.842
2025-08-07 03:44:40,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [129.95685, 84.032646, 100.63136, 161.72481, 101.57714, 90.341255, 135.45493, 121.34839, 84.14746, 135.0868]
2025-08-07 03:44:40,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 17.0, 20.0, 31.0, 20.0, 18.0, 26.0, 24.0, 17.0, 26.0]
2025-08-07 03:44:40,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 4 minutes, 55 seconds)
2025-08-07 03:46:37,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:38,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 112.53633 ± 27.582
2025-08-07 03:46:38,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [174.46944, 103.25053, 88.65868, 88.50576, 145.77344, 89.28835, 103.39714, 114.25391, 128.62633, 89.13972]
2025-08-07 03:46:38,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 20.0, 18.0, 18.0, 29.0, 18.0, 21.0, 22.0, 25.0, 18.0]
2025-08-07 03:46:38,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 4 minutes, 17 seconds)
2025-08-07 03:48:35,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 132.81870 ± 29.943
2025-08-07 03:48:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [113.295296, 148.33566, 194.09187, 124.01471, 107.91636, 165.6703, 113.626366, 137.48503, 84.07658, 139.67485]
2025-08-07 03:48:36,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 29.0, 40.0, 24.0, 21.0, 32.0, 22.0, 27.0, 17.0, 28.0]
2025-08-07 03:48:36,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 3 minutes, 10 seconds)
2025-08-07 03:50:33,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:33,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 134.27057 ± 42.865
2025-08-07 03:50:33,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [195.5889, 94.84524, 189.73003, 100.512566, 164.52039, 105.3081, 89.36368, 88.74673, 188.38336, 125.70667]
2025-08-07 03:50:33,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 19.0, 37.0, 20.0, 32.0, 21.0, 18.0, 18.0, 37.0, 25.0]
2025-08-07 03:50:33,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 33 seconds)
2025-08-07 03:52:30,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:30,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 130.78055 ± 12.843
2025-08-07 03:52:30,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [125.92648, 148.6848, 143.99515, 133.67451, 129.67946, 125.48479, 144.60579, 134.26082, 104.14743, 117.34614]
2025-08-07 03:52:30,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 30.0, 28.0, 26.0, 25.0, 25.0, 29.0, 26.0, 21.0, 23.0]
2025-08-07 03:52:30,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 49 seconds)
2025-08-07 03:54:27,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 139.52383 ± 78.365
2025-08-07 03:54:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.86712, 149.7877, 113.063324, 127.6922, 124.6144, 96.62142, 369.78906, 100.80507, 105.62056, 112.377556]
2025-08-07 03:54:28,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 29.0, 22.0, 25.0, 24.0, 19.0, 70.0, 20.0, 21.0, 22.0]
2025-08-07 03:54:28,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 12 seconds)
2025-08-07 03:56:25,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:25,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 129.35605 ± 26.649
2025-08-07 03:56:25,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [118.50287, 164.82417, 128.02164, 113.00266, 163.82536, 84.06877, 107.45186, 146.20811, 160.41644, 107.23863]
2025-08-07 03:56:25,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 33.0, 25.0, 22.0, 33.0, 17.0, 21.0, 30.0, 33.0, 21.0]
2025-08-07 03:56:25,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 10 seconds)
2025-08-07 03:58:23,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:23,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 130.05057 ± 54.983
2025-08-07 03:58:23,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [90.1651, 172.12276, 270.92654, 101.18626, 95.587585, 167.99202, 95.138695, 106.66653, 104.57321, 96.14704]
2025-08-07 03:58:23,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 34.0, 53.0, 20.0, 19.0, 34.0, 19.0, 21.0, 21.0, 19.0]
2025-08-07 03:58:23,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 15 seconds)
2025-08-07 04:00:21,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 125.48311 ± 40.721
2025-08-07 04:00:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [135.49382, 94.649414, 89.45157, 146.52715, 220.45296, 84.67185, 95.184074, 161.74089, 130.31546, 96.34392]
2025-08-07 04:00:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 19.0, 18.0, 29.0, 42.0, 17.0, 19.0, 31.0, 26.0, 19.0]
2025-08-07 04:00:21,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 54 minutes, 27 seconds)
2025-08-07 04:02:18,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 121.16862 ± 32.716
2025-08-07 04:02:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [177.75294, 138.88857, 167.08862, 84.04899, 120.39304, 89.15308, 101.530426, 83.94775, 103.43142, 145.45126]
2025-08-07 04:02:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 27.0, 33.0, 17.0, 23.0, 18.0, 20.0, 17.0, 20.0, 28.0]
2025-08-07 04:02:19,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 52 minutes, 37 seconds)
2025-08-07 04:04:16,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:16,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 146.43701 ± 67.229
2025-08-07 04:04:16,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.61878, 116.518234, 102.24854, 140.75047, 106.75493, 100.594376, 330.97266, 172.83658, 137.16362, 167.91183]
2025-08-07 04:04:16,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 23.0, 20.0, 28.0, 21.0, 20.0, 67.0, 33.0, 28.0, 35.0]
2025-08-07 04:04:16,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 50 minutes, 38 seconds)
2025-08-07 04:06:14,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 112.02119 ± 17.347
2025-08-07 04:06:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.49925, 120.58483, 95.26457, 122.64441, 151.21198, 124.48194, 95.73759, 106.26595, 95.33061, 113.19082]
2025-08-07 04:06:14,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 19.0, 24.0, 29.0, 24.0, 19.0, 21.0, 19.0, 22.0]
2025-08-07 04:06:14,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 48 minutes, 48 seconds)
2025-08-07 04:08:11,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:11,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 135.51057 ± 46.799
2025-08-07 04:08:11,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.45379, 102.99788, 90.91904, 139.02132, 120.70607, 182.03503, 117.54426, 126.40242, 256.6841, 111.34182]
2025-08-07 04:08:11,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 20.0, 18.0, 27.0, 24.0, 36.0, 23.0, 25.0, 53.0, 22.0]
2025-08-07 04:08:11,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 41 seconds)
2025-08-07 04:10:09,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:09,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 128.14336 ± 30.088
2025-08-07 04:10:09,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [111.556244, 89.54168, 164.50134, 168.27185, 127.90322, 95.858025, 112.92238, 169.22672, 95.46675, 146.18529]
2025-08-07 04:10:09,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 32.0, 34.0, 25.0, 19.0, 22.0, 34.0, 19.0, 30.0]
2025-08-07 04:10:09,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 39 seconds)
2025-08-07 04:12:06,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:12:07,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 130.57210 ± 39.389
2025-08-07 04:12:07,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [151.49449, 94.66168, 95.30313, 176.62053, 96.84278, 99.872475, 201.79486, 84.03071, 160.86035, 144.23993]
2025-08-07 04:12:07,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 19.0, 35.0, 19.0, 20.0, 38.0, 17.0, 31.0, 28.0]
2025-08-07 04:12:07,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 42 minutes, 39 seconds)
2025-08-07 04:14:04,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:14:04,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 141.54634 ± 52.955
2025-08-07 04:14:04,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [116.54913, 111.56956, 94.886406, 268.21872, 150.42992, 106.46207, 163.17854, 83.99455, 124.60143, 195.5731]
2025-08-07 04:14:04,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 19.0, 58.0, 30.0, 21.0, 32.0, 17.0, 25.0, 38.0]
2025-08-07 04:14:04,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 43 seconds)
2025-08-07 04:16:02,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:16:02,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 136.81473 ± 90.205
2025-08-07 04:16:02,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [137.89449, 88.928696, 89.19219, 95.70963, 118.61817, 89.02347, 169.96524, 84.04849, 98.77628, 395.9906]
2025-08-07 04:16:02,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 18.0, 18.0, 19.0, 23.0, 18.0, 34.0, 17.0, 20.0, 71.0]
2025-08-07 04:16:02,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 51 seconds)
2025-08-07 04:18:00,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:18:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 123.37694 ± 27.934
2025-08-07 04:18:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [156.55507, 101.36023, 175.40254, 114.50648, 95.876434, 123.42694, 143.60101, 89.01939, 94.58994, 139.43138]
2025-08-07 04:18:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 20.0, 34.0, 23.0, 19.0, 24.0, 28.0, 18.0, 19.0, 29.0]
2025-08-07 04:18:00,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 59 seconds)
2025-08-07 04:19:57,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:19:58,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 122.59041 ± 34.137
2025-08-07 04:19:58,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.88581, 90.50317, 89.19726, 89.51744, 149.89798, 134.78348, 135.73361, 169.69691, 96.1376, 180.55075]
2025-08-07 04:19:58,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 18.0, 18.0, 29.0, 27.0, 26.0, 33.0, 19.0, 35.0]
2025-08-07 04:19:58,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes, 58 seconds)
2025-08-07 04:21:55,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:21:55,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 131.04784 ± 34.414
2025-08-07 04:21:55,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [184.22182, 138.83836, 155.30775, 88.95046, 123.383865, 144.90985, 95.411674, 184.16393, 94.4144, 100.8763]
2025-08-07 04:21:55,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 28.0, 32.0, 18.0, 25.0, 29.0, 19.0, 37.0, 19.0, 20.0]
2025-08-07 04:21:55,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 57 seconds)
2025-08-07 04:23:52,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:23:53,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 140.12195 ± 61.989
2025-08-07 04:23:53,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.05409, 89.02013, 95.301476, 295.21442, 171.86792, 193.99908, 114.68217, 141.63663, 114.23962, 90.204025]
2025-08-07 04:23:53,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 19.0, 57.0, 33.0, 40.0, 25.0, 29.0, 22.0, 18.0]
2025-08-07 04:23:53,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 31 minutes, 3 seconds)
2025-08-07 04:25:50,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:25:50,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 149.81456 ± 34.989
2025-08-07 04:25:50,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [109.54598, 156.75769, 173.8237, 117.39226, 204.59924, 114.32794, 151.4567, 132.45898, 211.62665, 126.15638]
2025-08-07 04:25:50,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 31.0, 33.0, 23.0, 39.0, 22.0, 31.0, 26.0, 41.0, 26.0]
2025-08-07 04:25:50,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (149.81) for latency ExtremeSparseL4U32
2025-08-07 04:25:50,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes, 55 seconds)
2025-08-07 04:27:47,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:27:48,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 129.04013 ± 28.814
2025-08-07 04:27:48,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [139.10646, 149.38152, 162.95583, 181.06416, 94.25908, 84.15454, 118.401184, 107.26301, 119.50839, 134.30725]
2025-08-07 04:27:48,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 31.0, 35.0, 19.0, 17.0, 23.0, 21.0, 23.0, 26.0]
2025-08-07 04:27:48,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 26 minutes, 54 seconds)
2025-08-07 04:29:45,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:29:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 110.88302 ± 24.789
2025-08-07 04:29:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [129.25166, 105.16803, 89.04939, 89.15527, 83.57328, 154.38675, 88.808044, 95.614815, 144.40103, 129.42203]
2025-08-07 04:29:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 18.0, 18.0, 17.0, 30.0, 18.0, 19.0, 30.0, 25.0]
2025-08-07 04:29:46,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 minutes)
2025-08-07 04:31:43,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:31:43,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 119.76316 ± 31.607
2025-08-07 04:31:43,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.86787, 134.24065, 190.1564, 151.89682, 88.604866, 112.400246, 88.87354, 95.07819, 113.32494, 134.18805]
2025-08-07 04:31:43,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 26.0, 40.0, 29.0, 18.0, 22.0, 18.0, 19.0, 22.0, 29.0]
2025-08-07 04:31:43,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 23 minutes, 5 seconds)
2025-08-07 04:33:40,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:33:41,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 149.93480 ± 88.428
2025-08-07 04:33:41,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [173.59973, 187.32407, 95.642136, 83.77113, 394.13293, 144.48903, 88.99414, 95.45665, 101.78701, 134.15126]
2025-08-07 04:33:41,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 36.0, 19.0, 17.0, 74.0, 28.0, 18.0, 19.0, 20.0, 26.0]
2025-08-07 04:33:41,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (149.93) for latency ExtremeSparseL4U32
2025-08-07 04:33:41,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 21 minutes, 4 seconds)
2025-08-07 04:35:38,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:35:38,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 178.55247 ± 99.490
2025-08-07 04:35:38,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.37932, 456.1272, 152.21024, 84.19478, 112.35012, 164.61853, 168.55319, 163.2074, 193.27069, 195.61316]
2025-08-07 04:35:38,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 84.0, 30.0, 17.0, 22.0, 33.0, 33.0, 32.0, 37.0, 40.0]
2025-08-07 04:35:38,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (178.55) for latency ExtremeSparseL4U32
2025-08-07 04:35:38,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 12 seconds)
2025-08-07 04:37:36,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:37:36,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 129.65140 ± 30.833
2025-08-07 04:37:36,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [96.59568, 128.26244, 83.77552, 142.96, 112.65373, 194.87389, 127.85706, 167.11063, 116.81253, 125.61237]
2025-08-07 04:37:36,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 25.0, 17.0, 28.0, 22.0, 39.0, 27.0, 33.0, 23.0, 25.0]
2025-08-07 04:37:36,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 18 seconds)
2025-08-07 04:39:34,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:39:34,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 127.34707 ± 29.611
2025-08-07 04:39:34,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [118.2749, 94.81727, 157.84773, 180.51335, 107.67914, 100.27184, 90.06093, 148.9304, 156.07365, 119.0015]
2025-08-07 04:39:34,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 19.0, 31.0, 35.0, 21.0, 20.0, 18.0, 29.0, 29.0, 23.0]
2025-08-07 04:39:34,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 15 minutes, 20 seconds)
2025-08-07 04:41:31,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:41:32,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 150.92146 ± 74.270
2025-08-07 04:41:32,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [130.14427, 88.98681, 118.08084, 340.6258, 90.35885, 144.71164, 120.29498, 236.1042, 117.96109, 121.94614]
2025-08-07 04:41:32,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 18.0, 23.0, 67.0, 18.0, 28.0, 24.0, 45.0, 23.0, 24.0]
2025-08-07 04:41:32,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 13 minutes, 26 seconds)
2025-08-07 04:43:29,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:43:29,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 119.21523 ± 25.943
2025-08-07 04:43:29,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [121.76653, 103.60964, 95.97191, 109.73252, 177.96477, 152.52484, 117.664154, 100.54539, 123.48433, 88.88817]
2025-08-07 04:43:29,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 21.0, 19.0, 22.0, 34.0, 30.0, 24.0, 20.0, 24.0, 18.0]
2025-08-07 04:43:29,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 24 seconds)
2025-08-07 04:45:26,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:45:26,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 138.66408 ± 33.860
2025-08-07 04:45:26,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [152.19188, 199.92897, 100.2402, 165.83391, 159.40358, 146.38988, 158.6353, 89.63936, 111.70453, 102.67324]
2025-08-07 04:45:26,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 41.0, 20.0, 32.0, 31.0, 28.0, 31.0, 18.0, 22.0, 20.0]
2025-08-07 04:45:26,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 9 minutes, 20 seconds)
2025-08-07 04:47:23,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:47:24,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 153.45195 ± 90.415
2025-08-07 04:47:24,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [110.74988, 83.789024, 354.11786, 128.70325, 120.17747, 110.98869, 133.79, 89.659004, 306.7546, 95.789856]
2025-08-07 04:47:24,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 17.0, 65.0, 28.0, 24.0, 22.0, 28.0, 18.0, 67.0, 19.0]
2025-08-07 04:47:24,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 7 minutes, 16 seconds)
2025-08-07 04:49:21,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:49:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 158.38333 ± 112.811
2025-08-07 04:49:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [96.270546, 118.95557, 479.70358, 161.26566, 153.69597, 94.70173, 108.17801, 198.40747, 83.967224, 88.68751]
2025-08-07 04:49:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 98.0, 31.0, 30.0, 19.0, 21.0, 39.0, 17.0, 18.0]
2025-08-07 04:49:21,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 14 seconds)
2025-08-07 04:51:18,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:51:18,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 114.94784 ± 19.018
2025-08-07 04:51:18,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.29441, 126.70293, 100.86706, 101.46684, 113.65122, 95.59115, 95.44398, 123.54196, 155.94023, 134.97856]
2025-08-07 04:51:18,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 25.0, 20.0, 20.0, 23.0, 19.0, 19.0, 24.0, 30.0, 26.0]
2025-08-07 04:51:18,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 10 seconds)
2025-08-07 04:53:16,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:53:16,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 152.69644 ± 83.536
2025-08-07 04:53:16,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [117.846466, 113.24645, 89.840614, 393.64145, 160.38132, 102.18909, 126.97723, 112.10948, 149.30415, 161.42827]
2025-08-07 04:53:16,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 18.0, 74.0, 31.0, 20.0, 25.0, 22.0, 29.0, 31.0]
2025-08-07 04:53:16,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 19 seconds)
2025-08-07 04:55:13,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:55:14,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 146.65079 ± 47.710
2025-08-07 04:55:14,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [119.07717, 84.15046, 123.07963, 107.141495, 161.64204, 149.49443, 168.15248, 117.5608, 172.13222, 264.0772]
2025-08-07 04:55:14,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 17.0, 24.0, 21.0, 31.0, 29.0, 33.0, 23.0, 33.0, 48.0]
2025-08-07 04:55:14,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 25 seconds)
2025-08-07 04:57:11,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:57:11,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 132.29922 ± 72.945
2025-08-07 04:57:11,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [125.794136, 83.76963, 107.775, 139.01065, 123.59205, 88.74146, 107.38504, 111.609634, 345.27454, 90.04007]
2025-08-07 04:57:11,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 17.0, 21.0, 27.0, 24.0, 18.0, 21.0, 22.0, 64.0, 18.0]
2025-08-07 04:57:11,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 29 seconds)
2025-08-07 04:59:08,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:59:09,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 138.16321 ± 38.230
2025-08-07 04:59:09,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [105.51441, 88.762024, 153.06596, 128.7592, 121.5801, 100.434204, 200.78087, 112.17747, 190.93544, 179.62238]
2025-08-07 04:59:09,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 18.0, 30.0, 25.0, 24.0, 20.0, 40.0, 22.0, 39.0, 34.0]
2025-08-07 04:59:09,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 33 seconds)
2025-08-07 05:01:06,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:01:06,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 133.89828 ± 28.864
2025-08-07 05:01:06,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [185.20921, 101.348, 165.1229, 159.48424, 89.74423, 129.52634, 126.26166, 127.00798, 106.94336, 148.33496]
2025-08-07 05:01:06,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 20.0, 34.0, 31.0, 18.0, 25.0, 26.0, 25.0, 21.0, 29.0]
2025-08-07 05:01:06,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 38 seconds)
2025-08-07 05:03:03,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:03:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 124.77373 ± 35.913
2025-08-07 05:03:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.90705, 159.52011, 139.01811, 88.75064, 83.98054, 97.31467, 144.36452, 89.13684, 168.12794, 182.61688]
2025-08-07 05:03:04,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 31.0, 28.0, 18.0, 17.0, 19.0, 28.0, 18.0, 33.0, 38.0]
2025-08-07 05:03:04,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 41 seconds)
2025-08-07 05:05:01,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:01,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 127.46519 ± 27.918
2025-08-07 05:05:01,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [153.16708, 127.03415, 155.45557, 88.87079, 128.1337, 157.79929, 163.93184, 115.04504, 96.16253, 89.051956]
2025-08-07 05:05:01,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 25.0, 30.0, 18.0, 25.0, 33.0, 32.0, 23.0, 19.0, 18.0]
2025-08-07 05:05:01,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 40 seconds)
2025-08-07 05:06:58,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:06:59,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 140.10123 ± 25.977
2025-08-07 05:06:59,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [121.21335, 121.520744, 150.48488, 149.91333, 120.79186, 166.22969, 177.36209, 90.07867, 134.01135, 169.40627]
2025-08-07 05:06:59,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 24.0, 29.0, 29.0, 24.0, 32.0, 34.0, 18.0, 26.0, 33.0]
2025-08-07 05:06:59,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes, 42 seconds)
2025-08-07 05:08:56,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:08:56,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 111.42966 ± 25.896
2025-08-07 05:08:56,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [161.00381, 89.4694, 106.958885, 100.32585, 102.030785, 161.38441, 114.062325, 94.15401, 89.862015, 95.0452]
2025-08-07 05:08:56,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 18.0, 21.0, 20.0, 20.0, 31.0, 22.0, 19.0, 18.0, 19.0]
2025-08-07 05:08:56,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 41 seconds)
2025-08-07 05:10:53,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:10:54,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 119.91994 ± 25.214
2025-08-07 05:10:54,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [104.16656, 116.306145, 117.08232, 95.099106, 95.06596, 146.9423, 90.30652, 173.25981, 121.03478, 139.9359]
2025-08-07 05:10:54,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 23.0, 23.0, 19.0, 19.0, 28.0, 18.0, 33.0, 24.0, 27.0]
2025-08-07 05:10:54,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 43 minutes, 46 seconds)
2025-08-07 05:12:51,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:12:51,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 109.64768 ± 15.567
2025-08-07 05:12:51,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [112.47115, 134.26349, 101.73377, 108.64668, 114.40224, 89.48427, 96.0465, 136.93849, 90.161674, 112.3286]
2025-08-07 05:12:51,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 26.0, 20.0, 21.0, 22.0, 18.0, 19.0, 26.0, 18.0, 22.0]
2025-08-07 05:12:51,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes, 47 seconds)
2025-08-07 05:14:48,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:14:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 195.87595 ± 123.200
2025-08-07 05:14:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.65792, 427.7869, 113.03918, 403.08194, 88.939674, 124.891014, 295.56235, 150.82257, 151.54106, 101.436844]
2025-08-07 05:14:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 90.0, 22.0, 89.0, 18.0, 25.0, 58.0, 29.0, 29.0, 20.0]
2025-08-07 05:14:49,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (195.88) for latency ExtremeSparseL4U32
2025-08-07 05:14:49,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 52 seconds)
2025-08-07 05:16:46,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:16:46,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 117.57272 ± 23.252
2025-08-07 05:16:46,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [128.95341, 105.64351, 154.47618, 94.64067, 158.06746, 132.57076, 102.30002, 100.96542, 108.7309, 89.37876]
2025-08-07 05:16:46,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 30.0, 19.0, 31.0, 26.0, 20.0, 20.0, 21.0, 18.0]
2025-08-07 05:16:46,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 58 seconds)
2025-08-07 05:18:43,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:18:44,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 130.91988 ± 52.010
2025-08-07 05:18:44,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.70148, 119.844986, 113.3656, 130.4083, 89.19885, 149.1291, 274.66983, 139.7935, 83.87643, 114.21065]
2025-08-07 05:18:44,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 22.0, 25.0, 18.0, 29.0, 60.0, 27.0, 17.0, 22.0]
2025-08-07 05:18:44,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 36 minutes, 2 seconds)
2025-08-07 05:20:41,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:20:41,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 132.74869 ± 33.634
2025-08-07 05:20:41,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.75536, 89.5682, 121.13498, 95.460884, 164.95206, 110.36787, 142.87698, 173.954, 168.2993, 171.11726]
2025-08-07 05:20:41,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 24.0, 19.0, 32.0, 22.0, 27.0, 33.0, 32.0, 35.0]
2025-08-07 05:20:41,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 3 seconds)
2025-08-07 05:22:38,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:22:39,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 113.55729 ± 24.307
2025-08-07 05:22:39,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [79.0083, 135.0165, 101.17098, 112.94631, 174.28374, 108.3293, 112.62477, 108.35371, 95.33831, 108.50103]
2025-08-07 05:22:39,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [16.0, 29.0, 20.0, 22.0, 34.0, 21.0, 22.0, 21.0, 19.0, 21.0]
2025-08-07 05:22:39,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 32 minutes, 3 seconds)
2025-08-07 05:24:36,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:24:36,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 130.43320 ± 34.594
2025-08-07 05:24:36,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [202.15611, 126.94693, 161.03159, 166.63007, 106.72834, 89.56454, 107.37099, 141.11418, 101.18293, 101.60627]
2025-08-07 05:24:36,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 25.0, 31.0, 34.0, 21.0, 18.0, 21.0, 27.0, 20.0, 20.0]
2025-08-07 05:24:36,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 30 minutes, 6 seconds)
2025-08-07 05:26:33,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:26:33,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 128.78108 ± 22.890
2025-08-07 05:26:33,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [148.90665, 145.00624, 142.30772, 114.04859, 140.7111, 101.266945, 94.58567, 117.15582, 113.428314, 170.39374]
2025-08-07 05:26:33,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 28.0, 27.0, 22.0, 27.0, 20.0, 19.0, 23.0, 22.0, 34.0]
2025-08-07 05:26:33,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 28 minutes, 2 seconds)
2025-08-07 05:28:31,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:31,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 125.45964 ± 24.260
2025-08-07 05:28:31,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [128.66452, 96.32249, 96.87254, 133.11534, 100.57746, 140.61475, 147.21063, 116.007225, 117.956696, 177.25476]
2025-08-07 05:28:31,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 19.0, 19.0, 26.0, 20.0, 28.0, 29.0, 23.0, 23.0, 35.0]
2025-08-07 05:28:31,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 26 minutes, 7 seconds)
2025-08-07 05:30:28,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:29,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 134.82109 ± 39.237
2025-08-07 05:30:29,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [119.37136, 89.15856, 201.84683, 84.167404, 90.19946, 112.37987, 147.0369, 173.93011, 162.70607, 167.41447]
2025-08-07 05:30:29,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 41.0, 17.0, 18.0, 22.0, 28.0, 37.0, 32.0, 34.0]
2025-08-07 05:30:29,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 24 minutes, 8 seconds)
2025-08-07 05:32:25,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:32:26,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 143.41525 ± 79.620
2025-08-07 05:32:26,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [100.835014, 89.10769, 90.10497, 124.862686, 132.59668, 112.02156, 125.509285, 157.28604, 126.98484, 374.84378]
2025-08-07 05:32:26,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 18.0, 24.0, 26.0, 22.0, 24.0, 32.0, 26.0, 79.0]
2025-08-07 05:32:26,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 22 minutes, 12 seconds)
2025-08-07 05:34:23,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:23,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 117.14516 ± 32.375
2025-08-07 05:34:23,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [173.76341, 94.8545, 96.33905, 123.24638, 101.079926, 181.67514, 89.58266, 122.222145, 99.89528, 88.792984]
2025-08-07 05:34:23,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 19.0, 19.0, 26.0, 20.0, 36.0, 18.0, 24.0, 20.0, 18.0]
2025-08-07 05:34:23,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 14 seconds)
2025-08-07 05:36:20,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:36:21,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 132.27737 ± 28.550
2025-08-07 05:36:21,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [191.98105, 121.703926, 133.906, 137.49698, 148.02455, 96.722694, 103.08096, 101.70253, 123.25923, 164.89578]
2025-08-07 05:36:21,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 25.0, 26.0, 27.0, 29.0, 19.0, 20.0, 20.0, 24.0, 33.0]
2025-08-07 05:36:21,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 17 seconds)
2025-08-07 05:38:18,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:38:18,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 118.11798 ± 19.032
2025-08-07 05:38:18,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [123.55273, 111.99534, 113.19857, 101.02828, 154.83148, 95.67322, 144.22746, 112.96295, 129.18176, 94.52795]
2025-08-07 05:38:18,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 22.0, 20.0, 31.0, 19.0, 28.0, 22.0, 25.0, 19.0]
2025-08-07 05:38:18,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 17 seconds)
2025-08-07 05:40:15,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:40:16,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 152.75993 ± 48.861
2025-08-07 05:40:16,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [134.1137, 140.11333, 126.44205, 155.54117, 138.94484, 101.440346, 113.45217, 172.77225, 158.36008, 286.41946]
2025-08-07 05:40:16,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 25.0, 32.0, 27.0, 20.0, 22.0, 33.0, 32.0, 57.0]
2025-08-07 05:40:16,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 21 seconds)
2025-08-07 05:42:13,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:42:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 114.74585 ± 24.811
2025-08-07 05:42:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [96.102036, 89.34198, 133.93982, 84.14388, 158.68431, 101.57029, 100.854385, 117.89441, 153.94624, 110.981026]
2025-08-07 05:42:13,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 26.0, 17.0, 31.0, 20.0, 20.0, 23.0, 30.0, 22.0]
2025-08-07 05:42:13,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 25 seconds)
2025-08-07 05:44:10,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:44:11,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 129.78665 ± 30.261
2025-08-07 05:44:11,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [163.16884, 163.30139, 88.728294, 181.34474, 101.68221, 96.561195, 106.75417, 130.54788, 141.00632, 124.771545]
2025-08-07 05:44:11,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 32.0, 18.0, 36.0, 20.0, 19.0, 21.0, 25.0, 27.0, 25.0]
2025-08-07 05:44:11,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 28 seconds)
2025-08-07 05:46:08,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:46:09,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 147.66299 ± 62.232
2025-08-07 05:46:09,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [156.63861, 187.87083, 311.80197, 123.028145, 96.201035, 111.435524, 91.487404, 135.98863, 160.07188, 102.10586]
2025-08-07 05:46:09,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 36.0, 64.0, 24.0, 19.0, 22.0, 18.0, 26.0, 31.0, 20.0]
2025-08-07 05:46:09,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 37 seconds)
2025-08-07 05:48:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:48:06,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 145.13496 ± 68.213
2025-08-07 05:48:06,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [102.12905, 186.86581, 116.07452, 333.7878, 135.48389, 95.140945, 121.828514, 139.27762, 131.82172, 88.93973]
2025-08-07 05:48:06,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 36.0, 23.0, 70.0, 27.0, 19.0, 24.0, 28.0, 26.0, 18.0]
2025-08-07 05:48:06,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 41 seconds)
2025-08-07 05:50:04,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:50:04,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 129.23260 ± 25.602
2025-08-07 05:50:04,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [119.571236, 83.94095, 160.81157, 112.48526, 159.42328, 113.532, 160.71202, 122.66445, 151.2728, 107.9125]
2025-08-07 05:50:04,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 17.0, 31.0, 22.0, 31.0, 22.0, 32.0, 24.0, 30.0, 21.0]
2025-08-07 05:50:04,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 43 seconds)
2025-08-07 05:52:01,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:52:02,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 110.78463 ± 25.616
2025-08-07 05:52:02,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [83.87625, 153.57358, 102.349495, 161.85956, 100.64265, 90.805405, 88.634705, 94.58428, 112.4126, 119.107765]
2025-08-07 05:52:02,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 31.0, 20.0, 31.0, 20.0, 18.0, 18.0, 19.0, 22.0, 23.0]
2025-08-07 05:52:02,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 47 seconds)
2025-08-07 05:53:59,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:53:59,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 117.94463 ± 30.250
2025-08-07 05:53:59,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.01196, 121.79143, 175.91109, 168.05399, 96.55375, 84.13083, 132.76855, 89.58073, 101.66321, 101.98068]
2025-08-07 05:53:59,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 24.0, 35.0, 33.0, 19.0, 17.0, 26.0, 18.0, 20.0, 20.0]
2025-08-07 05:53:59,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 47 seconds)
2025-08-07 05:55:57,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:55:57,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 135.63632 ± 40.861
2025-08-07 05:55:57,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [152.61722, 95.1398, 148.53523, 89.149376, 100.16738, 161.83115, 116.555145, 123.59331, 235.54135, 133.23328]
2025-08-07 05:55:57,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 29.0, 18.0, 20.0, 33.0, 23.0, 24.0, 45.0, 26.0]
2025-08-07 05:55:57,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 49 seconds)
2025-08-07 05:57:54,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:57:55,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 134.02939 ± 22.692
2025-08-07 05:57:55,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [124.42842, 109.94663, 138.65962, 137.43484, 139.49353, 153.27171, 101.44122, 125.271164, 186.88547, 123.46117]
2025-08-07 05:57:55,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 27.0, 27.0, 27.0, 33.0, 20.0, 26.0, 37.0, 24.0]
2025-08-07 05:57:55,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 52 seconds)
2025-08-07 05:59:52,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:59:53,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 136.23380 ± 39.445
2025-08-07 05:59:53,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [188.19376, 105.78912, 147.72827, 157.52693, 113.171936, 99.40492, 102.69172, 214.10411, 144.42346, 89.30388]
2025-08-07 05:59:53,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 21.0, 29.0, 31.0, 22.0, 20.0, 20.0, 42.0, 28.0, 18.0]
2025-08-07 05:59:53,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 57 seconds)
2025-08-07 06:01:50,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:01:50,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 108.94452 ± 19.955
2025-08-07 06:01:50,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [100.71711, 113.82214, 121.3887, 155.82227, 97.05594, 124.36221, 88.37061, 83.57706, 102.50681, 101.8223]
2025-08-07 06:01:50,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 24.0, 33.0, 19.0, 25.0, 18.0, 17.0, 20.0, 20.0]
2025-08-07 06:01:50,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 57 seconds)
2025-08-07 06:03:48,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:03:48,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 119.40641 ± 19.850
2025-08-07 06:03:48,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [142.79762, 142.05206, 107.47138, 100.242836, 95.52988, 106.03581, 151.67477, 134.64429, 110.734474, 102.88095]
2025-08-07 06:03:48,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 28.0, 21.0, 20.0, 19.0, 21.0, 29.0, 26.0, 22.0, 20.0]
2025-08-07 06:03:48,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 2 seconds)
2025-08-07 06:05:46,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:05:46,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 105.96587 ± 18.833
2025-08-07 06:05:46,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [111.68525, 128.015, 145.19801, 88.91556, 96.910194, 111.65363, 89.3373, 114.10199, 89.69933, 84.14243]
2025-08-07 06:05:46,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 26.0, 28.0, 18.0, 19.0, 22.0, 18.0, 22.0, 18.0, 17.0]
2025-08-07 06:05:46,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 4 seconds)
2025-08-07 06:07:43,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:07:44,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 134.68521 ± 42.342
2025-08-07 06:07:44,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [112.81755, 145.12453, 95.25053, 172.87608, 137.2316, 100.64835, 135.59698, 240.05806, 95.60296, 111.645485]
2025-08-07 06:07:44,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 29.0, 19.0, 34.0, 28.0, 20.0, 27.0, 47.0, 19.0, 22.0]
2025-08-07 06:07:44,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 6 seconds)
2025-08-07 06:09:41,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:09:41,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 127.67056 ± 24.579
2025-08-07 06:09:41,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [151.53415, 121.548805, 142.42815, 120.26566, 102.51305, 161.43513, 135.83815, 96.12855, 88.78747, 156.22644]
2025-08-07 06:09:41,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 24.0, 28.0, 24.0, 20.0, 32.0, 26.0, 19.0, 18.0, 31.0]
2025-08-07 06:09:41,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 6 seconds)
2025-08-07 06:11:38,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:11:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 120.38275 ± 27.519
2025-08-07 06:11:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [188.68013, 98.432076, 123.011696, 105.5585, 96.43939, 102.99271, 141.26442, 94.89811, 117.079475, 135.47107]
2025-08-07 06:11:39,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 20.0, 24.0, 21.0, 19.0, 20.0, 28.0, 19.0, 23.0, 26.0]
2025-08-07 06:11:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 9 seconds)
2025-08-07 06:13:36,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:13:36,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 144.05043 ± 62.133
2025-08-07 06:13:36,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [135.31062, 317.65268, 121.21142, 90.213684, 144.011, 95.72188, 107.52762, 145.1277, 167.27171, 116.45594]
2025-08-07 06:13:36,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 68.0, 24.0, 18.0, 28.0, 19.0, 21.0, 28.0, 34.0, 23.0]
2025-08-07 06:13:36,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 11 seconds)
2025-08-07 06:15:33,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:15:34,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 135.43262 ± 35.802
2025-08-07 06:15:34,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.2295, 184.10982, 139.9526, 100.94224, 168.83733, 101.19956, 188.39066, 88.85237, 113.18595, 161.62622]
2025-08-07 06:15:34,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 36.0, 28.0, 20.0, 34.0, 20.0, 37.0, 18.0, 22.0, 31.0]
2025-08-07 06:15:34,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 11 seconds)
2025-08-07 06:17:31,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:17:32,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 118.58236 ± 26.065
2025-08-07 06:17:32,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [83.94743, 101.37545, 136.44649, 132.54771, 153.47249, 157.67523, 90.097176, 90.042366, 134.38945, 105.82987]
2025-08-07 06:17:32,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 20.0, 27.0, 26.0, 30.0, 31.0, 18.0, 18.0, 27.0, 21.0]
2025-08-07 06:17:32,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 14 seconds)
2025-08-07 06:19:28,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:19:29,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 133.61304 ± 30.122
2025-08-07 06:19:29,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.072556, 166.20674, 136.8848, 121.58406, 174.18434, 101.57412, 155.72636, 156.11296, 146.35358, 88.43088]
2025-08-07 06:19:29,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 32.0, 26.0, 24.0, 34.0, 20.0, 31.0, 31.0, 30.0, 18.0]
2025-08-07 06:19:29,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 15 seconds)
2025-08-07 06:21:26,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:21:27,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 105.60574 ± 7.827
2025-08-07 06:21:27,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [113.00663, 95.34111, 100.68602, 106.237045, 102.560074, 114.06169, 113.24076, 90.477234, 107.27877, 113.168076]
2025-08-07 06:21:27,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 19.0, 20.0, 21.0, 20.0, 22.0, 22.0, 18.0, 21.0, 22.0]
2025-08-07 06:21:27,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 18 seconds)
2025-08-07 06:23:24,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:23:24,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 121.18098 ± 34.979
2025-08-07 06:23:24,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [106.51344, 210.73724, 138.62288, 151.8877, 94.74059, 94.37556, 96.86182, 111.91284, 101.910385, 104.247475]
2025-08-07 06:23:24,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 42.0, 29.0, 29.0, 19.0, 19.0, 19.0, 22.0, 20.0, 21.0]
2025-08-07 06:23:24,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 20 seconds)
2025-08-07 06:25:21,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:25:22,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 144.84616 ± 49.763
2025-08-07 06:25:22,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [121.488945, 268.6573, 107.97086, 188.37396, 153.81111, 134.14676, 108.65073, 116.43745, 159.75531, 89.16916]
2025-08-07 06:25:22,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 52.0, 21.0, 37.0, 29.0, 26.0, 21.0, 23.0, 31.0, 18.0]
2025-08-07 06:25:22,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 24 seconds)
2025-08-07 06:27:19,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:27:20,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 147.65965 ± 69.521
2025-08-07 06:27:20,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [133.8105, 94.83849, 155.89842, 108.81029, 339.59128, 156.34625, 110.29119, 112.75861, 175.4945, 88.75706]
2025-08-07 06:27:20,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 19.0, 30.0, 21.0, 62.0, 31.0, 22.0, 22.0, 34.0, 18.0]
2025-08-07 06:27:20,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 26 seconds)
2025-08-07 06:29:17,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:29:17,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 141.04819 ± 57.156
2025-08-07 06:29:17,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.90337, 121.08533, 283.63608, 100.60687, 84.065315, 114.6548, 124.652855, 174.35611, 106.57176, 192.94936]
2025-08-07 06:29:17,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 23.0, 62.0, 20.0, 17.0, 23.0, 24.0, 35.0, 21.0, 37.0]
2025-08-07 06:29:17,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 30 seconds)
2025-08-07 06:31:15,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:31:15,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 135.93979 ± 33.128
2025-08-07 06:31:15,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [174.43277, 175.53659, 111.434105, 96.182434, 100.95682, 128.53665, 152.22394, 150.77596, 90.06837, 179.25021]
2025-08-07 06:31:15,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 36.0, 22.0, 19.0, 20.0, 25.0, 30.0, 30.0, 18.0, 35.0]
2025-08-07 06:31:15,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 32 seconds)
2025-08-07 06:33:12,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:33:13,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 126.03468 ± 29.792
2025-08-07 06:33:13,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [175.835, 106.592354, 95.90272, 124.958694, 102.12545, 162.21657, 90.36803, 100.726845, 157.42159, 144.19951]
2025-08-07 06:33:13,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 21.0, 19.0, 24.0, 20.0, 31.0, 18.0, 20.0, 32.0, 28.0]
2025-08-07 06:33:13,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 35 seconds)
2025-08-07 06:35:10,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:35:11,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 131.06332 ± 65.153
2025-08-07 06:35:11,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.86795, 130.3029, 95.72169, 117.077324, 95.78196, 89.28653, 123.73364, 124.88581, 321.21115, 122.76426]
2025-08-07 06:35:11,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 26.0, 19.0, 23.0, 19.0, 18.0, 24.0, 25.0, 70.0, 24.0]
2025-08-07 06:35:11,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 37 seconds)
2025-08-07 06:37:08,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:37:08,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 118.99878 ± 29.743
2025-08-07 06:37:08,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [119.40083, 117.97358, 101.40258, 147.07823, 101.709595, 173.08708, 89.65921, 161.60907, 94.04411, 84.023575]
2025-08-07 06:37:08,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 20.0, 29.0, 20.0, 37.0, 18.0, 34.0, 19.0, 17.0]
2025-08-07 06:37:08,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 39 seconds)
2025-08-07 06:39:06,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:39:07,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 107.54155 ± 17.715
2025-08-07 06:39:07,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [118.937164, 128.22437, 94.84033, 118.32205, 88.93955, 102.504486, 83.96353, 139.57515, 89.15642, 110.9525]
2025-08-07 06:39:07,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 25.0, 19.0, 23.0, 18.0, 20.0, 17.0, 27.0, 18.0, 22.0]
2025-08-07 06:39:07,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 42 seconds)
2025-08-07 06:41:04,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:41:04,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 129.39212 ± 46.314
2025-08-07 06:41:04,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [128.06189, 89.114975, 126.99976, 238.14766, 124.142166, 134.11491, 88.92617, 88.71177, 90.381714, 185.32011]
2025-08-07 06:41:04,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 18.0, 25.0, 46.0, 25.0, 26.0, 18.0, 18.0, 18.0, 36.0]
2025-08-07 06:41:04,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 45 seconds)
2025-08-07 06:43:02,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:43:02,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 120.11140 ± 22.971
2025-08-07 06:43:02,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.43238, 105.92321, 83.78829, 151.74037, 113.61411, 146.4294, 117.981445, 145.16528, 94.867966, 140.17168]
2025-08-07 06:43:02,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 21.0, 17.0, 30.0, 22.0, 28.0, 23.0, 29.0, 19.0, 27.0]
2025-08-07 06:43:02,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 47 seconds)
2025-08-07 06:44:59,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:45:00,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 126.11704 ± 29.772
2025-08-07 06:45:00,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [167.39809, 117.45488, 83.88318, 102.857735, 104.92634, 116.40377, 165.38054, 95.33485, 143.79907, 163.73187]
2025-08-07 06:45:00,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 23.0, 17.0, 20.0, 21.0, 23.0, 33.0, 19.0, 29.0, 34.0]
2025-08-07 06:45:00,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 49 seconds)
2025-08-07 06:46:57,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:46:57,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 129.91287 ± 72.997
2025-08-07 06:46:57,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [345.3926, 114.095535, 136.5965, 96.37475, 99.94263, 112.0859, 113.39945, 90.078476, 94.692055, 96.47087]
2025-08-07 06:46:57,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 23.0, 27.0, 19.0, 20.0, 22.0, 23.0, 18.0, 19.0, 19.0]
2025-08-07 06:46:57,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 50 seconds)
2025-08-07 06:48:54,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:48:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 164.08426 ± 46.969
2025-08-07 06:48:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [122.57872, 139.75099, 131.67787, 136.99892, 127.71014, 265.9083, 242.29102, 160.20543, 161.05933, 152.66199]
2025-08-07 06:48:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 27.0, 26.0, 27.0, 25.0, 52.0, 48.0, 32.0, 32.0, 29.0]
2025-08-07 06:48:55,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 52 seconds)
2025-08-07 06:50:52,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:50:53,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 133.92592 ± 41.154
2025-08-07 06:50:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [118.917145, 173.44363, 129.31783, 95.651596, 120.698685, 117.13179, 108.612236, 109.15334, 123.57428, 242.75871]
2025-08-07 06:50:53,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 35.0, 25.0, 19.0, 25.0, 23.0, 21.0, 22.0, 24.0, 46.0]
2025-08-07 06:50:53,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 55 seconds)
2025-08-07 06:52:49,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:52:50,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 117.55412 ± 23.302
2025-08-07 06:52:50,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [119.38307, 95.83567, 163.52022, 100.76703, 125.19566, 95.682106, 114.38377, 138.26573, 138.68376, 83.82428]
2025-08-07 06:52:50,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 19.0, 33.0, 20.0, 24.0, 19.0, 23.0, 27.0, 27.0, 17.0]
2025-08-07 06:52:50,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 57 seconds)
2025-08-07 06:54:47,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:54:47,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 131.92014 ± 40.239
2025-08-07 06:54:47,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [145.91249, 102.09087, 95.921074, 154.48535, 142.47598, 94.75388, 101.09195, 108.22635, 141.28581, 232.95767]
2025-08-07 06:54:47,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 20.0, 19.0, 30.0, 29.0, 19.0, 20.0, 21.0, 27.0, 46.0]
2025-08-07 06:54:47,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1251 [DEBUG]: Training session finished
