2025-08-07 03:38:04,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:38:04,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:38:04,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x149431ba3010>}
2025-08-07 03:38:04,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 03:38:04,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 03:38:04,139 baseline-bpql-noiseperc20-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=920, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 03:38:04,139 baseline-bpql-noiseperc20-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 03:38:05,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 03:38:05,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 03:39:59,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 141.82169 ± 27.567
2025-08-07 03:40:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [107.03368, 127.704094, 159.13258, 162.3864, 101.65258, 156.12029, 141.39226, 124.32246, 199.56656, 138.90596]
2025-08-07 03:40:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 25.0, 33.0, 31.0, 20.0, 30.0, 27.0, 24.0, 38.0, 27.0]
2025-08-07 03:40:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (141.82) for latency ExtremeSparseL4U32
2025-08-07 03:40:00,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 8 minutes, 48 seconds)
2025-08-07 03:42:01,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:02,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 167.22720 ± 95.953
2025-08-07 03:42:02,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [114.66121, 295.68582, 101.68338, 133.0015, 113.178566, 161.16212, 112.9027, 135.28305, 100.842865, 403.87073]
2025-08-07 03:42:02,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 66.0, 20.0, 26.0, 22.0, 31.0, 22.0, 26.0, 20.0, 76.0]
2025-08-07 03:42:02,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (167.23) for latency ExtremeSparseL4U32
2025-08-07 03:42:02,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 12 minutes, 55 seconds)
2025-08-07 03:44:03,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:03,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 138.43758 ± 39.824
2025-08-07 03:44:03,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.53551, 146.29576, 101.36911, 100.39354, 170.59128, 188.09203, 117.601364, 101.75302, 215.07916, 146.66502]
2025-08-07 03:44:03,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 28.0, 20.0, 20.0, 33.0, 36.0, 23.0, 20.0, 41.0, 29.0]
2025-08-07 03:44:03,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 12 minutes, 42 seconds)
2025-08-07 03:46:05,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:05,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 150.15895 ± 75.726
2025-08-07 03:46:05,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [129.1934, 112.81091, 140.01816, 124.625565, 111.78855, 96.02241, 157.60034, 372.3415, 127.619095, 129.56943]
2025-08-07 03:46:05,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 28.0, 24.0, 22.0, 19.0, 31.0, 71.0, 25.0, 25.0]
2025-08-07 03:46:06,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 12 minutes)
2025-08-07 03:48:06,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 117.27563 ± 27.838
2025-08-07 03:48:06,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.93345, 89.69276, 113.371414, 111.97746, 187.46718, 101.5273, 101.63816, 125.009834, 101.347694, 144.79108]
2025-08-07 03:48:06,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 22.0, 22.0, 36.0, 20.0, 20.0, 24.0, 20.0, 29.0]
2025-08-07 03:48:06,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 10 minutes, 13 seconds)
2025-08-07 03:50:07,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:07,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 159.43217 ± 103.089
2025-08-07 03:50:07,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.4812, 102.666534, 112.08332, 128.05846, 96.744026, 145.7091, 84.580345, 265.62753, 112.90101, 432.47028]
2025-08-07 03:50:07,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 22.0, 25.0, 19.0, 28.0, 17.0, 55.0, 22.0, 85.0]
2025-08-07 03:50:07,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 10 minutes, 20 seconds)
2025-08-07 03:52:08,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:09,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 142.37608 ± 53.323
2025-08-07 03:52:09,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [130.3451, 113.75222, 102.385765, 285.42654, 147.57413, 108.63136, 113.55089, 184.42659, 135.16638, 102.501854]
2025-08-07 03:52:09,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 23.0, 20.0, 56.0, 28.0, 21.0, 22.0, 36.0, 26.0, 20.0]
2025-08-07 03:52:09,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 8 minutes, 11 seconds)
2025-08-07 03:54:10,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:10,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 163.24857 ± 115.249
2025-08-07 03:54:10,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [121.06557, 108.58571, 144.68086, 107.174324, 112.68099, 96.18315, 160.19273, 149.14632, 128.77054, 504.00546]
2025-08-07 03:54:10,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 21.0, 28.0, 21.0, 22.0, 19.0, 31.0, 29.0, 25.0, 106.0]
2025-08-07 03:54:10,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 6 minutes, 12 seconds)
2025-08-07 03:56:12,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:12,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 166.59166 ± 84.331
2025-08-07 03:56:12,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.00482, 131.10349, 175.19138, 176.10234, 116.694916, 187.68529, 403.28296, 114.681274, 113.27297, 151.89731]
2025-08-07 03:56:12,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 25.0, 34.0, 34.0, 23.0, 38.0, 77.0, 22.0, 22.0, 30.0]
2025-08-07 03:56:12,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 4 minutes, 3 seconds)
2025-08-07 03:58:13,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:14,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 133.60889 ± 27.849
2025-08-07 03:58:14,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [119.03027, 177.11455, 90.47883, 130.7698, 125.16254, 144.05856, 188.19969, 116.14441, 116.543884, 128.58638]
2025-08-07 03:58:14,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 35.0, 18.0, 26.0, 24.0, 28.0, 37.0, 23.0, 23.0, 25.0]
2025-08-07 03:58:14,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 2 minutes, 19 seconds)
2025-08-07 04:00:15,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:16,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 180.49130 ± 122.516
2025-08-07 04:00:16,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [317.38586, 83.96478, 114.314644, 229.4593, 89.44167, 483.16568, 112.515594, 149.58766, 95.885124, 129.19266]
2025-08-07 04:00:16,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 17.0, 22.0, 44.0, 18.0, 92.0, 22.0, 31.0, 19.0, 25.0]
2025-08-07 04:00:16,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (180.49) for latency ExtremeSparseL4U32
2025-08-07 04:00:16,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 32 seconds)
2025-08-07 04:02:17,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:18,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 132.29225 ± 31.325
2025-08-07 04:02:18,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [143.19777, 102.17445, 150.02406, 106.98195, 126.05621, 118.49925, 96.530624, 107.88654, 184.20073, 187.37086]
2025-08-07 04:02:18,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 20.0, 29.0, 21.0, 24.0, 23.0, 19.0, 21.0, 36.0, 37.0]
2025-08-07 04:02:18,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 58 minutes, 41 seconds)
2025-08-07 04:04:19,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:20,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 129.64880 ± 59.430
2025-08-07 04:04:20,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [97.02833, 106.29255, 143.76616, 106.4979, 102.07549, 96.83192, 299.86142, 84.33493, 137.35703, 122.44231]
2025-08-07 04:04:20,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 28.0, 21.0, 20.0, 19.0, 63.0, 17.0, 27.0, 24.0]
2025-08-07 04:04:20,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 56 minutes, 46 seconds)
2025-08-07 04:06:20,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:21,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 201.02136 ± 109.717
2025-08-07 04:06:21,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [149.5047, 96.21889, 113.202805, 143.39215, 127.06689, 288.17166, 395.11823, 163.63278, 399.0075, 134.8979]
2025-08-07 04:06:21,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 19.0, 22.0, 28.0, 25.0, 58.0, 77.0, 32.0, 76.0, 26.0]
2025-08-07 04:06:21,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1226 [INFO]: New best (201.02) for latency ExtremeSparseL4U32
2025-08-07 04:06:21,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 54 minutes, 22 seconds)
2025-08-07 04:08:20,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:20,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 121.32064 ± 24.790
2025-08-07 04:08:20,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.7057, 101.60148, 156.4434, 84.37826, 159.33482, 137.78206, 130.87631, 95.9654, 122.75139, 128.36754]
2025-08-07 04:08:20,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 31.0, 17.0, 31.0, 27.0, 25.0, 19.0, 24.0, 25.0]
2025-08-07 04:08:20,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 51 minutes, 47 seconds)
2025-08-07 04:10:19,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:10:20,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 122.09473 ± 32.856
2025-08-07 04:10:20,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [90.41313, 196.27467, 123.80323, 107.81582, 84.46318, 107.584595, 153.93056, 149.42252, 106.07884, 101.16065]
2025-08-07 04:10:20,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 39.0, 24.0, 21.0, 17.0, 21.0, 32.0, 30.0, 21.0, 20.0]
2025-08-07 04:10:20,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 49 minutes, 5 seconds)
2025-08-07 04:12:20,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:12:21,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 125.24953 ± 22.174
2025-08-07 04:12:21,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.35447, 89.19895, 125.39166, 126.972786, 166.24596, 137.96077, 114.06133, 119.726135, 123.437614, 153.14572]
2025-08-07 04:12:21,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 24.0, 25.0, 33.0, 27.0, 22.0, 23.0, 24.0, 30.0]
2025-08-07 04:12:21,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 46 minutes, 44 seconds)
2025-08-07 04:14:19,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:14:20,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 136.21674 ± 27.843
2025-08-07 04:14:20,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [160.55537, 102.05818, 145.848, 153.33006, 135.85323, 148.75308, 89.95958, 95.67911, 166.21936, 163.91135]
2025-08-07 04:14:20,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 20.0, 28.0, 30.0, 27.0, 29.0, 18.0, 19.0, 32.0, 33.0]
2025-08-07 04:14:20,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 44 minutes, 1 second)
2025-08-07 04:16:20,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:16:20,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 123.63530 ± 27.350
2025-08-07 04:16:20,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [187.9383, 84.249985, 104.867096, 113.75134, 118.33931, 134.11522, 110.378845, 152.77663, 118.92769, 111.0086]
2025-08-07 04:16:20,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 17.0, 21.0, 22.0, 23.0, 26.0, 22.0, 31.0, 23.0, 22.0]
2025-08-07 04:16:20,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 41 minutes, 53 seconds)
2025-08-07 04:18:19,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:18:20,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 145.39252 ± 83.379
2025-08-07 04:18:20,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.51531, 149.24641, 113.05225, 90.206566, 161.78604, 137.446, 385.64655, 108.34828, 90.51442, 121.163315]
2025-08-07 04:18:20,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 29.0, 22.0, 18.0, 31.0, 27.0, 83.0, 21.0, 18.0, 23.0]
2025-08-07 04:18:20,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 49 seconds)
2025-08-07 04:20:19,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:20:19,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 128.85269 ± 34.656
2025-08-07 04:20:19,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [165.3467, 164.13283, 106.64083, 95.90306, 142.27022, 197.1312, 89.605034, 95.74045, 120.32126, 111.435265]
2025-08-07 04:20:19,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 33.0, 21.0, 19.0, 28.0, 38.0, 18.0, 19.0, 25.0, 22.0]
2025-08-07 04:20:19,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 49 seconds)
2025-08-07 04:22:18,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:22:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 147.64067 ± 33.949
2025-08-07 04:22:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [106.64906, 162.45692, 174.98593, 161.98904, 107.9312, 223.03444, 134.76971, 125.0314, 122.31131, 157.24774]
2025-08-07 04:22:19,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 32.0, 34.0, 31.0, 21.0, 45.0, 26.0, 24.0, 24.0, 31.0]
2025-08-07 04:22:19,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 34 seconds)
2025-08-07 04:24:17,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:24:18,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 130.73634 ± 28.308
2025-08-07 04:24:18,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.154236, 192.67178, 133.01839, 145.94746, 129.00674, 107.8588, 158.98909, 101.900444, 90.426674, 129.38994]
2025-08-07 04:24:18,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 37.0, 26.0, 28.0, 25.0, 21.0, 31.0, 20.0, 18.0, 25.0]
2025-08-07 04:24:18,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 26 seconds)
2025-08-07 04:26:17,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:26:17,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 123.82932 ± 24.631
2025-08-07 04:26:17,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [101.11879, 143.02226, 102.59867, 102.452576, 108.025475, 137.22594, 149.26427, 175.4286, 101.50674, 117.64986]
2025-08-07 04:26:17,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 28.0, 20.0, 20.0, 21.0, 27.0, 31.0, 34.0, 20.0, 23.0]
2025-08-07 04:26:17,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 31 minutes, 13 seconds)
2025-08-07 04:28:16,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:28:17,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 158.41685 ± 57.157
2025-08-07 04:28:17,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [83.96058, 182.70946, 304.0981, 141.3012, 147.44101, 159.25826, 152.86629, 89.91575, 152.4288, 170.18904]
2025-08-07 04:28:17,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 36.0, 58.0, 27.0, 29.0, 31.0, 30.0, 18.0, 32.0, 33.0]
2025-08-07 04:28:17,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 29 minutes, 17 seconds)
2025-08-07 04:30:16,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:30:16,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 144.61606 ± 31.526
2025-08-07 04:30:16,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [148.3605, 155.1862, 130.49403, 170.22873, 207.56808, 148.72356, 135.21338, 102.360054, 90.43757, 157.58833]
2025-08-07 04:30:16,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 26.0, 34.0, 40.0, 29.0, 26.0, 20.0, 18.0, 31.0]
2025-08-07 04:30:16,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 14 seconds)
2025-08-07 04:32:16,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:32:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 141.19505 ± 62.823
2025-08-07 04:32:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [310.42123, 177.10837, 95.533394, 84.52545, 102.261826, 145.44061, 141.51178, 107.642136, 145.83813, 101.66758]
2025-08-07 04:32:17,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 35.0, 19.0, 17.0, 20.0, 28.0, 27.0, 21.0, 28.0, 20.0]
2025-08-07 04:32:17,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 27 seconds)
2025-08-07 04:34:17,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:34:18,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 170.34088 ± 44.755
2025-08-07 04:34:18,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [100.827965, 119.39153, 171.98604, 176.2816, 181.1781, 171.48979, 161.52904, 280.77692, 179.24211, 160.70586]
2025-08-07 04:34:18,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 23.0, 34.0, 36.0, 35.0, 34.0, 32.0, 56.0, 36.0, 31.0]
2025-08-07 04:34:18,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 4 seconds)
2025-08-07 04:36:19,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:36:19,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 113.98502 ± 13.765
2025-08-07 04:36:19,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [101.875465, 124.02028, 122.19163, 102.08474, 138.1499, 114.65772, 123.625496, 120.78892, 89.73544, 102.72062]
2025-08-07 04:36:19,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 24.0, 24.0, 20.0, 27.0, 22.0, 24.0, 23.0, 18.0, 20.0]
2025-08-07 04:36:19,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 29 seconds)
2025-08-07 04:38:20,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:38:21,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 135.68474 ± 54.355
2025-08-07 04:38:21,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [112.47526, 102.961, 101.25051, 279.9168, 139.06961, 171.65314, 103.64879, 95.229935, 95.62107, 155.02122]
2025-08-07 04:38:21,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 20.0, 53.0, 28.0, 36.0, 20.0, 19.0, 19.0, 30.0]
2025-08-07 04:38:21,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 55 seconds)
2025-08-07 04:40:22,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:40:22,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 148.93373 ± 29.041
2025-08-07 04:40:22,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [142.33835, 135.41786, 111.1713, 170.47124, 90.02487, 187.25565, 147.52396, 177.75845, 171.13025, 156.24539]
2025-08-07 04:40:22,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 26.0, 22.0, 34.0, 18.0, 36.0, 28.0, 35.0, 33.0, 31.0]
2025-08-07 04:40:22,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 19 minutes, 24 seconds)
2025-08-07 04:42:23,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:42:24,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 162.67761 ± 78.947
2025-08-07 04:42:24,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [107.12339, 251.30283, 162.28036, 125.3327, 107.86115, 101.84711, 120.91148, 147.63573, 364.44122, 138.04007]
2025-08-07 04:42:24,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 50.0, 31.0, 24.0, 21.0, 20.0, 24.0, 29.0, 71.0, 27.0]
2025-08-07 04:42:24,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 17 minutes, 33 seconds)
2025-08-07 04:44:24,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:44:25,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 138.81917 ± 27.814
2025-08-07 04:44:25,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [118.62914, 111.999886, 128.07153, 167.1934, 127.29802, 162.997, 179.9256, 100.77634, 116.14369, 175.15709]
2025-08-07 04:44:25,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 25.0, 33.0, 25.0, 31.0, 35.0, 20.0, 23.0, 34.0]
2025-08-07 04:44:25,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 33 seconds)
2025-08-07 04:46:26,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:46:26,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 144.27487 ± 30.938
2025-08-07 04:46:26,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [162.67876, 152.41603, 96.07639, 170.95396, 117.600174, 142.08063, 102.06172, 180.62148, 189.71558, 128.54407]
2025-08-07 04:46:26,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 29.0, 19.0, 35.0, 23.0, 28.0, 20.0, 36.0, 37.0, 25.0]
2025-08-07 04:46:26,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 31 seconds)
2025-08-07 04:48:27,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:48:28,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 138.63513 ± 52.346
2025-08-07 04:48:28,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [109.056984, 102.60117, 278.12515, 150.1479, 164.32721, 124.0991, 89.75027, 110.88435, 155.21309, 102.146034]
2025-08-07 04:48:28,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 20.0, 54.0, 30.0, 32.0, 24.0, 18.0, 22.0, 31.0, 20.0]
2025-08-07 04:48:28,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 28 seconds)
2025-08-07 04:50:28,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:50:28,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 130.93327 ± 30.768
2025-08-07 04:50:28,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [96.19264, 170.83269, 113.46101, 191.2717, 134.48802, 153.0764, 89.407715, 117.28314, 113.41512, 129.90431]
2025-08-07 04:50:28,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 33.0, 22.0, 37.0, 26.0, 30.0, 18.0, 23.0, 22.0, 25.0]
2025-08-07 04:50:28,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 18 seconds)
2025-08-07 04:52:29,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:52:30,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 105.96492 ± 11.777
2025-08-07 04:52:30,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.36554, 101.95244, 112.492744, 100.98192, 107.93054, 129.50142, 89.008446, 107.757454, 118.73577, 101.922905]
2025-08-07 04:52:30,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 20.0, 22.0, 20.0, 21.0, 25.0, 18.0, 21.0, 23.0, 20.0]
2025-08-07 04:52:30,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 17 seconds)
2025-08-07 04:54:30,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:54:31,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 111.51073 ± 18.810
2025-08-07 04:54:31,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [106.23114, 95.93894, 89.18008, 103.00249, 140.27086, 151.35121, 105.49417, 102.71507, 119.58444, 101.33887]
2025-08-07 04:54:31,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 19.0, 18.0, 20.0, 27.0, 29.0, 21.0, 20.0, 23.0, 20.0]
2025-08-07 04:54:31,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 11 seconds)
2025-08-07 04:56:31,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:56:32,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 162.19479 ± 79.309
2025-08-07 04:56:32,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [340.9387, 132.68611, 129.66786, 155.50449, 112.26843, 150.5191, 108.01972, 102.25237, 101.074265, 289.01688]
2025-08-07 04:56:32,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 27.0, 26.0, 30.0, 22.0, 29.0, 21.0, 20.0, 20.0, 57.0]
2025-08-07 04:56:32,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 8 seconds)
2025-08-07 04:58:33,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:58:33,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 141.29581 ± 34.050
2025-08-07 04:58:33,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [178.05072, 155.22878, 147.04103, 121.32855, 113.50261, 134.18443, 107.91565, 96.900826, 141.95596, 216.84962]
2025-08-07 04:58:33,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 31.0, 28.0, 23.0, 22.0, 26.0, 21.0, 19.0, 28.0, 43.0]
2025-08-07 04:58:33,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 8 seconds)
2025-08-07 05:00:34,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:00:35,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 150.40292 ± 31.776
2025-08-07 05:00:35,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [189.35197, 160.80524, 90.002686, 123.68685, 125.33477, 142.55078, 204.90764, 146.1724, 150.04341, 171.1735]
2025-08-07 05:00:35,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 32.0, 18.0, 24.0, 25.0, 28.0, 39.0, 29.0, 30.0, 33.0]
2025-08-07 05:00:35,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 16 seconds)
2025-08-07 05:02:36,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:02:36,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 126.30542 ± 27.227
2025-08-07 05:02:36,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [161.63287, 136.81357, 107.219376, 142.45834, 165.07443, 96.20206, 90.54356, 108.2162, 100.882576, 154.01126]
2025-08-07 05:02:36,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 27.0, 21.0, 28.0, 33.0, 19.0, 18.0, 21.0, 20.0, 32.0]
2025-08-07 05:02:36,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 57 minutes, 15 seconds)
2025-08-07 05:04:37,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:04:38,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 162.22488 ± 65.319
2025-08-07 05:04:38,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [194.61258, 328.66937, 146.33354, 106.44196, 215.33994, 137.15102, 95.51972, 137.84569, 130.29494, 130.04008]
2025-08-07 05:04:38,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 73.0, 29.0, 21.0, 41.0, 27.0, 19.0, 27.0, 25.0, 25.0]
2025-08-07 05:04:38,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 18 seconds)
2025-08-07 05:06:39,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:06:39,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 147.00792 ± 55.652
2025-08-07 05:06:39,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [162.3153, 296.8124, 133.82715, 152.38937, 95.23453, 168.69328, 118.75453, 129.16603, 122.79441, 90.09219]
2025-08-07 05:06:39,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 60.0, 26.0, 30.0, 19.0, 33.0, 23.0, 25.0, 24.0, 18.0]
2025-08-07 05:06:39,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 24 seconds)
2025-08-07 05:08:40,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:08:41,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 161.63966 ± 67.376
2025-08-07 05:08:41,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [176.97046, 193.83566, 135.36714, 100.10574, 144.95491, 114.43104, 122.18387, 347.76022, 144.85437, 135.93306]
2025-08-07 05:08:41,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 37.0, 26.0, 20.0, 28.0, 23.0, 24.0, 68.0, 28.0, 26.0]
2025-08-07 05:08:41,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 22 seconds)
2025-08-07 05:10:41,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:10:42,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 161.69359 ± 57.931
2025-08-07 05:10:42,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [302.5926, 108.99574, 240.29272, 127.97269, 146.92278, 140.55417, 154.07161, 143.89543, 124.79648, 126.84171]
2025-08-07 05:10:42,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 21.0, 47.0, 25.0, 28.0, 27.0, 30.0, 28.0, 24.0, 25.0]
2025-08-07 05:10:42,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 15 seconds)
2025-08-07 05:12:43,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:12:43,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 131.22842 ± 64.782
2025-08-07 05:12:43,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.10094, 158.55919, 107.63934, 141.69853, 94.92121, 89.0326, 115.661644, 100.22362, 101.013756, 314.43335]
2025-08-07 05:12:43,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 31.0, 21.0, 27.0, 19.0, 18.0, 23.0, 20.0, 20.0, 64.0]
2025-08-07 05:12:43,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 47 minutes, 13 seconds)
2025-08-07 05:14:44,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:14:45,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 176.66273 ± 103.047
2025-08-07 05:14:45,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [131.86, 143.65492, 116.42716, 95.85042, 90.44542, 288.3107, 141.6905, 443.17337, 152.36736, 162.84749]
2025-08-07 05:14:45,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 23.0, 19.0, 18.0, 56.0, 27.0, 93.0, 31.0, 32.0]
2025-08-07 05:14:45,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 45 minutes, 13 seconds)
2025-08-07 05:16:46,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:16:46,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 120.48170 ± 19.510
2025-08-07 05:16:46,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [132.9432, 139.91649, 103.05633, 104.76559, 89.367714, 96.9075, 140.37471, 147.71729, 121.53147, 128.23682]
2025-08-07 05:16:46,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 20.0, 21.0, 18.0, 19.0, 27.0, 29.0, 24.0, 25.0]
2025-08-07 05:16:46,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 43 minutes, 10 seconds)
2025-08-07 05:18:47,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:18:47,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 130.88582 ± 26.187
2025-08-07 05:18:47,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [129.14117, 140.5506, 150.78416, 107.52555, 107.5163, 169.84773, 95.62898, 95.67047, 155.47241, 156.72081]
2025-08-07 05:18:47,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 30.0, 21.0, 21.0, 34.0, 19.0, 19.0, 30.0, 30.0]
2025-08-07 05:18:47,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 41 minutes, 2 seconds)
2025-08-07 05:20:48,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:20:48,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 122.30650 ± 25.472
2025-08-07 05:20:48,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [148.5806, 124.66847, 96.510185, 173.45035, 95.24449, 118.73463, 103.54909, 100.23875, 111.366356, 150.72217]
2025-08-07 05:20:48,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 24.0, 19.0, 35.0, 19.0, 23.0, 20.0, 20.0, 22.0, 30.0]
2025-08-07 05:20:48,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 39 minutes)
2025-08-07 05:22:49,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:22:49,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 145.21605 ± 59.232
2025-08-07 05:22:49,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.18199, 142.00491, 177.31343, 171.7513, 157.53647, 297.14798, 102.240715, 94.42152, 117.233574, 103.32863]
2025-08-07 05:22:49,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 28.0, 34.0, 33.0, 32.0, 66.0, 20.0, 19.0, 23.0, 20.0]
2025-08-07 05:22:49,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 58 seconds)
2025-08-07 05:24:50,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:24:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 153.56102 ± 72.729
2025-08-07 05:24:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [101.94966, 94.9799, 149.36375, 124.19732, 113.23395, 273.17947, 95.33447, 309.412, 107.35153, 166.60818]
2025-08-07 05:24:50,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 29.0, 24.0, 22.0, 53.0, 19.0, 61.0, 21.0, 34.0]
2025-08-07 05:24:50,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 52 seconds)
2025-08-07 05:26:51,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:26:52,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 170.63771 ± 77.214
2025-08-07 05:26:52,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [107.3042, 391.88718, 176.92245, 155.20337, 167.91295, 140.36671, 119.8808, 178.46686, 125.126945, 143.30559]
2025-08-07 05:26:52,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 78.0, 34.0, 30.0, 32.0, 27.0, 23.0, 35.0, 24.0, 29.0]
2025-08-07 05:26:52,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 51 seconds)
2025-08-07 05:28:53,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:53,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 134.32755 ± 51.704
2025-08-07 05:28:53,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [100.205025, 131.0077, 90.17779, 271.4187, 144.89316, 121.535484, 167.49002, 130.52478, 95.36726, 90.65544]
2025-08-07 05:28:53,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 25.0, 18.0, 53.0, 28.0, 24.0, 32.0, 25.0, 19.0, 18.0]
2025-08-07 05:28:53,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 56 seconds)
2025-08-07 05:30:54,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:55,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 147.01309 ± 71.497
2025-08-07 05:30:55,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [354.79025, 89.443665, 147.73898, 111.95683, 130.3822, 149.90381, 121.49638, 130.81235, 130.58696, 103.019646]
2025-08-07 05:30:55,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 18.0, 29.0, 22.0, 26.0, 29.0, 24.0, 25.0, 26.0, 20.0]
2025-08-07 05:30:55,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 59 seconds)
2025-08-07 05:32:55,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:32:56,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 136.92842 ± 56.375
2025-08-07 05:32:56,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [146.2894, 203.75255, 274.66446, 111.55801, 113.28472, 126.036644, 89.24097, 125.09859, 84.33204, 95.02681]
2025-08-07 05:32:56,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 39.0, 58.0, 22.0, 22.0, 24.0, 18.0, 24.0, 17.0, 19.0]
2025-08-07 05:32:56,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 58 seconds)
2025-08-07 05:34:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:57,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 136.12617 ± 30.790
2025-08-07 05:34:57,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [182.16916, 166.22856, 102.467415, 157.41826, 112.84405, 95.47045, 134.99377, 108.610245, 123.15028, 177.9095]
2025-08-07 05:34:57,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 32.0, 20.0, 31.0, 22.0, 19.0, 26.0, 21.0, 24.0, 34.0]
2025-08-07 05:34:57,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 25 minutes)
2025-08-07 05:36:58,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:36:59,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 152.11145 ± 50.394
2025-08-07 05:36:59,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [107.00067, 113.427864, 118.28547, 141.27603, 289.09476, 143.32683, 145.97398, 145.19238, 130.60513, 186.93134]
2025-08-07 05:36:59,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 22.0, 23.0, 28.0, 59.0, 28.0, 28.0, 28.0, 25.0, 36.0]
2025-08-07 05:36:59,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 59 seconds)
2025-08-07 05:39:00,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:39:01,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 120.97459 ± 18.631
2025-08-07 05:39:01,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [115.149414, 108.394424, 161.44681, 105.37351, 89.93995, 124.22985, 118.90318, 119.76204, 140.87044, 125.676445]
2025-08-07 05:39:01,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 21.0, 32.0, 21.0, 18.0, 24.0, 23.0, 23.0, 27.0, 25.0]
2025-08-07 05:39:01,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 58 seconds)
2025-08-07 05:41:01,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:41:02,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 173.93890 ± 98.845
2025-08-07 05:41:02,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [162.54623, 131.2217, 112.26475, 151.41495, 102.18636, 125.130684, 318.9777, 125.27962, 408.97174, 101.39537]
2025-08-07 05:41:02,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 22.0, 29.0, 20.0, 24.0, 64.0, 24.0, 80.0, 20.0]
2025-08-07 05:41:02,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 53 seconds)
2025-08-07 05:43:03,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:43:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 126.45884 ± 28.366
2025-08-07 05:43:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [154.64287, 166.37589, 171.75809, 96.27679, 106.94256, 101.57242, 96.0221, 102.2719, 128.92395, 139.8019]
2025-08-07 05:43:03,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 33.0, 34.0, 19.0, 21.0, 20.0, 19.0, 20.0, 25.0, 28.0]
2025-08-07 05:43:03,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 56 seconds)
2025-08-07 05:45:04,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:45:05,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 128.54034 ± 25.228
2025-08-07 05:45:05,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [99.78795, 113.211624, 161.90631, 145.26675, 181.83246, 110.40523, 107.61766, 112.159424, 130.29388, 122.92228]
2025-08-07 05:45:05,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 31.0, 28.0, 35.0, 22.0, 21.0, 22.0, 25.0, 24.0]
2025-08-07 05:45:05,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 53 seconds)
2025-08-07 05:47:06,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:47:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 125.70260 ± 23.392
2025-08-07 05:47:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [112.40114, 163.29419, 112.94065, 95.48943, 161.06177, 128.24197, 140.52206, 139.43797, 101.76032, 101.876465]
2025-08-07 05:47:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 32.0, 22.0, 19.0, 32.0, 26.0, 28.0, 27.0, 20.0, 20.0]
2025-08-07 05:47:06,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 53 seconds)
2025-08-07 05:49:07,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:49:07,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 152.55380 ± 80.354
2025-08-07 05:49:07,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [119.20522, 89.740616, 159.78413, 106.172615, 102.4806, 154.01184, 188.63673, 376.75284, 101.822716, 126.930565]
2025-08-07 05:49:07,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 31.0, 21.0, 20.0, 30.0, 37.0, 81.0, 20.0, 25.0]
2025-08-07 05:49:07,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 48 seconds)
2025-08-07 05:51:08,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:51:08,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 114.52336 ± 17.709
2025-08-07 05:51:08,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [123.496544, 96.18908, 143.76163, 101.74368, 84.25328, 128.71574, 96.24413, 124.08071, 124.20967, 122.539215]
2025-08-07 05:51:08,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 19.0, 28.0, 20.0, 17.0, 25.0, 19.0, 25.0, 24.0, 24.0]
2025-08-07 05:51:08,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 45 seconds)
2025-08-07 05:53:09,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:53:10,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 138.53423 ± 33.860
2025-08-07 05:53:10,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [111.91229, 154.88742, 174.92908, 95.40352, 89.41523, 166.22441, 141.33052, 102.30634, 163.92537, 185.00803]
2025-08-07 05:53:10,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 30.0, 34.0, 19.0, 18.0, 33.0, 27.0, 20.0, 32.0, 36.0]
2025-08-07 05:53:10,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 41 seconds)
2025-08-07 05:55:10,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:55:11,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 130.35030 ± 27.559
2025-08-07 05:55:11,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [95.95609, 108.36069, 164.2406, 97.09861, 183.35713, 133.07455, 119.04603, 137.42142, 150.77948, 114.16818]
2025-08-07 05:55:11,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 33.0, 19.0, 36.0, 26.0, 23.0, 27.0, 29.0, 22.0]
2025-08-07 05:55:11,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 37 seconds)
2025-08-07 05:57:11,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:57:12,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 125.67039 ± 21.534
2025-08-07 05:57:12,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [135.76782, 138.1424, 157.25632, 135.42085, 96.13326, 106.25516, 122.695206, 97.468185, 157.25122, 110.31365]
2025-08-07 05:57:12,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 31.0, 26.0, 19.0, 21.0, 24.0, 19.0, 32.0, 22.0]
2025-08-07 05:57:12,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 32 seconds)
2025-08-07 05:59:12,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:59:13,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 111.37543 ± 22.794
2025-08-07 05:59:13,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [94.96094, 107.78195, 95.88714, 106.05712, 123.48792, 97.00087, 84.12352, 163.25177, 102.058876, 139.14418]
2025-08-07 05:59:13,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 19.0, 21.0, 24.0, 19.0, 17.0, 32.0, 20.0, 27.0]
2025-08-07 05:59:13,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 31 seconds)
2025-08-07 06:01:14,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:01:14,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 118.05886 ± 22.774
2025-08-07 06:01:14,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [164.89165, 124.92147, 105.916145, 116.17328, 90.04459, 101.99863, 95.82388, 144.94038, 101.57667, 134.3019]
2025-08-07 06:01:14,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 24.0, 21.0, 23.0, 18.0, 20.0, 19.0, 28.0, 20.0, 26.0]
2025-08-07 06:01:14,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 34 seconds)
2025-08-07 06:03:15,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:03:15,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 119.86442 ± 22.727
2025-08-07 06:03:15,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [121.89308, 104.25157, 119.9102, 122.51161, 117.504616, 177.96109, 90.765465, 128.25342, 120.47084, 95.12214]
2025-08-07 06:03:15,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 21.0, 24.0, 24.0, 23.0, 35.0, 18.0, 25.0, 23.0, 19.0]
2025-08-07 06:03:15,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 31 seconds)
2025-08-07 06:05:16,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:05:17,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 129.97281 ± 22.704
2025-08-07 06:05:17,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [153.83258, 129.99335, 114.39897, 166.44107, 153.28996, 121.017204, 102.54077, 134.88757, 133.36615, 89.9604]
2025-08-07 06:05:17,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 25.0, 22.0, 32.0, 30.0, 23.0, 20.0, 26.0, 26.0, 18.0]
2025-08-07 06:05:17,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 33 seconds)
2025-08-07 06:07:17,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:07:18,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 119.84013 ± 26.636
2025-08-07 06:07:18,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [102.14449, 159.2328, 107.795334, 138.4763, 95.018326, 84.486176, 134.15048, 107.946365, 166.90727, 102.24382]
2025-08-07 06:07:18,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 31.0, 21.0, 27.0, 19.0, 17.0, 26.0, 21.0, 33.0, 20.0]
2025-08-07 06:07:18,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 31 seconds)
2025-08-07 06:09:18,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:09:18,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 144.68095 ± 24.452
2025-08-07 06:09:18,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [194.28076, 128.40604, 161.63602, 145.10275, 134.79256, 150.04018, 101.76375, 169.34341, 130.76083, 130.68326]
2025-08-07 06:09:18,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 25.0, 31.0, 29.0, 26.0, 29.0, 20.0, 33.0, 25.0, 25.0]
2025-08-07 06:09:18,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 27 seconds)
2025-08-07 06:11:17,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:11:18,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 165.01657 ± 82.425
2025-08-07 06:11:18,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [395.82043, 95.90928, 122.34469, 95.888535, 166.7361, 167.33923, 135.9591, 121.67703, 160.25087, 188.2406]
2025-08-07 06:11:18,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 19.0, 24.0, 19.0, 33.0, 32.0, 26.0, 24.0, 32.0, 36.0]
2025-08-07 06:11:18,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 17 seconds)
2025-08-07 06:13:17,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:13:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 131.33450 ± 21.094
2025-08-07 06:13:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [112.091156, 103.57943, 101.727974, 150.58064, 124.08697, 170.97054, 128.80238, 147.82318, 143.89311, 129.78973]
2025-08-07 06:13:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 20.0, 29.0, 24.0, 33.0, 25.0, 29.0, 28.0, 25.0]
2025-08-07 06:13:17,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 9 seconds)
2025-08-07 06:15:17,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:15:17,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 173.63374 ± 91.906
2025-08-07 06:15:17,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [102.02652, 154.07758, 406.30304, 155.02563, 89.563644, 186.53897, 123.511086, 272.05563, 128.09277, 119.1426]
2025-08-07 06:15:17,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 30.0, 78.0, 32.0, 18.0, 36.0, 24.0, 54.0, 25.0, 23.0]
2025-08-07 06:15:17,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 2 seconds)
2025-08-07 06:17:16,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:17:17,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 159.77866 ± 69.014
2025-08-07 06:17:17,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [187.67633, 112.25289, 176.907, 114.49275, 344.06747, 129.90858, 127.02754, 112.750725, 189.64824, 103.05509]
2025-08-07 06:17:17,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 22.0, 33.0, 22.0, 72.0, 25.0, 25.0, 22.0, 37.0, 20.0]
2025-08-07 06:17:17,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 56 seconds)
2025-08-07 06:19:16,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:19:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 130.76578 ± 28.963
2025-08-07 06:19:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [84.23787, 161.45508, 156.69543, 108.34529, 111.83238, 119.53304, 170.83615, 158.83997, 139.14764, 96.73483]
2025-08-07 06:19:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 32.0, 31.0, 21.0, 22.0, 23.0, 33.0, 32.0, 29.0, 19.0]
2025-08-07 06:19:16,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 53 seconds)
2025-08-07 06:21:15,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:21:15,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 187.16641 ± 107.986
2025-08-07 06:21:15,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [90.10427, 428.9818, 221.59383, 120.6306, 128.92543, 113.38652, 161.00009, 126.3836, 130.25021, 350.40784]
2025-08-07 06:21:15,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 83.0, 41.0, 23.0, 25.0, 22.0, 31.0, 25.0, 26.0, 77.0]
2025-08-07 06:21:15,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 51 seconds)
2025-08-07 06:23:15,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:23:15,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 129.00206 ± 26.289
2025-08-07 06:23:15,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [138.16681, 102.72022, 172.612, 165.96922, 126.267494, 90.172676, 96.394135, 129.93855, 122.91675, 144.86275]
2025-08-07 06:23:15,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 20.0, 34.0, 32.0, 25.0, 18.0, 19.0, 25.0, 24.0, 28.0]
2025-08-07 06:23:15,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 51 seconds)
2025-08-07 06:25:15,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:25:15,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 131.81873 ± 29.197
2025-08-07 06:25:15,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [90.17866, 166.97664, 165.50378, 133.85959, 131.16518, 99.42205, 160.49078, 113.34729, 161.65263, 95.59072]
2025-08-07 06:25:15,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 32.0, 33.0, 26.0, 25.0, 20.0, 31.0, 22.0, 31.0, 19.0]
2025-08-07 06:25:15,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 52 seconds)
2025-08-07 06:27:14,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:27:15,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 133.61951 ± 29.272
2025-08-07 06:27:15,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.19847, 147.0278, 143.07982, 122.30268, 145.0512, 106.154945, 164.83678, 89.37434, 179.80438, 149.3647]
2025-08-07 06:27:15,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 28.0, 28.0, 24.0, 28.0, 21.0, 34.0, 18.0, 36.0, 29.0]
2025-08-07 06:27:15,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 52 seconds)
2025-08-07 06:29:14,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:29:14,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 123.42464 ± 19.303
2025-08-07 06:29:14,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [113.80788, 127.264465, 110.62903, 119.77582, 105.92629, 108.43792, 164.59738, 153.59352, 105.96278, 124.2512]
2025-08-07 06:29:14,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 22.0, 23.0, 21.0, 21.0, 32.0, 30.0, 21.0, 24.0]
2025-08-07 06:29:14,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 53 seconds)
2025-08-07 06:31:14,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:31:14,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 160.28735 ± 45.231
2025-08-07 06:31:14,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [107.45255, 109.4387, 145.68431, 209.28276, 143.80818, 221.4571, 162.34077, 140.55583, 241.28983, 121.5635]
2025-08-07 06:31:14,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 28.0, 41.0, 29.0, 43.0, 32.0, 27.0, 52.0, 24.0]
2025-08-07 06:31:14,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 56 seconds)
2025-08-07 06:33:13,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:33:14,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 133.04572 ± 50.014
2025-08-07 06:33:14,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [191.82141, 90.36033, 256.1877, 115.95096, 106.198166, 148.72725, 107.903465, 95.38206, 101.76171, 116.1641]
2025-08-07 06:33:14,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 18.0, 50.0, 23.0, 21.0, 29.0, 21.0, 19.0, 20.0, 23.0]
2025-08-07 06:33:14,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 56 seconds)
2025-08-07 06:35:13,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:35:13,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 156.25211 ± 74.770
2025-08-07 06:35:13,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [148.28185, 155.027, 118.201225, 139.4221, 143.86151, 111.9457, 113.57691, 371.99362, 163.7227, 96.4887]
2025-08-07 06:35:13,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 31.0, 23.0, 27.0, 28.0, 23.0, 22.0, 78.0, 32.0, 19.0]
2025-08-07 06:35:13,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 56 seconds)
2025-08-07 06:37:13,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:37:13,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 128.58064 ± 23.282
2025-08-07 06:37:13,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [141.76642, 118.83752, 166.88814, 134.4087, 96.0769, 96.30018, 153.43394, 145.41046, 129.59459, 103.08954]
2025-08-07 06:37:13,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 23.0, 32.0, 26.0, 19.0, 19.0, 30.0, 28.0, 25.0, 20.0]
2025-08-07 06:37:13,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 56 seconds)
2025-08-07 06:39:12,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:39:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 152.11955 ± 69.954
2025-08-07 06:39:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [122.82289, 130.43277, 132.66014, 349.36633, 182.11755, 160.06137, 95.699875, 122.93435, 122.36096, 102.73934]
2025-08-07 06:39:12,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 26.0, 72.0, 35.0, 31.0, 19.0, 24.0, 24.0, 20.0]
2025-08-07 06:39:12,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 56 seconds)
2025-08-07 06:41:11,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:41:12,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 147.44281 ± 39.836
2025-08-07 06:41:12,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [227.81729, 101.77668, 172.07965, 122.28569, 195.00063, 149.05522, 107.66047, 123.953224, 110.16975, 164.62944]
2025-08-07 06:41:12,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 20.0, 34.0, 24.0, 39.0, 30.0, 21.0, 24.0, 22.0, 31.0]
2025-08-07 06:41:12,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 55 seconds)
2025-08-07 06:43:11,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:43:12,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 164.03812 ± 90.596
2025-08-07 06:43:12,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [162.5816, 96.52629, 213.55128, 96.93432, 155.17093, 413.41675, 96.00582, 108.45618, 156.4454, 141.29247]
2025-08-07 06:43:12,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 19.0, 42.0, 19.0, 30.0, 82.0, 19.0, 22.0, 30.0, 27.0]
2025-08-07 06:43:12,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 56 seconds)
2025-08-07 06:45:11,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:45:11,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 142.21568 ± 26.457
2025-08-07 06:45:11,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [130.62738, 128.31053, 158.85608, 84.37655, 130.69058, 160.86938, 186.59575, 128.65154, 155.57568, 157.60344]
2025-08-07 06:45:11,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 25.0, 31.0, 17.0, 25.0, 31.0, 36.0, 25.0, 30.0, 31.0]
2025-08-07 06:45:11,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 56 seconds)
2025-08-07 06:47:10,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:47:11,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 156.75601 ± 95.636
2025-08-07 06:47:11,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [435.92566, 124.209114, 142.97542, 145.72635, 108.94705, 146.3787, 163.6158, 102.39915, 95.66168, 101.7213]
2025-08-07 06:47:11,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 25.0, 29.0, 28.0, 21.0, 28.0, 32.0, 20.0, 19.0, 20.0]
2025-08-07 06:47:11,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 57 seconds)
2025-08-07 06:49:10,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:49:11,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 184.97937 ± 92.839
2025-08-07 06:49:11,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [418.7685, 287.13556, 135.14818, 182.05264, 118.66006, 123.19889, 171.78546, 138.21956, 95.5492, 179.27571]
2025-08-07 06:49:11,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 51.0, 26.0, 36.0, 23.0, 24.0, 33.0, 27.0, 19.0, 34.0]
2025-08-07 06:49:11,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 58 seconds)
2025-08-07 06:51:10,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:51:10,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 168.55069 ± 127.019
2025-08-07 06:51:10,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [155.28893, 96.90052, 96.479805, 125.363014, 142.29692, 106.933014, 157.49889, 160.6358, 101.63402, 542.47595]
2025-08-07 06:51:10,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 19.0, 24.0, 27.0, 21.0, 33.0, 31.0, 20.0, 102.0]
2025-08-07 06:51:10,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 58 seconds)
2025-08-07 06:53:09,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:53:09,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 136.20462 ± 33.963
2025-08-07 06:53:09,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [100.37326, 89.37023, 112.043106, 179.85043, 133.34004, 123.29113, 118.14203, 145.95596, 155.71075, 203.96933]
2025-08-07 06:53:09,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 22.0, 35.0, 26.0, 24.0, 23.0, 29.0, 30.0, 39.0]
2025-08-07 06:53:09,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 58 seconds)
2025-08-07 06:55:09,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:55:10,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 196.37094 ± 103.715
2025-08-07 06:55:10,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [129.39377, 113.27208, 115.443665, 113.28256, 143.57816, 177.7769, 118.51398, 315.4543, 361.68945, 375.30447]
2025-08-07 06:55:10,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 23.0, 22.0, 28.0, 35.0, 23.0, 61.0, 68.0, 71.0]
2025-08-07 06:55:10,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 59 seconds)
2025-08-07 06:57:09,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:57:09,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 173.08632 ± 83.334
2025-08-07 06:57:09,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [327.4843, 152.55542, 113.97014, 102.20361, 340.16306, 138.84752, 169.85875, 116.952, 161.12729, 107.701004]
2025-08-07 06:57:09,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 30.0, 22.0, 20.0, 66.0, 27.0, 33.0, 23.0, 31.0, 21.0]
2025-08-07 06:57:09,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-08-07 06:59:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:59:09,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1221 [DEBUG]: Total Reward: 114.84413 ± 32.685
2025-08-07 06:59:09,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1222 [DEBUG]: All rewards: [89.30522, 108.3898, 169.64534, 183.57643, 117.41616, 113.58692, 84.25978, 96.513916, 95.778404, 89.96927]
2025-08-07 06:59:09,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 33.0, 36.0, 23.0, 22.0, 17.0, 19.0, 19.0, 18.0]
2025-08-07 06:59:09,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-humanoid):1251 [DEBUG]: Training session finished
