2025-08-07 06:53:35,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc10-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:53:35,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc10-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:53:35,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15309b447c10>}
2025-08-07 06:53:35,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 06:53:35,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 06:53:35,318 baseline-bpql-noiseperc10-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 06:53:35,318 baseline-bpql-noiseperc10-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:53:37,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 06:53:37,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 06:55:26,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:26,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 168.41083 ± 48.736
2025-08-07 06:55:26,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [140.63019, 255.7228, 137.47461, 195.13063, 119.04372, 135.54483, 119.91629, 224.99522, 130.71986, 224.93016]
2025-08-07 06:55:26,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 50.0, 27.0, 38.0, 23.0, 26.0, 23.0, 45.0, 25.0, 45.0]
2025-08-07 06:55:26,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (168.41) for latency ExtremeClogL1U23
2025-08-07 06:55:26,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 47 seconds)
2025-08-07 06:57:23,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:24,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 260.78012 ± 111.453
2025-08-07 06:57:24,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [362.23642, 308.6627, 157.61462, 124.62098, 128.1939, 108.69253, 354.89505, 318.34302, 323.75446, 420.78772]
2025-08-07 06:57:24,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 61.0, 30.0, 24.0, 25.0, 21.0, 67.0, 60.0, 63.0, 77.0]
2025-08-07 06:57:24,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (260.78) for latency ExtremeClogL1U23
2025-08-07 06:57:24,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 32 seconds)
2025-08-07 06:59:21,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:21,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 243.32828 ± 132.826
2025-08-07 06:59:21,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [380.34964, 149.57399, 124.09188, 372.77344, 495.80283, 341.962, 113.28603, 130.09906, 133.72787, 191.6159]
2025-08-07 06:59:21,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 29.0, 24.0, 73.0, 96.0, 68.0, 22.0, 25.0, 26.0, 37.0]
2025-08-07 06:59:21,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 5 minutes, 34 seconds)
2025-08-07 07:01:18,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:19,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 203.75685 ± 95.302
2025-08-07 07:01:19,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [143.92455, 192.2631, 125.055504, 376.59042, 108.8564, 103.17608, 128.88414, 312.82465, 223.83296, 322.16055]
2025-08-07 07:01:19,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 37.0, 24.0, 73.0, 21.0, 20.0, 25.0, 63.0, 42.0, 61.0]
2025-08-07 07:01:19,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 4 minutes, 43 seconds)
2025-08-07 07:03:15,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:16,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 261.11862 ± 125.969
2025-08-07 07:03:16,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [356.05026, 130.53519, 151.81981, 389.27463, 269.3342, 227.07832, 329.38757, 119.840126, 129.57126, 508.295]
2025-08-07 07:03:16,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 25.0, 29.0, 76.0, 57.0, 43.0, 65.0, 23.0, 25.0, 99.0]
2025-08-07 07:03:16,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (261.12) for latency ExtremeClogL1U23
2025-08-07 07:03:16,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 3 minutes, 26 seconds)
2025-08-07 07:05:13,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:14,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 207.11304 ± 98.419
2025-08-07 07:05:14,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [267.03806, 197.05322, 151.64478, 135.09183, 124.9893, 316.1291, 108.68232, 228.14485, 116.35895, 425.99814]
2025-08-07 07:05:14,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 38.0, 29.0, 26.0, 24.0, 65.0, 21.0, 44.0, 23.0, 88.0]
2025-08-07 07:05:14,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 4 minutes, 3 seconds)
2025-08-07 07:07:10,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:10,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 184.00241 ± 65.035
2025-08-07 07:07:10,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [135.28949, 197.98282, 230.84242, 113.85046, 180.55983, 345.09134, 120.20965, 163.41745, 144.4398, 208.34071]
2025-08-07 07:07:10,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 38.0, 45.0, 22.0, 35.0, 71.0, 23.0, 32.0, 28.0, 41.0]
2025-08-07 07:07:10,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 1 minute, 39 seconds)
2025-08-07 07:09:06,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:06,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 197.30290 ± 87.606
2025-08-07 07:09:06,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [103.56854, 171.99353, 345.6109, 127.390144, 129.46123, 344.02896, 188.48015, 156.10628, 124.25301, 282.13635]
2025-08-07 07:09:06,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 33.0, 76.0, 25.0, 25.0, 77.0, 36.0, 31.0, 24.0, 55.0]
2025-08-07 07:09:06,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes, 25 seconds)
2025-08-07 07:11:02,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:03,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 200.21657 ± 59.836
2025-08-07 07:11:03,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [191.91267, 259.39615, 211.01498, 160.22307, 199.3211, 114.20178, 119.054115, 221.53986, 326.99155, 198.51033]
2025-08-07 07:11:03,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 51.0, 41.0, 31.0, 38.0, 22.0, 23.0, 43.0, 68.0, 38.0]
2025-08-07 07:11:03,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 5 seconds)
2025-08-07 07:12:58,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:59,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 147.57855 ± 29.304
2025-08-07 07:12:59,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [130.46442, 225.69656, 129.43536, 130.27011, 139.783, 154.78606, 124.48031, 167.75296, 148.11626, 125.00047]
2025-08-07 07:12:59,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 44.0, 25.0, 25.0, 27.0, 30.0, 24.0, 32.0, 29.0, 24.0]
2025-08-07 07:12:59,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 54 minutes, 45 seconds)
2025-08-07 07:14:55,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:56,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 175.20224 ± 51.211
2025-08-07 07:14:56,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [103.26074, 190.26068, 239.21559, 171.4798, 176.59021, 277.8167, 118.28637, 134.94551, 144.90222, 195.2645]
2025-08-07 07:14:56,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 37.0, 48.0, 33.0, 34.0, 56.0, 23.0, 26.0, 28.0, 38.0]
2025-08-07 07:14:56,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 52 minutes, 32 seconds)
2025-08-07 07:16:51,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 147.44553 ± 29.509
2025-08-07 07:16:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [168.98616, 118.98931, 129.54228, 108.17301, 119.27795, 197.64578, 191.4007, 161.50246, 138.78966, 140.14801]
2025-08-07 07:16:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 23.0, 25.0, 21.0, 23.0, 39.0, 37.0, 31.0, 27.0, 27.0]
2025-08-07 07:16:51,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 50 minutes, 26 seconds)
2025-08-07 07:18:47,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:47,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 210.59885 ± 113.373
2025-08-07 07:18:47,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [129.17471, 169.10875, 133.34305, 411.34833, 237.39632, 140.41808, 145.32536, 446.75732, 142.29886, 150.8178]
2025-08-07 07:18:47,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 33.0, 26.0, 82.0, 48.0, 27.0, 28.0, 97.0, 28.0, 29.0]
2025-08-07 07:18:47,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 48 minutes, 28 seconds)
2025-08-07 07:20:43,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:43,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 155.84677 ± 47.889
2025-08-07 07:20:43,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [253.66655, 97.51121, 124.33496, 222.63794, 129.85687, 175.5646, 130.47719, 140.62503, 108.88845, 174.90504]
2025-08-07 07:20:43,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 19.0, 24.0, 45.0, 25.0, 35.0, 25.0, 27.0, 21.0, 34.0]
2025-08-07 07:20:43,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 46 minutes, 28 seconds)
2025-08-07 07:22:39,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:39,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 213.61105 ± 89.304
2025-08-07 07:22:39,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.94706, 333.44406, 102.96247, 139.68993, 225.53746, 364.2552, 134.74956, 283.7698, 267.4383, 159.3169]
2025-08-07 07:22:39,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 69.0, 20.0, 27.0, 45.0, 79.0, 26.0, 57.0, 57.0, 31.0]
2025-08-07 07:22:39,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 44 minutes, 31 seconds)
2025-08-07 07:24:35,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:35,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 165.89932 ± 47.393
2025-08-07 07:24:35,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [150.4691, 231.8515, 174.3069, 108.446465, 153.3954, 148.41914, 273.30923, 139.71284, 124.916, 154.16649]
2025-08-07 07:24:35,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 45.0, 34.0, 21.0, 29.0, 29.0, 56.0, 27.0, 24.0, 30.0]
2025-08-07 07:24:35,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 21 seconds)
2025-08-07 07:26:31,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:32,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 133.88300 ± 22.755
2025-08-07 07:26:32,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [108.661766, 125.10161, 122.73636, 114.46243, 119.61867, 171.03105, 134.4151, 173.02632, 155.44531, 114.33138]
2025-08-07 07:26:32,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 24.0, 24.0, 22.0, 23.0, 33.0, 26.0, 34.0, 30.0, 22.0]
2025-08-07 07:26:32,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 40 minutes, 34 seconds)
2025-08-07 07:28:27,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:27,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 140.61263 ± 21.977
2025-08-07 07:28:27,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [162.66896, 176.89389, 150.10187, 161.6124, 137.5248, 119.235054, 108.95611, 108.57294, 139.81725, 140.74297]
2025-08-07 07:28:27,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 35.0, 29.0, 31.0, 27.0, 23.0, 21.0, 21.0, 27.0, 27.0]
2025-08-07 07:28:27,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 38 minutes, 30 seconds)
2025-08-07 07:30:23,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:24,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 176.07693 ± 72.216
2025-08-07 07:30:24,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [193.22598, 234.13551, 125.49607, 145.55463, 138.92613, 129.86714, 108.05086, 152.11183, 167.06783, 366.3332]
2025-08-07 07:30:24,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 48.0, 24.0, 28.0, 27.0, 25.0, 21.0, 29.0, 33.0, 82.0]
2025-08-07 07:30:24,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 42 seconds)
2025-08-07 07:32:19,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:20,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 146.91785 ± 24.122
2025-08-07 07:32:20,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [128.77011, 168.03749, 145.11789, 114.02542, 174.60576, 118.72387, 155.48436, 183.92825, 119.450676, 161.03468]
2025-08-07 07:32:20,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 33.0, 28.0, 22.0, 34.0, 23.0, 30.0, 36.0, 23.0, 31.0]
2025-08-07 07:32:20,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 34 minutes, 43 seconds)
2025-08-07 07:34:14,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:15,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 195.66176 ± 63.354
2025-08-07 07:34:15,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [118.98973, 201.4433, 136.43921, 215.52232, 144.9034, 165.13118, 257.30356, 336.36792, 150.51726, 229.99971]
2025-08-07 07:34:15,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 40.0, 26.0, 42.0, 28.0, 32.0, 51.0, 64.0, 29.0, 45.0]
2025-08-07 07:34:15,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 32 minutes, 37 seconds)
2025-08-07 07:36:11,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:11,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 158.08411 ± 60.677
2025-08-07 07:36:11,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [114.30536, 119.91684, 143.65834, 326.99966, 139.19487, 119.06828, 156.58034, 196.5702, 130.19958, 134.3475]
2025-08-07 07:36:11,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 23.0, 28.0, 68.0, 27.0, 23.0, 30.0, 39.0, 25.0, 26.0]
2025-08-07 07:36:11,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 40 seconds)
2025-08-07 07:38:06,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:07,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 143.68718 ± 35.375
2025-08-07 07:38:07,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [208.68993, 140.37134, 114.057495, 124.47642, 114.43534, 124.552475, 101.8125, 191.02173, 132.62878, 184.82585]
2025-08-07 07:38:07,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 27.0, 22.0, 24.0, 22.0, 24.0, 20.0, 37.0, 26.0, 36.0]
2025-08-07 07:38:07,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 46 seconds)
2025-08-07 07:40:03,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:03,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 150.58687 ± 30.469
2025-08-07 07:40:03,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [108.865814, 153.4129, 140.18608, 156.52579, 186.35284, 142.86592, 108.310356, 125.09279, 183.06953, 201.18652]
2025-08-07 07:40:03,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 30.0, 27.0, 30.0, 38.0, 28.0, 21.0, 24.0, 36.0, 40.0]
2025-08-07 07:40:03,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 51 seconds)
2025-08-07 07:41:58,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 159.13597 ± 19.494
2025-08-07 07:41:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [166.17754, 168.34636, 146.74068, 182.66017, 171.53087, 119.4079, 139.58847, 150.9807, 158.89957, 187.02762]
2025-08-07 07:41:59,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 33.0, 28.0, 36.0, 33.0, 23.0, 27.0, 29.0, 31.0, 36.0]
2025-08-07 07:41:59,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 45 seconds)
2025-08-07 07:43:54,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 140.01176 ± 19.850
2025-08-07 07:43:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [129.3697, 160.8985, 175.71819, 108.672935, 129.92741, 130.26903, 120.14988, 140.954, 140.88948, 163.2685]
2025-08-07 07:43:54,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 31.0, 34.0, 21.0, 25.0, 25.0, 23.0, 27.0, 27.0, 32.0]
2025-08-07 07:43:54,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 22 minutes, 52 seconds)
2025-08-07 07:45:49,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:49,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 159.86417 ± 63.537
2025-08-07 07:45:49,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [125.50327, 119.211525, 109.56042, 171.39925, 339.4474, 135.51057, 143.43073, 119.49201, 169.89357, 165.19301]
2025-08-07 07:45:49,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 21.0, 33.0, 76.0, 26.0, 28.0, 23.0, 33.0, 32.0]
2025-08-07 07:45:49,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 20 minutes, 42 seconds)
2025-08-07 07:47:43,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 144.86252 ± 16.354
2025-08-07 07:47:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [158.69733, 135.60693, 159.12971, 162.78217, 153.32495, 109.3013, 154.74637, 125.23907, 139.70425, 150.09299]
2025-08-07 07:47:44,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 26.0, 31.0, 31.0, 30.0, 21.0, 30.0, 24.0, 27.0, 29.0]
2025-08-07 07:47:44,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 18 minutes, 28 seconds)
2025-08-07 07:49:37,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 146.91422 ± 29.095
2025-08-07 07:49:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [204.32726, 123.56159, 168.53131, 135.50449, 165.41579, 161.6791, 163.1297, 119.17538, 125.187164, 102.63033]
2025-08-07 07:49:38,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 24.0, 33.0, 26.0, 32.0, 32.0, 32.0, 23.0, 24.0, 20.0]
2025-08-07 07:49:38,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 15 minutes, 59 seconds)
2025-08-07 07:51:32,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:32,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 168.00095 ± 49.232
2025-08-07 07:51:32,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [145.47957, 162.47511, 152.9319, 177.45326, 140.831, 162.00964, 309.69678, 159.98543, 123.78433, 145.3625]
2025-08-07 07:51:32,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 32.0, 31.0, 36.0, 27.0, 31.0, 64.0, 31.0, 24.0, 28.0]
2025-08-07 07:51:32,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 13 minutes, 52 seconds)
2025-08-07 07:53:26,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:26,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 177.75792 ± 26.422
2025-08-07 07:53:26,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [180.43608, 185.90761, 200.45505, 150.06143, 206.52704, 148.48093, 193.37933, 149.25597, 219.3242, 143.75163]
2025-08-07 07:53:26,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 36.0, 40.0, 29.0, 42.0, 29.0, 38.0, 29.0, 43.0, 28.0]
2025-08-07 07:53:26,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 11 minutes, 37 seconds)
2025-08-07 07:55:20,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:20,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 153.91971 ± 28.578
2025-08-07 07:55:20,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [109.17914, 151.40053, 124.76575, 160.7614, 129.38914, 186.7985, 128.95561, 198.75885, 181.32886, 167.85924]
2025-08-07 07:55:20,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 29.0, 24.0, 31.0, 25.0, 36.0, 25.0, 39.0, 36.0, 33.0]
2025-08-07 07:55:20,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 9 minutes, 27 seconds)
2025-08-07 07:57:14,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 170.54929 ± 50.663
2025-08-07 07:57:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [169.58853, 139.7936, 107.99889, 177.02736, 140.4651, 173.40137, 149.68665, 148.40558, 192.81346, 306.31223]
2025-08-07 07:57:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 27.0, 21.0, 35.0, 27.0, 34.0, 29.0, 29.0, 38.0, 62.0]
2025-08-07 07:57:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 7 minutes, 32 seconds)
2025-08-07 07:59:09,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:09,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 148.78223 ± 22.985
2025-08-07 07:59:09,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [190.48106, 159.596, 173.84993, 134.82712, 119.68623, 157.23895, 124.76637, 135.33212, 167.39153, 124.65295]
2025-08-07 07:59:09,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 31.0, 34.0, 26.0, 23.0, 31.0, 24.0, 26.0, 32.0, 24.0]
2025-08-07 07:59:09,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 5 minutes, 37 seconds)
2025-08-07 08:01:03,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:03,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 159.11993 ± 58.558
2025-08-07 08:01:03,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [186.23724, 103.13829, 125.30479, 96.79453, 179.78046, 212.97838, 125.15623, 132.91615, 130.4729, 298.42032]
2025-08-07 08:01:03,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 20.0, 24.0, 19.0, 35.0, 43.0, 24.0, 26.0, 25.0, 64.0]
2025-08-07 08:01:03,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 3 minutes, 39 seconds)
2025-08-07 08:02:57,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:02:58,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 178.62285 ± 48.903
2025-08-07 08:02:58,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [167.27705, 130.18869, 168.58853, 170.38368, 143.36185, 174.1781, 314.26685, 190.85968, 185.98756, 141.13652]
2025-08-07 08:02:58,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 25.0, 33.0, 33.0, 28.0, 34.0, 64.0, 39.0, 37.0, 27.0]
2025-08-07 08:02:58,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 1 minute, 50 seconds)
2025-08-07 08:04:51,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:04:51,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 154.53616 ± 11.561
2025-08-07 08:04:51,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [156.03105, 155.55714, 135.74077, 160.66734, 140.86629, 169.40028, 174.49449, 151.30357, 143.73628, 157.56436]
2025-08-07 08:04:51,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 26.0, 31.0, 27.0, 33.0, 34.0, 29.0, 28.0, 31.0]
2025-08-07 08:04:51,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 59 minutes, 54 seconds)
2025-08-07 08:06:44,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:06:44,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 156.54268 ± 63.545
2025-08-07 08:06:44,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [108.48698, 134.63306, 168.61009, 135.20728, 165.5166, 336.64578, 150.83548, 143.07954, 102.64217, 119.76981]
2025-08-07 08:06:44,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 26.0, 33.0, 26.0, 32.0, 69.0, 29.0, 28.0, 20.0, 23.0]
2025-08-07 08:06:44,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 57 minutes, 42 seconds)
2025-08-07 08:08:37,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:08:37,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 192.34299 ± 80.135
2025-08-07 08:08:37,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [214.94698, 139.12923, 108.87031, 160.42801, 141.42917, 192.4257, 415.51956, 188.10936, 170.66791, 191.90382]
2025-08-07 08:08:37,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 27.0, 21.0, 31.0, 27.0, 38.0, 87.0, 37.0, 34.0, 37.0]
2025-08-07 08:08:37,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 55 minutes, 35 seconds)
2025-08-07 08:10:30,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:31,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 164.24179 ± 67.470
2025-08-07 08:10:31,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [184.13184, 140.24974, 355.56192, 150.72244, 113.9787, 129.69197, 143.86703, 176.37721, 134.6562, 113.180756]
2025-08-07 08:10:31,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 27.0, 70.0, 29.0, 22.0, 25.0, 28.0, 34.0, 26.0, 22.0]
2025-08-07 08:10:31,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 53 minutes, 33 seconds)
2025-08-07 08:12:23,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:23,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 151.89993 ± 17.855
2025-08-07 08:12:23,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [154.1007, 149.81023, 186.8875, 119.77136, 139.78114, 163.96823, 130.0106, 154.42657, 162.27092, 157.97205]
2025-08-07 08:12:23,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 29.0, 37.0, 23.0, 27.0, 32.0, 25.0, 30.0, 32.0, 31.0]
2025-08-07 08:12:23,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 51 minutes, 12 seconds)
2025-08-07 08:14:15,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:15,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 184.22244 ± 56.695
2025-08-07 08:14:15,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [145.62367, 171.44742, 292.84937, 134.7643, 129.35757, 183.4552, 289.04916, 187.07559, 139.36319, 169.23904]
2025-08-07 08:14:15,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 34.0, 58.0, 26.0, 25.0, 36.0, 58.0, 37.0, 27.0, 33.0]
2025-08-07 08:14:15,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 48 minutes, 58 seconds)
2025-08-07 08:16:07,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:08,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 223.85059 ± 109.210
2025-08-07 08:16:08,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [159.11449, 155.2928, 447.85403, 169.70227, 374.8071, 134.36327, 224.53232, 312.92166, 151.24574, 108.67212]
2025-08-07 08:16:08,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 30.0, 89.0, 33.0, 78.0, 26.0, 44.0, 66.0, 29.0, 21.0]
2025-08-07 08:16:08,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 47 minutes, 5 seconds)
2025-08-07 08:18:00,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:01,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 159.66656 ± 61.871
2025-08-07 08:18:01,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [129.67575, 114.20357, 161.37447, 108.884346, 166.10608, 147.7682, 165.33858, 149.66249, 334.9842, 118.66814]
2025-08-07 08:18:01,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 31.0, 21.0, 32.0, 29.0, 32.0, 29.0, 68.0, 23.0]
2025-08-07 08:18:01,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 45 minutes, 6 seconds)
2025-08-07 08:19:52,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:52,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 155.95828 ± 29.557
2025-08-07 08:19:52,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [143.37714, 178.70656, 165.88373, 129.6683, 162.21973, 227.9846, 140.73749, 133.92648, 119.56897, 157.50984]
2025-08-07 08:19:52,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 35.0, 32.0, 25.0, 31.0, 46.0, 27.0, 26.0, 23.0, 31.0]
2025-08-07 08:19:52,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 42 minutes, 55 seconds)
2025-08-07 08:21:44,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:45,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 142.21556 ± 21.347
2025-08-07 08:21:45,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [114.630325, 113.40475, 144.6638, 118.43303, 164.00395, 155.49356, 169.0441, 140.60208, 171.77867, 130.10135]
2025-08-07 08:21:45,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 22.0, 28.0, 23.0, 32.0, 30.0, 33.0, 27.0, 33.0, 25.0]
2025-08-07 08:21:45,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 41 minutes, 6 seconds)
2025-08-07 08:23:36,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:37,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 141.88261 ± 20.362
2025-08-07 08:23:37,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [107.570015, 145.01895, 165.90079, 154.43243, 149.35362, 102.83047, 135.93483, 164.61409, 141.72258, 151.44841]
2025-08-07 08:23:37,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 28.0, 32.0, 30.0, 29.0, 20.0, 26.0, 32.0, 27.0, 29.0]
2025-08-07 08:23:37,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 39 minutes, 13 seconds)
2025-08-07 08:25:29,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:29,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 175.21817 ± 58.830
2025-08-07 08:25:29,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [249.93648, 154.34067, 324.38, 152.19016, 150.41365, 149.3288, 138.63585, 140.04532, 158.61322, 134.29762]
2025-08-07 08:25:29,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 30.0, 67.0, 30.0, 29.0, 29.0, 27.0, 27.0, 31.0, 26.0]
2025-08-07 08:25:29,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 37 minutes, 18 seconds)
2025-08-07 08:27:21,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 143.81766 ± 29.876
2025-08-07 08:27:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [166.0147, 140.0312, 205.5612, 177.67862, 135.2803, 140.88754, 135.56326, 97.20222, 120.50042, 119.4572]
2025-08-07 08:27:21,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 27.0, 40.0, 35.0, 26.0, 27.0, 26.0, 19.0, 23.0, 23.0]
2025-08-07 08:27:21,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 35 minutes, 20 seconds)
2025-08-07 08:29:13,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 152.14178 ± 28.998
2025-08-07 08:29:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [167.5565, 176.72137, 181.0222, 103.19597, 120.13746, 193.42583, 145.43925, 134.66135, 124.6732, 174.58473]
2025-08-07 08:29:13,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 34.0, 35.0, 20.0, 23.0, 38.0, 28.0, 26.0, 24.0, 34.0]
2025-08-07 08:29:13,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 33 minutes, 25 seconds)
2025-08-07 08:31:05,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:05,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 179.05330 ± 63.684
2025-08-07 08:31:05,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [190.36858, 125.286026, 168.99596, 317.0758, 124.83512, 164.7217, 166.21423, 142.73671, 113.789795, 276.509]
2025-08-07 08:31:05,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 24.0, 33.0, 65.0, 24.0, 32.0, 32.0, 28.0, 22.0, 58.0]
2025-08-07 08:31:05,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 31 minutes, 32 seconds)
2025-08-07 08:32:57,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:32:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 168.02849 ± 56.216
2025-08-07 08:32:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [125.37186, 153.37503, 330.17545, 140.30217, 165.34418, 155.12433, 138.85825, 168.29811, 173.5266, 129.90889]
2025-08-07 08:32:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 30.0, 71.0, 27.0, 32.0, 30.0, 27.0, 32.0, 34.0, 25.0]
2025-08-07 08:32:58,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 29 minutes, 43 seconds)
2025-08-07 08:34:49,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:34:49,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 166.77318 ± 57.574
2025-08-07 08:34:49,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [129.41747, 181.79901, 144.46092, 129.91573, 160.0145, 159.29425, 139.10535, 146.07358, 333.67188, 143.9791]
2025-08-07 08:34:49,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 36.0, 28.0, 25.0, 31.0, 31.0, 27.0, 28.0, 68.0, 28.0]
2025-08-07 08:34:49,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 27 minutes, 43 seconds)
2025-08-07 08:36:41,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:36:42,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 139.18515 ± 24.439
2025-08-07 08:36:42,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [131.03592, 129.53987, 124.954765, 125.49606, 134.57437, 151.11781, 156.49776, 194.75421, 96.668144, 147.21259]
2025-08-07 08:36:42,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 25.0, 24.0, 24.0, 26.0, 29.0, 30.0, 40.0, 19.0, 29.0]
2025-08-07 08:36:42,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 25 minutes, 54 seconds)
2025-08-07 08:38:33,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:38:34,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 150.13449 ± 26.152
2025-08-07 08:38:34,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.34905, 164.06958, 173.01779, 119.751465, 169.9231, 197.9306, 130.03444, 135.53102, 119.402626, 167.33528]
2025-08-07 08:38:34,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 32.0, 33.0, 23.0, 33.0, 41.0, 25.0, 26.0, 23.0, 32.0]
2025-08-07 08:38:34,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 24 minutes, 7 seconds)
2025-08-07 08:40:26,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:40:26,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 139.43007 ± 27.908
2025-08-07 08:40:26,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [175.48657, 128.9974, 143.87552, 124.190216, 108.76187, 119.03402, 153.97632, 200.46584, 125.234375, 114.278534]
2025-08-07 08:40:26,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 25.0, 28.0, 24.0, 21.0, 23.0, 30.0, 39.0, 24.0, 22.0]
2025-08-07 08:40:26,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 22 minutes, 14 seconds)
2025-08-07 08:42:18,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:42:18,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 148.49341 ± 20.373
2025-08-07 08:42:18,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [173.93799, 165.89818, 119.35535, 149.13094, 134.43765, 114.84555, 140.0293, 160.59787, 176.79395, 149.90724]
2025-08-07 08:42:18,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 32.0, 23.0, 29.0, 26.0, 22.0, 27.0, 31.0, 35.0, 29.0]
2025-08-07 08:42:18,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 20 minutes, 22 seconds)
2025-08-07 08:44:10,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:44:11,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 184.62582 ± 86.679
2025-08-07 08:44:11,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [125.26776, 114.48271, 416.19724, 150.29231, 255.87311, 170.61165, 133.85533, 143.86128, 198.01889, 137.79793]
2025-08-07 08:44:11,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 85.0, 29.0, 53.0, 33.0, 26.0, 28.0, 39.0, 27.0]
2025-08-07 08:44:11,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 18 minutes, 32 seconds)
2025-08-07 08:46:03,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:46:03,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 148.77371 ± 30.156
2025-08-07 08:46:03,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [160.95094, 188.98138, 114.906746, 150.06339, 135.31944, 119.30651, 124.56431, 193.85052, 114.46771, 185.32622]
2025-08-07 08:46:03,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 38.0, 22.0, 29.0, 26.0, 23.0, 24.0, 38.0, 22.0, 37.0]
2025-08-07 08:46:03,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 16 minutes, 42 seconds)
2025-08-07 08:47:55,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:55,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 166.73099 ± 55.794
2025-08-07 08:47:55,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [318.51678, 124.66752, 165.98688, 129.51709, 123.295166, 170.58505, 128.92911, 138.91977, 171.76732, 195.12512]
2025-08-07 08:47:55,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 24.0, 32.0, 25.0, 24.0, 33.0, 25.0, 27.0, 33.0, 38.0]
2025-08-07 08:47:55,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 14 minutes, 50 seconds)
2025-08-07 08:49:47,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:47,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 165.25882 ± 61.695
2025-08-07 08:49:47,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [113.29444, 144.30063, 130.44376, 208.0513, 114.0069, 124.41301, 190.0748, 327.93066, 145.36444, 154.70836]
2025-08-07 08:49:47,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 28.0, 25.0, 41.0, 22.0, 24.0, 37.0, 68.0, 28.0, 30.0]
2025-08-07 08:49:47,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 12 minutes, 56 seconds)
2025-08-07 08:51:39,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:40,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 154.27094 ± 21.658
2025-08-07 08:51:40,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [148.90997, 135.35022, 152.26825, 119.56872, 202.93398, 135.528, 159.69371, 155.66275, 171.44283, 161.35092]
2025-08-07 08:51:40,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 26.0, 30.0, 23.0, 40.0, 26.0, 31.0, 30.0, 33.0, 31.0]
2025-08-07 08:51:40,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 11 minutes, 5 seconds)
2025-08-07 08:53:31,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:32,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 150.64172 ± 15.867
2025-08-07 08:53:32,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [178.09671, 139.27052, 154.64188, 140.28922, 155.60748, 151.95029, 134.70244, 175.21404, 125.60489, 151.03983]
2025-08-07 08:53:32,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 27.0, 31.0, 27.0, 30.0, 30.0, 26.0, 34.0, 24.0, 29.0]
2025-08-07 08:53:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 9 minutes, 13 seconds)
2025-08-07 08:55:24,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:24,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 172.42215 ± 70.493
2025-08-07 08:55:24,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [371.33844, 166.67178, 187.24106, 171.32588, 154.94916, 119.328026, 113.92881, 150.84024, 119.02391, 169.57393]
2025-08-07 08:55:24,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 32.0, 37.0, 33.0, 30.0, 23.0, 22.0, 29.0, 23.0, 33.0]
2025-08-07 08:55:24,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 7 minutes, 20 seconds)
2025-08-07 08:57:16,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:17,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 154.42464 ± 47.545
2025-08-07 08:57:17,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [189.8989, 107.980446, 127.25279, 158.34976, 120.39537, 275.63675, 144.89592, 124.94744, 175.08899, 119.79993]
2025-08-07 08:57:17,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 21.0, 25.0, 31.0, 23.0, 57.0, 28.0, 24.0, 35.0, 23.0]
2025-08-07 08:57:17,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 5 minutes, 29 seconds)
2025-08-07 08:59:08,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:59:09,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 174.51672 ± 73.354
2025-08-07 08:59:09,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [145.31737, 108.492676, 185.6559, 134.82991, 179.29515, 198.92758, 150.08589, 114.8413, 378.21884, 149.50253]
2025-08-07 08:59:09,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 21.0, 36.0, 26.0, 35.0, 39.0, 29.0, 22.0, 73.0, 29.0]
2025-08-07 08:59:09,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 3 minutes, 40 seconds)
2025-08-07 09:01:01,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:01,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 162.37642 ± 26.238
2025-08-07 09:01:01,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [114.15108, 173.17775, 135.85597, 155.52943, 205.59203, 134.83775, 178.76884, 184.64394, 161.50374, 179.70358]
2025-08-07 09:01:01,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 33.0, 26.0, 30.0, 40.0, 26.0, 35.0, 36.0, 31.0, 35.0]
2025-08-07 09:01:01,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 1 minute, 46 seconds)
2025-08-07 09:02:53,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:53,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 141.46724 ± 21.954
2025-08-07 09:02:53,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [167.09785, 119.45647, 168.81293, 156.0008, 128.74858, 114.243774, 136.04735, 166.11955, 149.9534, 108.19162]
2025-08-07 09:02:53,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 23.0, 33.0, 31.0, 25.0, 22.0, 26.0, 32.0, 29.0, 21.0]
2025-08-07 09:02:53,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 59 minutes, 52 seconds)
2025-08-07 09:04:45,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:46,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 173.05014 ± 88.856
2025-08-07 09:04:46,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [143.35512, 150.50032, 164.63483, 150.08781, 435.21463, 118.20061, 149.21677, 124.420006, 169.59232, 125.279]
2025-08-07 09:04:46,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 32.0, 29.0, 91.0, 23.0, 29.0, 24.0, 33.0, 24.0]
2025-08-07 09:04:46,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 58 minutes, 1 second)
2025-08-07 09:06:37,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:38,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 191.61044 ± 73.933
2025-08-07 09:06:38,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [135.04868, 191.06972, 139.9343, 325.75232, 160.34961, 120.32806, 204.63678, 173.68736, 130.47577, 334.82178]
2025-08-07 09:06:38,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 37.0, 27.0, 67.0, 31.0, 23.0, 40.0, 34.0, 25.0, 75.0]
2025-08-07 09:06:38,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 56 minutes, 8 seconds)
2025-08-07 09:08:30,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:30,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 167.29832 ± 82.393
2025-08-07 09:08:30,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [150.63536, 114.51133, 270.85706, 373.9965, 119.26553, 140.00432, 133.88213, 113.701355, 103.0325, 153.09724]
2025-08-07 09:08:30,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 22.0, 55.0, 73.0, 23.0, 27.0, 26.0, 22.0, 20.0, 30.0]
2025-08-07 09:08:30,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 54 minutes, 16 seconds)
2025-08-07 09:10:22,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:22,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 147.55544 ± 23.918
2025-08-07 09:10:22,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [119.69315, 170.32912, 149.86281, 154.54782, 118.60996, 186.35417, 158.91632, 119.22242, 172.37671, 125.64171]
2025-08-07 09:10:22,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 33.0, 29.0, 30.0, 23.0, 37.0, 31.0, 23.0, 34.0, 24.0]
2025-08-07 09:10:23,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 52 minutes, 23 seconds)
2025-08-07 09:12:14,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:15,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 164.78264 ± 58.571
2025-08-07 09:12:15,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [149.35158, 130.34892, 108.86261, 157.45667, 145.12971, 140.549, 330.20074, 150.85869, 143.49916, 191.56937]
2025-08-07 09:12:15,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 25.0, 21.0, 31.0, 28.0, 27.0, 65.0, 29.0, 28.0, 37.0]
2025-08-07 09:12:15,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 50 minutes, 33 seconds)
2025-08-07 09:14:07,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:07,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 147.60007 ± 30.823
2025-08-07 09:14:07,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [109.10741, 134.5765, 180.73137, 207.64723, 102.93085, 149.40187, 119.733696, 159.09358, 149.89516, 162.88309]
2025-08-07 09:14:07,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 26.0, 36.0, 41.0, 20.0, 29.0, 23.0, 31.0, 29.0, 31.0]
2025-08-07 09:14:07,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 48 minutes, 39 seconds)
2025-08-07 09:15:59,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:00,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 175.72998 ± 77.512
2025-08-07 09:16:00,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [113.30812, 145.73729, 165.7474, 396.55695, 125.23212, 177.43938, 199.51299, 144.46533, 130.64346, 158.65689]
2025-08-07 09:16:00,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 28.0, 33.0, 83.0, 24.0, 34.0, 39.0, 28.0, 25.0, 31.0]
2025-08-07 09:16:00,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 46 minutes, 49 seconds)
2025-08-07 09:17:52,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:52,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 176.16412 ± 74.159
2025-08-07 09:17:52,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.65107, 215.98566, 108.22373, 160.30997, 213.05014, 371.7094, 125.79728, 168.23538, 154.13814, 119.54041]
2025-08-07 09:17:52,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 42.0, 21.0, 31.0, 42.0, 75.0, 24.0, 32.0, 30.0, 23.0]
2025-08-07 09:17:52,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 44 minutes, 57 seconds)
2025-08-07 09:19:45,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:45,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 175.76955 ± 67.559
2025-08-07 09:19:45,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.30214, 160.3196, 131.15553, 168.41934, 366.1244, 211.84721, 134.72449, 152.1115, 151.94357, 156.74753]
2025-08-07 09:19:45,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 31.0, 25.0, 33.0, 71.0, 41.0, 26.0, 30.0, 29.0, 30.0]
2025-08-07 09:19:45,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 9 seconds)
2025-08-07 09:21:37,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:38,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 138.81941 ± 22.743
2025-08-07 09:21:38,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [171.37039, 152.05504, 119.93083, 182.90065, 132.02182, 113.7603, 139.75354, 119.517265, 114.75325, 142.13098]
2025-08-07 09:21:38,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 29.0, 23.0, 37.0, 26.0, 22.0, 27.0, 23.0, 22.0, 27.0]
2025-08-07 09:21:38,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 15 seconds)
2025-08-07 09:23:30,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:30,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 175.22644 ± 45.479
2025-08-07 09:23:30,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [181.36217, 174.21574, 181.60329, 186.64, 143.55301, 131.01833, 196.43013, 130.82976, 135.2178, 291.3943]
2025-08-07 09:23:30,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 34.0, 36.0, 36.0, 28.0, 25.0, 39.0, 25.0, 26.0, 60.0]
2025-08-07 09:23:30,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 24 seconds)
2025-08-07 09:25:22,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:22,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 179.27739 ± 71.366
2025-08-07 09:25:22,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [152.1856, 129.2843, 156.00214, 172.96889, 172.60022, 134.73235, 210.50429, 382.14026, 135.86375, 146.49223]
2025-08-07 09:25:22,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 25.0, 30.0, 34.0, 34.0, 26.0, 42.0, 82.0, 26.0, 28.0]
2025-08-07 09:25:22,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 29 seconds)
2025-08-07 09:27:14,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 177.60071 ± 87.092
2025-08-07 09:27:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [162.44704, 126.11066, 197.80302, 429.61142, 129.8778, 180.5491, 135.09433, 124.83853, 141.10391, 148.57135]
2025-08-07 09:27:15,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 24.0, 38.0, 85.0, 25.0, 35.0, 26.0, 24.0, 27.0, 29.0]
2025-08-07 09:27:15,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 37 seconds)
2025-08-07 09:29:07,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:29:07,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 179.04597 ± 67.173
2025-08-07 09:29:07,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [195.55548, 145.08852, 156.387, 349.39767, 114.66488, 129.48375, 157.15509, 245.38837, 128.22281, 169.1162]
2025-08-07 09:29:07,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 28.0, 30.0, 70.0, 22.0, 25.0, 30.0, 49.0, 25.0, 33.0]
2025-08-07 09:29:07,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 33 minutes, 42 seconds)
2025-08-07 09:30:59,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:30:59,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 165.09174 ± 21.795
2025-08-07 09:30:59,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [170.2681, 118.94102, 181.63731, 151.04921, 169.73328, 154.12625, 204.78519, 165.77899, 153.24907, 181.34904]
2025-08-07 09:30:59,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 23.0, 37.0, 29.0, 33.0, 30.0, 41.0, 32.0, 30.0, 36.0]
2025-08-07 09:30:59,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 49 seconds)
2025-08-07 09:32:52,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:32:52,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 160.33606 ± 35.386
2025-08-07 09:32:52,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [222.09883, 129.62271, 198.18796, 208.93756, 150.60458, 165.90625, 149.00716, 128.83638, 135.96219, 114.19696]
2025-08-07 09:32:52,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [43.0, 25.0, 39.0, 43.0, 29.0, 32.0, 29.0, 25.0, 26.0, 22.0]
2025-08-07 09:32:52,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 57 seconds)
2025-08-07 09:34:44,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:34:44,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 192.69432 ± 92.657
2025-08-07 09:34:44,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [130.37393, 170.25574, 134.66364, 151.08804, 150.44922, 383.94104, 141.51167, 171.91057, 125.55795, 367.19138]
2025-08-07 09:34:44,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 33.0, 26.0, 29.0, 29.0, 79.0, 28.0, 33.0, 24.0, 78.0]
2025-08-07 09:34:44,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 5 seconds)
2025-08-07 09:36:36,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:36:37,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 180.39812 ± 46.381
2025-08-07 09:36:37,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [159.87112, 192.32265, 113.33508, 193.68361, 129.33589, 291.40735, 201.14165, 184.94933, 151.19604, 186.73853]
2025-08-07 09:36:37,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 38.0, 22.0, 38.0, 25.0, 58.0, 40.0, 38.0, 29.0, 37.0]
2025-08-07 09:36:37,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 12 seconds)
2025-08-07 09:38:28,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:38:29,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 170.19275 ± 45.184
2025-08-07 09:38:29,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [164.19551, 129.89536, 166.01762, 189.24402, 113.33136, 188.48232, 268.77448, 113.70905, 211.61601, 156.66188]
2025-08-07 09:38:29,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 32.0, 37.0, 22.0, 38.0, 55.0, 22.0, 42.0, 30.0]
2025-08-07 09:38:29,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 20 seconds)
2025-08-07 09:40:20,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:40:21,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 194.66928 ± 102.249
2025-08-07 09:40:21,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [129.98688, 107.65742, 197.83902, 282.14413, 108.293564, 119.03923, 272.6036, 165.05994, 124.33859, 439.73047]
2025-08-07 09:40:21,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 39.0, 57.0, 21.0, 23.0, 54.0, 32.0, 24.0, 89.0]
2025-08-07 09:40:21,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 28 seconds)
2025-08-07 09:42:13,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:42:13,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 157.55003 ± 26.536
2025-08-07 09:42:13,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [205.51376, 168.50876, 146.25003, 169.9971, 123.18365, 120.06275, 134.61514, 180.65945, 145.68306, 181.02661]
2025-08-07 09:42:13,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 33.0, 28.0, 33.0, 24.0, 23.0, 26.0, 35.0, 28.0, 36.0]
2025-08-07 09:42:13,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 34 seconds)
2025-08-07 09:44:05,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:44:06,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 193.80241 ± 75.998
2025-08-07 09:44:06,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [199.06204, 179.38872, 152.09546, 251.50127, 114.53555, 215.18465, 390.10852, 155.10219, 139.70697, 141.33891]
2025-08-07 09:44:06,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 35.0, 29.0, 50.0, 22.0, 43.0, 81.0, 30.0, 27.0, 27.0]
2025-08-07 09:44:06,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 42 seconds)
2025-08-07 09:45:57,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:58,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 147.11032 ± 18.887
2025-08-07 09:45:58,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [160.79396, 159.77333, 134.88066, 154.92499, 135.66898, 108.39219, 159.00262, 165.64093, 167.13002, 124.89552]
2025-08-07 09:45:58,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 31.0, 26.0, 30.0, 26.0, 21.0, 31.0, 32.0, 33.0, 24.0]
2025-08-07 09:45:58,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 49 seconds)
2025-08-07 09:47:49,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:50,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 138.10069 ± 17.808
2025-08-07 09:47:50,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [130.08978, 109.1953, 151.06429, 119.98831, 159.40236, 127.54788, 165.62889, 154.39334, 124.470024, 139.22678]
2025-08-07 09:47:50,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 29.0, 23.0, 31.0, 25.0, 32.0, 30.0, 24.0, 27.0]
2025-08-07 09:47:50,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 57 seconds)
2025-08-07 09:49:41,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:42,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 166.02252 ± 31.651
2025-08-07 09:49:42,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [180.91634, 144.98964, 230.54457, 165.68315, 205.78387, 129.62518, 139.92506, 155.51326, 129.2723, 177.97185]
2025-08-07 09:49:42,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 28.0, 45.0, 32.0, 41.0, 25.0, 27.0, 30.0, 25.0, 36.0]
2025-08-07 09:49:42,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 4 seconds)
2025-08-07 09:51:33,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:51:34,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 157.25371 ± 24.073
2025-08-07 09:51:34,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [173.88347, 141.08522, 128.95552, 149.9449, 216.69179, 153.61198, 166.70119, 152.85551, 158.8569, 129.95062]
2025-08-07 09:51:34,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 27.0, 25.0, 29.0, 43.0, 30.0, 32.0, 30.0, 31.0, 25.0]
2025-08-07 09:51:34,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 12 seconds)
2025-08-07 09:53:26,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:53:26,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 180.29169 ± 49.132
2025-08-07 09:53:26,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [103.47684, 150.2786, 171.25366, 151.06194, 177.6894, 182.54187, 223.2252, 243.60773, 269.78256, 129.99898]
2025-08-07 09:53:26,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 29.0, 33.0, 29.0, 35.0, 36.0, 46.0, 50.0, 54.0, 25.0]
2025-08-07 09:53:26,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 20 seconds)
2025-08-07 09:55:18,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:55:18,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 158.63864 ± 33.214
2025-08-07 09:55:18,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [144.17303, 151.57695, 128.88344, 207.97307, 118.7844, 159.09505, 230.71854, 133.27264, 158.36253, 153.54686]
2025-08-07 09:55:18,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 25.0, 41.0, 23.0, 31.0, 45.0, 26.0, 31.0, 30.0]
2025-08-07 09:55:18,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 28 seconds)
2025-08-07 09:57:10,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:57:10,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 152.51276 ± 27.746
2025-08-07 09:57:10,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [168.33514, 125.438034, 114.068275, 131.97969, 150.39777, 176.31813, 161.96579, 119.74106, 204.24155, 172.64214]
2025-08-07 09:57:10,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 24.0, 22.0, 26.0, 29.0, 34.0, 31.0, 23.0, 40.0, 34.0]
2025-08-07 09:57:10,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 36 seconds)
2025-08-07 09:59:02,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:02,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 153.26480 ± 21.648
2025-08-07 09:59:02,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [170.31134, 139.00041, 188.88556, 163.80551, 169.1275, 133.55756, 159.45312, 164.98596, 119.346054, 124.17503]
2025-08-07 09:59:02,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 27.0, 37.0, 32.0, 33.0, 26.0, 31.0, 33.0, 23.0, 24.0]
2025-08-07 09:59:02,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 44 seconds)
2025-08-07 10:00:54,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:00:55,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 141.09683 ± 23.869
2025-08-07 10:00:55,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [164.839, 129.0916, 103.35098, 139.76459, 109.22923, 145.84279, 130.05739, 157.80618, 187.09656, 143.88995]
2025-08-07 10:00:55,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 20.0, 27.0, 21.0, 28.0, 25.0, 30.0, 36.0, 28.0]
2025-08-07 10:00:55,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 52 seconds)
2025-08-07 10:02:46,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:02:46,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 161.00465 ± 24.474
2025-08-07 10:02:46,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [112.51492, 157.83684, 144.02818, 163.54536, 191.28423, 189.38724, 174.14476, 181.41972, 129.49448, 166.39075]
2025-08-07 10:02:46,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 31.0, 28.0, 32.0, 39.0, 38.0, 34.0, 36.0, 25.0, 33.0]
2025-08-07 10:02:46,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1251 [DEBUG]: Training session finished
