2025-08-07 07:07:55,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc25-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:07:55,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc25-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 07:07:55,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14f49d270550>}
2025-08-07 07:07:55,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 07:07:55,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 07:07:55,640 baseline-bpql-noiseperc25-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 07:07:55,640 baseline-bpql-noiseperc25-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 07:07:58,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 07:07:58,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 07:09:44,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:45,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 195.08566 ± 114.748
2025-08-07 07:09:45,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [317.03934, 322.63425, 89.32214, 291.52243, 397.1841, 108.07992, 113.06355, 106.58095, 101.09158, 104.33829]
2025-08-07 07:09:45,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 67.0, 18.0, 56.0, 86.0, 21.0, 22.0, 21.0, 20.0, 21.0]
2025-08-07 07:09:45,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (195.09) for latency ExtremeClogL1U23
2025-08-07 07:09:45,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 19 seconds)
2025-08-07 07:11:39,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:40,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 165.88637 ± 81.258
2025-08-07 07:11:40,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [90.293274, 303.92947, 311.07004, 234.1552, 151.49745, 95.0341, 96.09927, 122.017136, 143.70334, 111.06432]
2025-08-07 07:11:40,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 63.0, 61.0, 44.0, 29.0, 19.0, 19.0, 24.0, 28.0, 22.0]
2025-08-07 07:11:40,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 1 minute)
2025-08-07 07:13:34,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:35,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 227.62048 ± 139.287
2025-08-07 07:13:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [306.83206, 380.23575, 136.42789, 129.1297, 506.09363, 356.2851, 123.561104, 146.3992, 90.62624, 100.61415]
2025-08-07 07:13:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 68.0, 27.0, 25.0, 103.0, 65.0, 24.0, 28.0, 18.0, 20.0]
2025-08-07 07:13:35,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (227.62) for latency ExtremeClogL1U23
2025-08-07 07:13:35,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 1 minute, 22 seconds)
2025-08-07 07:15:30,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:30,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 162.83249 ± 99.134
2025-08-07 07:15:30,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [129.25668, 95.64285, 101.907555, 342.85602, 95.468925, 112.221405, 96.05485, 104.67791, 365.54977, 184.68883]
2025-08-07 07:15:30,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 19.0, 20.0, 63.0, 19.0, 22.0, 19.0, 21.0, 68.0, 35.0]
2025-08-07 07:15:30,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 43 seconds)
2025-08-07 07:17:23,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:23,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 120.18333 ± 29.389
2025-08-07 07:17:23,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [124.95813, 152.41872, 150.42915, 89.52185, 176.48445, 96.13166, 118.34195, 88.91996, 89.039375, 115.58804]
2025-08-07 07:17:23,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 30.0, 29.0, 18.0, 34.0, 19.0, 23.0, 18.0, 18.0, 23.0]
2025-08-07 07:17:23,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 58 minutes, 57 seconds)
2025-08-07 07:19:18,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:18,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 101.83963 ± 19.027
2025-08-07 07:19:18,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.72093, 89.00642, 84.128136, 88.674225, 88.96779, 95.437004, 112.59078, 119.419655, 101.808624, 148.64278]
2025-08-07 07:19:18,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 17.0, 18.0, 18.0, 19.0, 23.0, 23.0, 20.0, 28.0]
2025-08-07 07:19:18,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 59 minutes, 37 seconds)
2025-08-07 07:21:14,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:14,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 148.71973 ± 91.274
2025-08-07 07:21:14,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [119.24907, 161.14282, 83.7264, 128.62225, 362.79605, 279.4401, 84.294945, 88.61745, 88.903435, 90.40468]
2025-08-07 07:21:14,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 31.0, 17.0, 25.0, 72.0, 53.0, 17.0, 18.0, 18.0, 18.0]
2025-08-07 07:21:14,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 57 minutes, 57 seconds)
2025-08-07 07:23:08,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:08,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 144.12387 ± 102.697
2025-08-07 07:23:08,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [277.18466, 90.35594, 404.9355, 108.33672, 88.80539, 95.36041, 101.93864, 95.27863, 89.06159, 89.981155]
2025-08-07 07:23:08,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 18.0, 81.0, 21.0, 18.0, 19.0, 20.0, 19.0, 18.0, 18.0]
2025-08-07 07:23:08,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 55 minutes, 53 seconds)
2025-08-07 07:25:02,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:03,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 127.51542 ± 90.292
2025-08-07 07:25:03,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [395.6778, 96.539185, 123.45323, 84.02557, 83.65984, 101.17958, 83.59063, 112.99682, 105.0875, 88.94392]
2025-08-07 07:25:03,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 19.0, 25.0, 17.0, 17.0, 20.0, 17.0, 22.0, 21.0, 18.0]
2025-08-07 07:25:03,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 53 minutes, 40 seconds)
2025-08-07 07:26:58,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:59,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 200.05423 ± 99.071
2025-08-07 07:26:59,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [106.74509, 122.309425, 301.35178, 129.52716, 344.35748, 119.81175, 341.2844, 143.71959, 101.522835, 289.9128]
2025-08-07 07:26:59,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 25.0, 56.0, 26.0, 69.0, 23.0, 67.0, 28.0, 20.0, 54.0]
2025-08-07 07:26:59,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 33 seconds)
2025-08-07 07:28:53,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:53,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 165.38223 ± 108.099
2025-08-07 07:28:53,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [127.85233, 105.02575, 177.20833, 419.63763, 110.443054, 322.75568, 123.04476, 95.30289, 83.77787, 88.77399]
2025-08-07 07:28:53,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 21.0, 35.0, 79.0, 22.0, 64.0, 24.0, 19.0, 17.0, 18.0]
2025-08-07 07:28:53,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 50 minutes, 40 seconds)
2025-08-07 07:30:47,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:48,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 112.46145 ± 24.779
2025-08-07 07:30:48,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.7129, 90.347595, 84.04174, 110.87227, 123.9404, 101.72866, 167.07472, 137.08862, 125.10494, 95.70261]
2025-08-07 07:30:48,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 18.0, 17.0, 22.0, 24.0, 20.0, 32.0, 27.0, 24.0, 19.0]
2025-08-07 07:30:48,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 48 minutes, 14 seconds)
2025-08-07 07:32:42,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:43,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 164.79155 ± 86.191
2025-08-07 07:32:43,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.555504, 102.176254, 336.8641, 205.2917, 122.615074, 94.2231, 110.90786, 119.70125, 315.64548, 138.93521]
2025-08-07 07:32:43,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 63.0, 44.0, 24.0, 19.0, 22.0, 23.0, 58.0, 27.0]
2025-08-07 07:32:43,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 46 minutes, 34 seconds)
2025-08-07 07:34:37,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:34:38,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 140.23660 ± 62.347
2025-08-07 07:34:38,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [135.09357, 90.32011, 101.292786, 138.45483, 155.25574, 312.90802, 158.8132, 95.01977, 95.922035, 119.28618]
2025-08-07 07:34:38,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 18.0, 20.0, 28.0, 30.0, 58.0, 30.0, 19.0, 19.0, 23.0]
2025-08-07 07:34:38,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 44 minutes, 49 seconds)
2025-08-07 07:36:31,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:36:32,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 214.26926 ± 138.466
2025-08-07 07:36:32,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [490.58438, 326.7145, 172.13022, 173.15686, 100.469795, 121.41802, 142.84369, 95.223076, 424.73566, 95.41653]
2025-08-07 07:36:32,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 59.0, 33.0, 34.0, 20.0, 25.0, 28.0, 19.0, 80.0, 19.0]
2025-08-07 07:36:32,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 42 minutes, 23 seconds)
2025-08-07 07:38:27,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:38:27,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 179.79012 ± 130.999
2025-08-07 07:38:27,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [140.8151, 134.24416, 96.205894, 119.23241, 531.27313, 107.618576, 94.95173, 309.26715, 139.57382, 124.719154]
2025-08-07 07:38:27,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 26.0, 19.0, 23.0, 104.0, 21.0, 19.0, 63.0, 27.0, 24.0]
2025-08-07 07:38:27,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 40 minutes, 42 seconds)
2025-08-07 07:40:21,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:40:22,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 146.22104 ± 81.103
2025-08-07 07:40:22,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [193.15277, 111.45784, 121.409164, 96.52362, 104.362434, 109.05617, 94.88233, 371.4894, 163.3707, 96.505844]
2025-08-07 07:40:22,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 22.0, 24.0, 19.0, 21.0, 22.0, 19.0, 68.0, 31.0, 19.0]
2025-08-07 07:40:22,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 38 minutes, 53 seconds)
2025-08-07 07:42:16,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:42:16,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 182.32146 ± 106.646
2025-08-07 07:42:16,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [334.49072, 352.87085, 116.47686, 121.93036, 112.56906, 131.3977, 344.87546, 95.88234, 123.8993, 88.821846]
2025-08-07 07:42:16,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 70.0, 23.0, 24.0, 23.0, 26.0, 65.0, 19.0, 24.0, 18.0]
2025-08-07 07:42:16,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 36 minutes, 46 seconds)
2025-08-07 07:44:11,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:44:12,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 174.94353 ± 107.400
2025-08-07 07:44:12,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [318.68652, 101.57305, 106.44341, 132.8907, 101.81734, 118.26307, 372.74976, 88.84707, 89.969444, 318.1949]
2025-08-07 07:44:12,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 20.0, 21.0, 26.0, 20.0, 23.0, 72.0, 18.0, 18.0, 60.0]
2025-08-07 07:44:12,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 34 minutes, 57 seconds)
2025-08-07 07:46:06,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:07,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 132.27986 ± 51.981
2025-08-07 07:46:07,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [277.25955, 120.11126, 113.09842, 130.33434, 90.72487, 83.874626, 150.69844, 136.66998, 104.45422, 115.572876]
2025-08-07 07:46:07,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 23.0, 22.0, 26.0, 18.0, 17.0, 29.0, 27.0, 21.0, 23.0]
2025-08-07 07:46:07,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 33 minutes, 19 seconds)
2025-08-07 07:48:02,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:02,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 137.76326 ± 82.520
2025-08-07 07:48:02,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [125.68312, 381.39807, 134.20898, 95.45717, 105.52577, 123.8331, 102.810875, 95.29274, 124.054474, 89.36846]
2025-08-07 07:48:02,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 76.0, 26.0, 19.0, 21.0, 24.0, 20.0, 19.0, 24.0, 18.0]
2025-08-07 07:48:02,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 31 minutes, 22 seconds)
2025-08-07 07:49:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:57,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 229.84781 ± 114.325
2025-08-07 07:49:57,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [104.453354, 108.30324, 340.53534, 394.1256, 127.9965, 321.73206, 84.03369, 300.68008, 342.72208, 173.89606]
2025-08-07 07:49:57,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 71.0, 79.0, 25.0, 65.0, 17.0, 60.0, 67.0, 34.0]
2025-08-07 07:49:57,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (229.85) for latency ExtremeClogL1U23
2025-08-07 07:49:57,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 29 minutes, 27 seconds)
2025-08-07 07:51:51,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:51,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 215.91096 ± 167.327
2025-08-07 07:51:51,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [117.868095, 281.62714, 95.00047, 636.3217, 397.21432, 129.0021, 111.38422, 101.29147, 150.13501, 139.26503]
2025-08-07 07:51:51,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 56.0, 19.0, 119.0, 74.0, 25.0, 22.0, 20.0, 29.0, 27.0]
2025-08-07 07:51:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 27 minutes, 32 seconds)
2025-08-07 07:53:46,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:46,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 165.25034 ± 106.468
2025-08-07 07:53:46,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.337685, 101.829544, 95.422714, 382.00977, 142.0268, 162.44177, 100.483864, 365.7793, 102.24401, 105.92796]
2025-08-07 07:53:46,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 19.0, 82.0, 28.0, 32.0, 20.0, 68.0, 20.0, 21.0]
2025-08-07 07:53:46,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 25 minutes, 33 seconds)
2025-08-07 07:55:41,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:42,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 230.61179 ± 137.321
2025-08-07 07:55:42,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [324.49234, 124.30311, 129.17659, 138.83401, 488.5345, 100.60635, 409.85638, 337.99164, 105.68432, 146.6387]
2025-08-07 07:55:42,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 24.0, 25.0, 27.0, 101.0, 20.0, 83.0, 64.0, 21.0, 28.0]
2025-08-07 07:55:42,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (230.61) for latency ExtremeClogL1U23
2025-08-07 07:55:42,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 23 minutes, 41 seconds)
2025-08-07 07:57:35,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:36,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 217.15408 ± 133.739
2025-08-07 07:57:36,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [326.43268, 151.41301, 463.69812, 100.298965, 83.82116, 95.472725, 375.509, 96.55899, 325.7296, 152.60648]
2025-08-07 07:57:36,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 30.0, 86.0, 20.0, 17.0, 19.0, 69.0, 19.0, 62.0, 30.0]
2025-08-07 07:57:36,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 21 minutes, 38 seconds)
2025-08-07 07:59:30,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:31,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 151.30156 ± 100.907
2025-08-07 07:59:31,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [100.66713, 101.721535, 111.98989, 95.9491, 107.4014, 444.88892, 172.16959, 128.34724, 95.63304, 154.24774]
2025-08-07 07:59:31,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 22.0, 19.0, 21.0, 96.0, 33.0, 25.0, 19.0, 30.0]
2025-08-07 07:59:31,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 19 minutes, 40 seconds)
2025-08-07 08:01:25,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:26,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 208.55171 ± 117.413
2025-08-07 08:01:26,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [149.88686, 83.82768, 113.610954, 147.64638, 394.49393, 403.3075, 354.76526, 162.5491, 144.83443, 130.595]
2025-08-07 08:01:26,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 17.0, 22.0, 29.0, 77.0, 75.0, 71.0, 31.0, 28.0, 25.0]
2025-08-07 08:01:26,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 17 minutes, 52 seconds)
2025-08-07 08:03:20,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:21,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 132.02315 ± 62.405
2025-08-07 08:03:21,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.876564, 128.04759, 125.343864, 88.87936, 312.53094, 123.38576, 110.93024, 140.43785, 95.89115, 105.908165]
2025-08-07 08:03:21,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 24.0, 18.0, 58.0, 24.0, 22.0, 27.0, 19.0, 21.0]
2025-08-07 08:03:21,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 15 minutes, 58 seconds)
2025-08-07 08:05:16,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:05:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 152.90836 ± 89.354
2025-08-07 08:05:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [117.66353, 95.46321, 107.83641, 123.08872, 109.28932, 124.90442, 367.1911, 100.64034, 286.53723, 96.469154]
2025-08-07 08:05:16,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 19.0, 21.0, 24.0, 22.0, 24.0, 78.0, 20.0, 55.0, 19.0]
2025-08-07 08:05:16,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 14 minutes, 3 seconds)
2025-08-07 08:07:11,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:11,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 153.87880 ± 70.419
2025-08-07 08:07:11,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [106.9406, 110.19308, 105.03299, 212.28535, 156.71272, 340.4814, 145.31537, 115.88052, 149.65187, 96.294044]
2025-08-07 08:07:11,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 22.0, 21.0, 42.0, 31.0, 65.0, 28.0, 23.0, 29.0, 19.0]
2025-08-07 08:07:11,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 12 minutes, 17 seconds)
2025-08-07 08:09:06,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:07,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 172.93535 ± 114.125
2025-08-07 08:09:07,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.195915, 108.42636, 89.03589, 390.817, 148.01042, 403.8009, 95.14154, 139.69933, 121.8328, 143.39325]
2025-08-07 08:09:07,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 18.0, 72.0, 30.0, 85.0, 19.0, 28.0, 24.0, 28.0]
2025-08-07 08:09:07,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 10 minutes, 37 seconds)
2025-08-07 08:11:00,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:01,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 127.94106 ± 64.418
2025-08-07 08:11:01,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.25272, 315.37616, 117.25854, 141.83476, 89.7052, 120.39576, 113.74516, 100.96494, 95.70626, 95.17098]
2025-08-07 08:11:01,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 65.0, 23.0, 28.0, 18.0, 24.0, 23.0, 20.0, 19.0, 19.0]
2025-08-07 08:11:01,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 8 minutes, 25 seconds)
2025-08-07 08:12:55,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 152.46239 ± 102.725
2025-08-07 08:12:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [100.9751, 88.76726, 295.58554, 101.79263, 106.93807, 90.298454, 138.67383, 101.10256, 95.523476, 404.96695]
2025-08-07 08:12:56,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 57.0, 20.0, 21.0, 18.0, 28.0, 20.0, 19.0, 74.0]
2025-08-07 08:12:56,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 6 minutes, 33 seconds)
2025-08-07 08:14:50,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:50,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 201.52502 ± 111.805
2025-08-07 08:14:50,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [122.93779, 302.8091, 112.944336, 101.61998, 352.40192, 95.47638, 134.48991, 353.39792, 100.169586, 339.0033]
2025-08-07 08:14:50,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 59.0, 22.0, 20.0, 68.0, 19.0, 26.0, 68.0, 20.0, 64.0]
2025-08-07 08:14:50,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 4 minutes, 23 seconds)
2025-08-07 08:16:44,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:45,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 187.72885 ± 108.647
2025-08-07 08:16:45,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [119.460754, 120.96647, 107.731636, 110.8562, 118.63199, 107.61185, 135.25056, 335.40805, 383.38126, 337.98978]
2025-08-07 08:16:45,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 21.0, 23.0, 24.0, 21.0, 26.0, 70.0, 77.0, 74.0]
2025-08-07 08:16:45,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 2 minutes, 23 seconds)
2025-08-07 08:18:39,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:40,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 223.82324 ± 113.844
2025-08-07 08:18:40,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [337.92755, 292.88828, 112.81008, 122.44561, 89.57035, 296.79837, 154.75133, 90.04308, 379.6568, 361.34116]
2025-08-07 08:18:40,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 57.0, 22.0, 24.0, 18.0, 59.0, 30.0, 18.0, 76.0, 70.0]
2025-08-07 08:18:40,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 20 seconds)
2025-08-07 08:20:34,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:35,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 179.04727 ± 100.895
2025-08-07 08:20:35,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [308.37225, 111.62159, 138.73232, 95.43837, 103.09541, 122.48418, 361.2208, 323.29593, 124.30449, 101.907295]
2025-08-07 08:20:35,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 22.0, 27.0, 19.0, 20.0, 24.0, 67.0, 65.0, 25.0, 20.0]
2025-08-07 08:20:35,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 58 minutes, 37 seconds)
2025-08-07 08:22:29,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:22:30,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 171.48442 ± 119.456
2025-08-07 08:22:30,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [116.602264, 406.0974, 123.27333, 112.59114, 412.47516, 126.773186, 89.144, 100.97015, 125.46054, 101.45694]
2025-08-07 08:22:30,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 75.0, 24.0, 22.0, 85.0, 24.0, 18.0, 20.0, 24.0, 20.0]
2025-08-07 08:22:30,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 56 minutes, 39 seconds)
2025-08-07 08:24:24,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:24:25,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 181.70578 ± 102.050
2025-08-07 08:24:25,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [117.70821, 177.9903, 175.58762, 289.71198, 116.9789, 123.73688, 158.22841, 443.27393, 89.24529, 124.59649]
2025-08-07 08:24:25,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 34.0, 34.0, 56.0, 23.0, 24.0, 31.0, 86.0, 18.0, 24.0]
2025-08-07 08:24:25,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 54 minutes, 53 seconds)
2025-08-07 08:26:19,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:26:19,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 232.25552 ± 166.197
2025-08-07 08:26:19,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [362.61926, 116.21251, 95.83937, 380.94388, 95.0098, 109.269066, 94.81851, 487.40726, 89.76399, 490.67145]
2025-08-07 08:26:19,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 23.0, 19.0, 72.0, 19.0, 22.0, 19.0, 99.0, 18.0, 108.0]
2025-08-07 08:26:19,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (232.26) for latency ExtremeClogL1U23
2025-08-07 08:26:20,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 52 minutes, 58 seconds)
2025-08-07 08:28:14,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:28:15,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 180.26784 ± 92.758
2025-08-07 08:28:15,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.78578, 163.4273, 159.86128, 118.552155, 105.69362, 153.92174, 132.0342, 352.3881, 370.80093, 138.2133]
2025-08-07 08:28:15,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 32.0, 31.0, 23.0, 21.0, 31.0, 26.0, 69.0, 71.0, 27.0]
2025-08-07 08:28:15,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 51 minutes, 8 seconds)
2025-08-07 08:30:08,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:30:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 193.71890 ± 112.206
2025-08-07 08:30:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.9098, 95.594376, 113.30622, 344.1773, 155.64203, 131.26498, 391.29062, 116.00672, 143.312, 350.6849]
2025-08-07 08:30:09,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 22.0, 64.0, 31.0, 27.0, 86.0, 23.0, 28.0, 65.0]
2025-08-07 08:30:09,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 49 minutes, 6 seconds)
2025-08-07 08:32:04,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:32:04,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 169.61923 ± 103.027
2025-08-07 08:32:04,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [164.41942, 91.145966, 146.76022, 107.70055, 361.2626, 380.09506, 112.43958, 135.39085, 101.43743, 95.54069]
2025-08-07 08:32:04,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 18.0, 29.0, 21.0, 68.0, 69.0, 22.0, 27.0, 20.0, 19.0]
2025-08-07 08:32:04,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 47 minutes, 15 seconds)
2025-08-07 08:33:59,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:59,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 161.55386 ± 103.136
2025-08-07 08:33:59,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [132.00426, 205.067, 83.68611, 130.72691, 118.50957, 141.58797, 120.18023, 112.46721, 458.0146, 113.29466]
2025-08-07 08:33:59,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 44.0, 17.0, 26.0, 23.0, 28.0, 24.0, 22.0, 86.0, 22.0]
2025-08-07 08:33:59,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 45 minutes, 19 seconds)
2025-08-07 08:35:54,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:54,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 149.19855 ± 105.084
2025-08-07 08:35:54,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [96.233025, 116.99075, 103.35692, 113.380295, 115.47686, 84.30257, 458.5312, 165.7008, 113.10941, 124.90365]
2025-08-07 08:35:54,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 24.0, 20.0, 23.0, 23.0, 17.0, 100.0, 33.0, 22.0, 25.0]
2025-08-07 08:35:54,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 43 minutes, 28 seconds)
2025-08-07 08:37:49,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:49,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 133.80643 ± 75.259
2025-08-07 08:37:49,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [195.83217, 101.10356, 123.563194, 94.8587, 340.82852, 101.540474, 89.18693, 106.75886, 95.07234, 89.31955]
2025-08-07 08:37:49,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 20.0, 24.0, 19.0, 62.0, 20.0, 18.0, 21.0, 19.0, 18.0]
2025-08-07 08:37:49,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 41 minutes, 26 seconds)
2025-08-07 08:39:44,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:44,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 187.57658 ± 120.446
2025-08-07 08:39:44,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [177.0149, 339.2836, 488.57578, 129.5804, 122.76405, 108.05095, 130.3275, 89.1636, 123.66681, 167.33842]
2025-08-07 08:39:44,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 75.0, 94.0, 26.0, 24.0, 21.0, 26.0, 18.0, 24.0, 32.0]
2025-08-07 08:39:44,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 39 minutes, 43 seconds)
2025-08-07 08:41:40,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:40,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 172.11060 ± 94.805
2025-08-07 08:41:40,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [90.49592, 102.11603, 172.0441, 358.85715, 153.52505, 170.19897, 345.69067, 84.05513, 112.00867, 132.1143]
2025-08-07 08:41:40,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 20.0, 33.0, 66.0, 30.0, 33.0, 68.0, 17.0, 22.0, 26.0]
2025-08-07 08:41:40,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 37 minutes, 56 seconds)
2025-08-07 08:43:35,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 163.79694 ± 88.967
2025-08-07 08:43:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [132.01636, 90.44346, 118.146255, 363.80307, 89.02727, 302.99457, 165.23776, 137.39355, 142.65166, 96.2554]
2025-08-07 08:43:35,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 18.0, 23.0, 73.0, 18.0, 59.0, 32.0, 27.0, 28.0, 19.0]
2025-08-07 08:43:35,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 36 minutes, 2 seconds)
2025-08-07 08:45:30,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:31,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 175.09500 ± 133.795
2025-08-07 08:45:31,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [508.1619, 88.83436, 357.9286, 126.22279, 124.912964, 125.60176, 106.18524, 108.47891, 96.34955, 108.27375]
2025-08-07 08:45:31,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 18.0, 66.0, 24.0, 25.0, 25.0, 21.0, 23.0, 19.0, 21.0]
2025-08-07 08:45:31,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 7 seconds)
2025-08-07 08:47:25,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:25,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 161.20795 ± 70.889
2025-08-07 08:47:25,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [142.09349, 168.14183, 107.91112, 262.94644, 320.39578, 118.99741, 121.43128, 108.90115, 89.78112, 171.4799]
2025-08-07 08:47:25,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 33.0, 21.0, 61.0, 62.0, 23.0, 24.0, 21.0, 18.0, 33.0]
2025-08-07 08:47:25,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 9 seconds)
2025-08-07 08:49:20,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:20,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 195.29251 ± 138.224
2025-08-07 08:49:20,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.70211, 409.6582, 112.03668, 134.75879, 101.79212, 101.634514, 108.597435, 413.7268, 393.03665, 88.98194]
2025-08-07 08:49:20,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 79.0, 22.0, 27.0, 20.0, 20.0, 21.0, 87.0, 79.0, 18.0]
2025-08-07 08:49:20,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 15 seconds)
2025-08-07 08:51:15,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:15,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 127.39172 ± 73.303
2025-08-07 08:51:15,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.84747, 95.95448, 95.47017, 136.05516, 114.23379, 343.92078, 94.76206, 107.02527, 100.87899, 89.76906]
2025-08-07 08:51:15,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 19.0, 26.0, 22.0, 67.0, 19.0, 21.0, 20.0, 18.0]
2025-08-07 08:51:15,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 7 seconds)
2025-08-07 08:53:10,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:10,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 213.01541 ± 128.375
2025-08-07 08:53:10,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.566086, 117.039314, 467.32285, 249.55432, 100.449066, 376.55048, 116.226555, 122.60188, 336.4015, 142.44208]
2025-08-07 08:53:10,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 23.0, 88.0, 47.0, 20.0, 75.0, 23.0, 24.0, 62.0, 28.0]
2025-08-07 08:53:10,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 26 minutes, 15 seconds)
2025-08-07 08:55:05,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:06,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 133.77318 ± 41.929
2025-08-07 08:55:06,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [129.28459, 125.22725, 100.07885, 194.73067, 113.76733, 222.1112, 89.19923, 90.74981, 153.34254, 119.2403]
2025-08-07 08:55:06,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 20.0, 38.0, 22.0, 43.0, 18.0, 18.0, 30.0, 23.0]
2025-08-07 08:55:06,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 24 minutes, 21 seconds)
2025-08-07 08:57:01,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 182.37376 ± 108.107
2025-08-07 08:57:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [134.3264, 119.69244, 373.49142, 116.5721, 128.75714, 106.14907, 96.34304, 279.43488, 373.8404, 95.13071]
2025-08-07 08:57:01,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 23.0, 72.0, 23.0, 25.0, 21.0, 19.0, 52.0, 70.0, 19.0]
2025-08-07 08:57:01,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 22 minutes, 34 seconds)
2025-08-07 08:58:56,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:58:56,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 124.92235 ± 31.744
2025-08-07 08:58:56,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.21642, 89.743996, 108.68495, 182.50995, 124.22364, 181.95518, 113.218575, 134.08089, 123.617615, 89.9723]
2025-08-07 08:58:56,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 21.0, 35.0, 25.0, 35.0, 22.0, 26.0, 24.0, 18.0]
2025-08-07 08:58:56,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 37 seconds)
2025-08-07 09:00:51,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:00:51,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 205.17514 ± 156.489
2025-08-07 09:00:51,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [136.4326, 190.3706, 501.04047, 131.89209, 123.125015, 95.02598, 524.4837, 157.56218, 101.80951, 90.009155]
2025-08-07 09:00:51,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 36.0, 94.0, 25.0, 25.0, 19.0, 101.0, 31.0, 20.0, 18.0]
2025-08-07 09:00:51,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 18 minutes, 44 seconds)
2025-08-07 09:02:46,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:02:46,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 105.44100 ± 11.849
2025-08-07 09:02:46,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [105.76479, 89.67718, 95.47892, 111.44051, 107.29966, 122.26922, 104.81086, 102.322754, 126.55918, 88.78699]
2025-08-07 09:02:46,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 18.0, 19.0, 22.0, 21.0, 24.0, 21.0, 20.0, 26.0, 18.0]
2025-08-07 09:02:46,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 47 seconds)
2025-08-07 09:04:41,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:04:42,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 181.84726 ± 203.140
2025-08-07 09:04:42,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.67108, 94.91451, 89.37116, 280.50327, 96.47711, 768.15247, 106.71194, 83.95656, 107.724686, 88.98988]
2025-08-07 09:04:42,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 18.0, 56.0, 19.0, 147.0, 21.0, 17.0, 21.0, 18.0]
2025-08-07 09:04:42,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes, 53 seconds)
2025-08-07 09:06:36,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:06:36,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 136.99202 ± 59.528
2025-08-07 09:06:36,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [121.50689, 161.09692, 295.9026, 100.40398, 88.8517, 175.03877, 106.83086, 106.19544, 124.46085, 89.63216]
2025-08-07 09:06:36,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 32.0, 57.0, 20.0, 18.0, 34.0, 21.0, 21.0, 24.0, 18.0]
2025-08-07 09:06:36,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 50 seconds)
2025-08-07 09:08:31,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:08:32,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 199.50058 ± 139.862
2025-08-07 09:08:32,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [395.42505, 94.602036, 402.88184, 89.98217, 104.853836, 154.47119, 104.46005, 97.38715, 434.96707, 115.9755]
2025-08-07 09:08:32,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 19.0, 77.0, 18.0, 21.0, 30.0, 21.0, 19.0, 81.0, 23.0]
2025-08-07 09:08:32,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 10 minutes, 58 seconds)
2025-08-07 09:10:27,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:10:28,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 237.62431 ± 161.172
2025-08-07 09:10:28,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.7949, 143.60098, 139.5927, 328.03537, 418.76834, 96.45666, 437.43542, 101.33983, 522.07764, 94.14129]
2025-08-07 09:10:28,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 28.0, 27.0, 64.0, 89.0, 19.0, 87.0, 20.0, 101.0, 19.0]
2025-08-07 09:10:28,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1226 [INFO]: New best (237.62) for latency ExtremeClogL1U23
2025-08-07 09:10:28,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 9 minutes, 9 seconds)
2025-08-07 09:12:22,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:12:22,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 161.00826 ± 87.523
2025-08-07 09:12:22,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.60858, 95.36388, 106.56649, 374.22366, 281.83627, 152.16382, 136.67075, 117.86657, 113.24328, 130.53925]
2025-08-07 09:12:22,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 21.0, 70.0, 51.0, 30.0, 27.0, 23.0, 22.0, 26.0]
2025-08-07 09:12:23,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 7 minutes, 13 seconds)
2025-08-07 09:14:17,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:14:18,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 181.38350 ± 106.267
2025-08-07 09:14:18,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [100.367615, 96.721275, 257.72195, 124.80434, 164.13353, 89.947716, 392.41962, 151.444, 88.95848, 347.3164]
2025-08-07 09:14:18,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 50.0, 24.0, 32.0, 18.0, 75.0, 30.0, 18.0, 67.0]
2025-08-07 09:14:18,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 18 seconds)
2025-08-07 09:16:13,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:16:13,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 144.81711 ± 76.549
2025-08-07 09:16:13,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [106.5698, 305.03256, 95.22829, 282.4277, 96.05786, 89.16935, 148.97641, 90.27096, 110.87561, 123.562614]
2025-08-07 09:16:13,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 60.0, 19.0, 52.0, 19.0, 18.0, 28.0, 18.0, 22.0, 24.0]
2025-08-07 09:16:13,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 3 minutes, 26 seconds)
2025-08-07 09:18:09,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:18:09,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 174.44449 ± 89.832
2025-08-07 09:18:09,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [109.44264, 106.657364, 174.78613, 141.69977, 127.831276, 100.90066, 140.54297, 143.99197, 349.0553, 349.53677]
2025-08-07 09:18:09,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 35.0, 28.0, 25.0, 20.0, 29.0, 28.0, 73.0, 71.0]
2025-08-07 09:18:09,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 35 seconds)
2025-08-07 09:20:03,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:20:04,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 208.04573 ± 108.673
2025-08-07 09:20:04,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [276.1763, 340.39224, 105.42756, 113.73454, 108.330635, 144.55254, 327.92566, 95.51262, 393.53427, 174.8708]
2025-08-07 09:20:04,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 66.0, 21.0, 22.0, 21.0, 30.0, 59.0, 19.0, 78.0, 34.0]
2025-08-07 09:20:04,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 59 minutes, 30 seconds)
2025-08-07 09:21:59,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:22:00,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 202.76035 ± 131.500
2025-08-07 09:22:00,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.51664, 171.32997, 100.674484, 116.23056, 83.97875, 129.97069, 361.5235, 286.4532, 503.29913, 178.6267]
2025-08-07 09:22:00,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 33.0, 20.0, 23.0, 17.0, 26.0, 76.0, 66.0, 99.0, 34.0]
2025-08-07 09:22:00,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 57 minutes, 44 seconds)
2025-08-07 09:23:54,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 198.56792 ± 107.535
2025-08-07 09:23:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [122.78237, 308.58374, 128.12856, 297.68945, 110.4596, 89.97157, 327.77472, 95.982124, 128.68877, 375.61835]
2025-08-07 09:23:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 62.0, 27.0, 62.0, 22.0, 18.0, 59.0, 19.0, 26.0, 69.0]
2025-08-07 09:23:54,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 55 minutes, 43 seconds)
2025-08-07 09:25:49,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:50,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 175.86208 ± 122.322
2025-08-07 09:25:50,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [93.995384, 136.98679, 107.58265, 128.98116, 344.04993, 106.33197, 183.27823, 471.77472, 89.56583, 96.07415]
2025-08-07 09:25:50,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 27.0, 21.0, 26.0, 63.0, 21.0, 35.0, 89.0, 18.0, 19.0]
2025-08-07 09:25:50,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 47 seconds)
2025-08-07 09:27:44,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:45,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 146.83966 ± 93.803
2025-08-07 09:27:45,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [105.95352, 155.86699, 137.99287, 420.81204, 95.57167, 96.51431, 136.87723, 125.94034, 83.8029, 109.06469]
2025-08-07 09:27:45,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 30.0, 27.0, 84.0, 19.0, 19.0, 27.0, 25.0, 17.0, 22.0]
2025-08-07 09:27:45,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 46 seconds)
2025-08-07 09:29:38,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:29:38,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 158.51070 ± 101.610
2025-08-07 09:29:38,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [135.15942, 359.09088, 138.18716, 101.48901, 100.72903, 96.29553, 83.680756, 105.2093, 105.81703, 359.4488]
2025-08-07 09:29:38,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 70.0, 27.0, 20.0, 20.0, 19.0, 17.0, 21.0, 21.0, 74.0]
2025-08-07 09:29:38,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 48 seconds)
2025-08-07 09:31:32,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:31:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 147.11006 ± 81.495
2025-08-07 09:31:33,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [140.85654, 209.62769, 89.530876, 367.54218, 88.81491, 142.55286, 94.915535, 90.0194, 118.388916, 128.8517]
2025-08-07 09:31:33,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 42.0, 18.0, 72.0, 18.0, 28.0, 19.0, 18.0, 23.0, 25.0]
2025-08-07 09:31:33,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 44 seconds)
2025-08-07 09:33:27,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:33:28,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 118.57926 ± 17.812
2025-08-07 09:33:28,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [123.13006, 122.697014, 107.84739, 118.3589, 106.67764, 100.765594, 107.889114, 123.95233, 107.900246, 166.57423]
2025-08-07 09:33:28,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 24.0, 21.0, 23.0, 21.0, 20.0, 21.0, 25.0, 21.0, 32.0]
2025-08-07 09:33:28,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 51 seconds)
2025-08-07 09:35:22,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:35:22,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 193.11719 ± 118.327
2025-08-07 09:35:22,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [365.96182, 90.695946, 94.40402, 362.40396, 131.74971, 105.14599, 153.3224, 95.49214, 147.67058, 384.32538]
2025-08-07 09:35:22,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 18.0, 19.0, 68.0, 26.0, 21.0, 30.0, 19.0, 30.0, 74.0]
2025-08-07 09:35:22,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 53 seconds)
2025-08-07 09:37:17,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:17,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 153.52396 ± 81.607
2025-08-07 09:37:17,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [101.2186, 121.896126, 295.33008, 330.6415, 148.7569, 107.22241, 101.95123, 101.29148, 95.89573, 131.03543]
2025-08-07 09:37:17,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 24.0, 59.0, 61.0, 29.0, 21.0, 20.0, 20.0, 19.0, 26.0]
2025-08-07 09:37:17,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 59 seconds)
2025-08-07 09:39:12,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:39:12,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 150.05063 ± 81.397
2025-08-07 09:39:12,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [297.25546, 121.0692, 120.36883, 110.62787, 97.588264, 110.3131, 96.15761, 83.723175, 322.09274, 141.30997]
2025-08-07 09:39:12,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 24.0, 23.0, 22.0, 19.0, 22.0, 19.0, 17.0, 61.0, 27.0]
2025-08-07 09:39:12,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 10 seconds)
2025-08-07 09:41:07,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:41:07,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 139.03593 ± 84.503
2025-08-07 09:41:07,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [94.247025, 133.64119, 101.29676, 101.07275, 84.03252, 106.17853, 387.3853, 130.33266, 113.55569, 138.61693]
2025-08-07 09:41:07,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 26.0, 20.0, 20.0, 17.0, 21.0, 79.0, 25.0, 22.0, 28.0]
2025-08-07 09:41:07,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 17 seconds)
2025-08-07 09:43:01,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:43:02,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 140.10886 ± 82.385
2025-08-07 09:43:02,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.05328, 119.2139, 89.10958, 273.4924, 94.500656, 90.75545, 88.79367, 108.38989, 118.42952, 329.35022]
2025-08-07 09:43:02,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 24.0, 18.0, 53.0, 19.0, 18.0, 18.0, 21.0, 23.0, 62.0]
2025-08-07 09:43:02,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 21 seconds)
2025-08-07 09:44:57,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:44:57,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 193.56772 ± 120.544
2025-08-07 09:44:57,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [423.42905, 115.63749, 113.376945, 90.301605, 366.51334, 119.01348, 328.43884, 102.34506, 113.6292, 162.9921]
2025-08-07 09:44:57,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 23.0, 22.0, 18.0, 83.0, 23.0, 63.0, 20.0, 22.0, 31.0]
2025-08-07 09:44:57,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 30 seconds)
2025-08-07 09:46:52,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:46:52,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 179.72849 ± 119.538
2025-08-07 09:46:52,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [150.96404, 416.23477, 125.404976, 93.98545, 416.3639, 139.34026, 95.527725, 133.59258, 107.51148, 118.35964]
2025-08-07 09:46:52,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 82.0, 24.0, 19.0, 86.0, 28.0, 19.0, 26.0, 21.0, 23.0]
2025-08-07 09:46:52,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 34 seconds)
2025-08-07 09:48:47,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:48:48,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 175.60251 ± 91.941
2025-08-07 09:48:48,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [148.26501, 400.66257, 153.68951, 284.9361, 123.457565, 193.69786, 84.013985, 99.62145, 133.15546, 134.5256]
2025-08-07 09:48:48,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 78.0, 30.0, 53.0, 24.0, 37.0, 17.0, 20.0, 26.0, 26.0]
2025-08-07 09:48:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 42 seconds)
2025-08-07 09:50:42,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:50:42,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 174.01768 ± 104.775
2025-08-07 09:50:42,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [109.58573, 347.5847, 84.009186, 107.96859, 172.09624, 344.02267, 89.19474, 294.11652, 107.55926, 84.03926]
2025-08-07 09:50:42,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 65.0, 17.0, 21.0, 34.0, 70.0, 18.0, 56.0, 21.0, 17.0]
2025-08-07 09:50:42,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 45 seconds)
2025-08-07 09:52:37,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:52:38,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 221.03067 ± 120.855
2025-08-07 09:52:38,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [158.08636, 330.33997, 101.922745, 353.86545, 107.95858, 178.04117, 118.52697, 361.94263, 89.08925, 410.5337]
2025-08-07 09:52:38,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 58.0, 20.0, 73.0, 21.0, 33.0, 23.0, 67.0, 18.0, 90.0]
2025-08-07 09:52:38,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 52 seconds)
2025-08-07 09:54:32,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:54:33,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 159.87822 ± 101.822
2025-08-07 09:54:33,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [149.81902, 108.23746, 100.55376, 315.8721, 90.4449, 140.73975, 96.11676, 103.69731, 397.6274, 95.673805]
2025-08-07 09:54:33,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 21.0, 20.0, 65.0, 18.0, 27.0, 19.0, 20.0, 81.0, 19.0]
2025-08-07 09:54:33,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 55 seconds)
2025-08-07 09:56:27,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:56:28,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 146.95297 ± 67.923
2025-08-07 09:56:28,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [89.41288, 318.30533, 94.64513, 120.03813, 211.12517, 143.15208, 107.83464, 102.20106, 107.675415, 175.13977]
2025-08-07 09:56:28,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 63.0, 19.0, 24.0, 42.0, 28.0, 21.0, 20.0, 21.0, 34.0]
2025-08-07 09:56:28,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 1 second)
2025-08-07 09:58:23,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:58:24,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 164.53909 ± 105.591
2025-08-07 09:58:24,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [277.23306, 95.45569, 107.534164, 133.38197, 434.98083, 95.89624, 94.728195, 89.21444, 184.28387, 132.68237]
2025-08-07 09:58:24,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 19.0, 21.0, 26.0, 94.0, 19.0, 19.0, 18.0, 36.0, 26.0]
2025-08-07 09:58:24,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 6 seconds)
2025-08-07 10:00:18,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:00:19,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 200.57886 ± 144.871
2025-08-07 10:00:19,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.675316, 314.13342, 141.61351, 113.16726, 516.3431, 107.552475, 126.52596, 90.57244, 398.2114, 89.99371]
2025-08-07 10:00:19,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 63.0, 28.0, 22.0, 101.0, 21.0, 25.0, 18.0, 76.0, 18.0]
2025-08-07 10:00:19,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 12 seconds)
2025-08-07 10:02:13,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:02:14,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 173.72992 ± 110.016
2025-08-07 10:02:14,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [133.53613, 102.708496, 352.33252, 114.75381, 129.5165, 95.742714, 163.8922, 100.83313, 424.04572, 119.93816]
2025-08-07 10:02:14,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 20.0, 72.0, 22.0, 25.0, 19.0, 35.0, 20.0, 93.0, 23.0]
2025-08-07 10:02:14,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 17 seconds)
2025-08-07 10:04:08,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:04:08,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 203.49455 ± 148.286
2025-08-07 10:04:08,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [117.20557, 112.49038, 131.279, 84.22824, 102.697266, 531.7835, 302.8293, 410.08044, 139.72229, 102.62957]
2025-08-07 10:04:08,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 26.0, 17.0, 20.0, 110.0, 69.0, 90.0, 27.0, 20.0]
2025-08-07 10:04:08,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 21 seconds)
2025-08-07 10:06:04,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:06:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 163.56607 ± 93.403
2025-08-07 10:06:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [132.34366, 88.85295, 111.123405, 94.12222, 354.46942, 135.33983, 340.33368, 135.17693, 108.64097, 135.25748]
2025-08-07 10:06:05,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 18.0, 22.0, 19.0, 77.0, 26.0, 70.0, 26.0, 21.0, 28.0]
2025-08-07 10:06:05,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 27 seconds)
2025-08-07 10:07:59,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:59,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 182.82019 ± 97.779
2025-08-07 10:07:59,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [256.28314, 145.1347, 359.47073, 107.418816, 135.54543, 358.45605, 136.84306, 119.27993, 120.97075, 88.79928]
2025-08-07 10:07:59,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 28.0, 71.0, 21.0, 26.0, 69.0, 26.0, 23.0, 24.0, 18.0]
2025-08-07 10:07:59,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 30 seconds)
2025-08-07 10:09:54,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:09:54,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 130.38133 ± 48.075
2025-08-07 10:09:54,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [107.82233, 259.97644, 142.31578, 131.7678, 89.874886, 89.70406, 89.63524, 125.112785, 115.67722, 151.92688]
2025-08-07 10:09:54,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 48.0, 28.0, 26.0, 18.0, 18.0, 18.0, 25.0, 23.0, 30.0]
2025-08-07 10:09:54,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 35 seconds)
2025-08-07 10:11:51,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:11:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 137.59578 ± 68.459
2025-08-07 10:11:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [113.4192, 144.65468, 95.41912, 124.45693, 117.02427, 110.4607, 107.37474, 122.3108, 339.27795, 101.559265]
2025-08-07 10:11:51,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 29.0, 19.0, 24.0, 23.0, 22.0, 21.0, 25.0, 62.0, 20.0]
2025-08-07 10:11:51,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 41 seconds)
2025-08-07 10:13:46,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:13:47,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 213.30310 ± 122.535
2025-08-07 10:13:47,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [88.79762, 108.42816, 117.23587, 143.30289, 397.27603, 307.2762, 357.287, 141.5342, 377.2527, 94.640366]
2025-08-07 10:13:47,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 22.0, 23.0, 28.0, 76.0, 63.0, 68.0, 27.0, 72.0, 19.0]
2025-08-07 10:13:47,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 47 seconds)
2025-08-07 10:15:42,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:15:42,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 139.56902 ± 98.370
2025-08-07 10:15:42,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [96.31755, 89.36657, 114.538536, 115.69021, 158.00584, 88.78522, 103.878105, 429.14175, 99.77197, 100.1944]
2025-08-07 10:15:42,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 22.0, 23.0, 30.0, 18.0, 20.0, 79.0, 20.0, 20.0]
2025-08-07 10:15:42,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 51 seconds)
2025-08-07 10:17:38,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:17:38,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 165.28363 ± 106.193
2025-08-07 10:17:38,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [95.38068, 107.6868, 194.73175, 323.73715, 95.298836, 138.43163, 407.26184, 88.86881, 111.72105, 89.71772]
2025-08-07 10:17:38,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 38.0, 69.0, 19.0, 27.0, 87.0, 18.0, 22.0, 18.0]
2025-08-07 10:17:38,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 55 seconds)
2025-08-07 10:19:34,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:19:34,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1221 [DEBUG]: Total Reward: 178.81453 ± 84.786
2025-08-07 10:19:34,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1222 [DEBUG]: All rewards: [289.9049, 170.88388, 296.64017, 118.61284, 88.65208, 128.55229, 324.92026, 132.30992, 101.30455, 136.36433]
2025-08-07 10:19:34,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 33.0, 55.0, 23.0, 18.0, 25.0, 75.0, 26.0, 20.0, 26.0]
2025-08-07 10:19:34,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-humanoid):1251 [DEBUG]: Training session finished
