2025-08-07 06:55:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc15-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:55:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc15-humanoid/ExtremeClogL1U23-bpql-mem24
2025-08-07 06:55:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x145ddad12550>}
2025-08-07 06:55:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 06:55:33,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 06:55:33,747 baseline-bpql-noiseperc15-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 06:55:33,747 baseline-bpql-noiseperc15-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 06:55:35,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 06:55:35,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 06:57:25,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:25,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 200.49078 ± 108.891
2025-08-07 06:57:25,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [178.02246, 259.32974, 307.54788, 209.58707, 461.3019, 114.35205, 117.7777, 136.02504, 95.99919, 124.964806]
2025-08-07 06:57:25,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 51.0, 62.0, 40.0, 88.0, 22.0, 23.0, 26.0, 19.0, 24.0]
2025-08-07 06:57:25,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (200.49) for latency ExtremeClogL1U23
2025-08-07 06:57:25,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 1 minute, 4 seconds)
2025-08-07 06:59:24,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:24,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 210.23010 ± 114.343
2025-08-07 06:59:24,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [261.59778, 356.31247, 119.032326, 91.177086, 162.63855, 89.3477, 162.48105, 133.38756, 284.31314, 442.0134]
2025-08-07 06:59:24,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 67.0, 23.0, 18.0, 31.0, 18.0, 32.0, 26.0, 55.0, 95.0]
2025-08-07 06:59:24,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (210.23) for latency ExtremeClogL1U23
2025-08-07 06:59:24,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 7 minutes, 1 second)
2025-08-07 07:01:22,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:23,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 204.34517 ± 109.714
2025-08-07 07:01:23,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [156.42894, 124.30207, 135.07118, 447.1884, 313.48196, 129.8221, 327.70587, 117.00804, 166.59178, 125.85137]
2025-08-07 07:01:23,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 24.0, 26.0, 87.0, 61.0, 25.0, 64.0, 23.0, 32.0, 24.0]
2025-08-07 07:01:23,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 7 minutes, 20 seconds)
2025-08-07 07:03:22,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:03:22,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 217.24429 ± 106.003
2025-08-07 07:03:22,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [348.26776, 113.77444, 119.610596, 109.173096, 311.9544, 176.12776, 384.21194, 124.493286, 156.17886, 328.65076]
2025-08-07 07:03:22,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 22.0, 23.0, 21.0, 65.0, 34.0, 74.0, 24.0, 30.0, 62.0]
2025-08-07 07:03:22,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (217.24) for latency ExtremeClogL1U23
2025-08-07 07:03:22,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 6 minutes, 49 seconds)
2025-08-07 07:05:20,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:05:21,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 180.87057 ± 138.099
2025-08-07 07:05:21,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [136.16982, 113.454704, 136.05367, 560.1955, 118.29275, 112.952194, 113.868095, 301.14337, 97.018776, 119.55683]
2025-08-07 07:05:21,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 22.0, 26.0, 111.0, 23.0, 22.0, 22.0, 61.0, 19.0, 23.0]
2025-08-07 07:05:21,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 5 minutes, 24 seconds)
2025-08-07 07:07:19,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:07:20,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 158.43381 ± 63.749
2025-08-07 07:07:20,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [113.40341, 281.86325, 95.158066, 107.54537, 123.4616, 136.75726, 124.95221, 276.28107, 167.4203, 157.49564]
2025-08-07 07:07:20,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 57.0, 19.0, 21.0, 24.0, 26.0, 24.0, 57.0, 32.0, 30.0]
2025-08-07 07:07:20,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 6 minutes, 15 seconds)
2025-08-07 07:09:19,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:09:20,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 213.61398 ± 127.077
2025-08-07 07:09:20,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [457.72864, 179.58217, 161.85492, 102.625, 358.50845, 383.96255, 113.43669, 114.088844, 96.685, 167.66753]
2025-08-07 07:09:20,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 34.0, 31.0, 20.0, 74.0, 71.0, 22.0, 22.0, 19.0, 33.0]
2025-08-07 07:09:20,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 4 minutes, 29 seconds)
2025-08-07 07:11:19,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:11:20,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 186.66788 ± 114.305
2025-08-07 07:11:20,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [388.29382, 118.629524, 119.561966, 411.09442, 108.568375, 103.06768, 249.26144, 108.98471, 107.80847, 151.40839]
2025-08-07 07:11:20,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 23.0, 23.0, 78.0, 21.0, 20.0, 50.0, 21.0, 21.0, 29.0]
2025-08-07 07:11:20,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 2 minutes, 57 seconds)
2025-08-07 07:13:19,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:20,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 215.66885 ± 140.588
2025-08-07 07:13:20,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [103.48301, 89.63599, 401.93695, 170.11577, 113.00603, 474.895, 388.51276, 107.430504, 205.95932, 101.71301]
2025-08-07 07:13:20,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 76.0, 33.0, 22.0, 90.0, 74.0, 21.0, 40.0, 20.0]
2025-08-07 07:13:20,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 1 minute, 6 seconds)
2025-08-07 07:15:18,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:19,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 204.53592 ± 100.388
2025-08-07 07:15:19,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [364.56354, 190.06711, 113.443924, 118.88944, 378.0597, 102.78451, 222.97182, 126.861084, 135.38531, 292.33282]
2025-08-07 07:15:19,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 38.0, 22.0, 23.0, 70.0, 20.0, 43.0, 25.0, 26.0, 54.0]
2025-08-07 07:15:19,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 59 minutes, 23 seconds)
2025-08-07 07:17:17,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:18,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 259.06064 ± 106.140
2025-08-07 07:17:18,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [348.3813, 422.8766, 152.36746, 313.16006, 124.97737, 130.85329, 232.7977, 159.24147, 373.27292, 332.67825]
2025-08-07 07:17:18,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 89.0, 30.0, 59.0, 24.0, 25.0, 45.0, 31.0, 70.0, 73.0]
2025-08-07 07:17:18,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (259.06) for latency ExtremeClogL1U23
2025-08-07 07:17:18,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 57 minutes, 34 seconds)
2025-08-07 07:19:16,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:17,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 145.70798 ± 36.002
2025-08-07 07:19:17,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [160.10335, 114.34614, 146.0918, 144.98404, 228.82112, 117.66879, 107.563484, 187.62767, 124.1744, 125.69895]
2025-08-07 07:19:17,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 22.0, 28.0, 28.0, 46.0, 23.0, 21.0, 41.0, 24.0, 24.0]
2025-08-07 07:19:17,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 55 minutes, 9 seconds)
2025-08-07 07:21:14,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:14,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 132.33073 ± 22.877
2025-08-07 07:21:14,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [96.394775, 106.813, 150.67062, 126.58892, 124.476234, 118.07801, 128.5013, 135.44278, 173.81108, 162.53069]
2025-08-07 07:21:14,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 29.0, 25.0, 24.0, 23.0, 25.0, 26.0, 33.0, 32.0]
2025-08-07 07:21:14,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 52 minutes, 29 seconds)
2025-08-07 07:23:12,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:12,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 217.33794 ± 117.525
2025-08-07 07:23:12,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [149.17883, 101.70766, 119.618645, 108.28583, 188.39455, 373.85907, 229.91153, 114.49938, 383.90073, 404.0231]
2025-08-07 07:23:12,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 20.0, 24.0, 21.0, 39.0, 80.0, 47.0, 22.0, 75.0, 78.0]
2025-08-07 07:23:13,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 58 seconds)
2025-08-07 07:25:13,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:13,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 189.73474 ± 118.362
2025-08-07 07:25:13,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [131.47722, 140.06671, 102.18528, 425.5352, 109.05652, 418.4242, 125.4928, 188.443, 144.02011, 112.64641]
2025-08-07 07:25:13,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 27.0, 20.0, 81.0, 21.0, 81.0, 24.0, 37.0, 28.0, 22.0]
2025-08-07 07:25:13,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 48 minutes, 24 seconds)
2025-08-07 07:27:12,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:13,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 207.57410 ± 136.036
2025-08-07 07:27:13,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [431.09015, 96.236015, 182.32495, 114.11021, 352.25937, 97.4776, 119.285614, 97.321266, 141.50584, 444.12994]
2025-08-07 07:27:13,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 19.0, 35.0, 22.0, 67.0, 19.0, 23.0, 19.0, 27.0, 96.0]
2025-08-07 07:27:13,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 46 minutes, 31 seconds)
2025-08-07 07:29:12,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:13,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 270.22003 ± 154.559
2025-08-07 07:29:13,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [123.29251, 399.26007, 140.2563, 130.34052, 466.03073, 468.2487, 115.89718, 443.20117, 95.7866, 319.88657]
2025-08-07 07:29:13,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 74.0, 27.0, 26.0, 86.0, 88.0, 23.0, 84.0, 19.0, 65.0]
2025-08-07 07:29:13,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (270.22) for latency ExtremeClogL1U23
2025-08-07 07:29:13,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 57 seconds)
2025-08-07 07:31:13,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:13,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 149.01625 ± 69.907
2025-08-07 07:31:13,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [143.7391, 123.039734, 164.096, 130.87238, 113.66016, 108.295654, 353.1178, 126.60799, 114.07537, 112.658485]
2025-08-07 07:31:13,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 24.0, 32.0, 25.0, 22.0, 21.0, 66.0, 25.0, 22.0, 22.0]
2025-08-07 07:31:13,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 43 minutes, 36 seconds)
2025-08-07 07:33:13,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 294.27280 ± 148.805
2025-08-07 07:33:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [506.349, 114.401566, 309.05124, 117.374725, 112.5583, 440.77695, 361.69385, 410.76834, 143.7911, 425.96307]
2025-08-07 07:33:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 22.0, 63.0, 23.0, 22.0, 80.0, 67.0, 77.0, 28.0, 81.0]
2025-08-07 07:33:14,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (294.27) for latency ExtremeClogL1U23
2025-08-07 07:33:14,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 42 minutes, 17 seconds)
2025-08-07 07:35:12,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 235.04338 ± 127.248
2025-08-07 07:35:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [469.0509, 330.72537, 322.1933, 354.4665, 118.03593, 119.429634, 101.44909, 296.74176, 136.262, 102.07926]
2025-08-07 07:35:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 62.0, 59.0, 65.0, 23.0, 23.0, 20.0, 57.0, 26.0, 20.0]
2025-08-07 07:35:12,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 39 minutes, 46 seconds)
2025-08-07 07:37:12,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:12,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 178.26590 ± 109.326
2025-08-07 07:37:12,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [166.06879, 124.2392, 357.09305, 96.182365, 107.60027, 423.7404, 107.84019, 153.23457, 144.11198, 102.54835]
2025-08-07 07:37:12,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 24.0, 66.0, 19.0, 21.0, 81.0, 21.0, 30.0, 28.0, 20.0]
2025-08-07 07:37:12,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 37 minutes, 51 seconds)
2025-08-07 07:39:12,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:13,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 258.91699 ± 130.133
2025-08-07 07:39:13,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [130.20084, 125.08573, 439.1287, 151.08913, 117.69258, 404.08496, 318.09708, 139.37314, 354.3566, 410.06113]
2025-08-07 07:39:13,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 24.0, 85.0, 29.0, 23.0, 75.0, 60.0, 27.0, 75.0, 79.0]
2025-08-07 07:39:13,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 35 minutes, 53 seconds)
2025-08-07 07:41:12,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:13,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 212.85555 ± 119.903
2025-08-07 07:41:13,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [106.435905, 126.672676, 118.1838, 440.13678, 364.00253, 366.77832, 141.10004, 122.91133, 147.25403, 195.07988]
2025-08-07 07:41:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 24.0, 23.0, 84.0, 66.0, 69.0, 27.0, 24.0, 28.0, 38.0]
2025-08-07 07:41:13,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 33 minutes, 52 seconds)
2025-08-07 07:43:12,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:12,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 141.38077 ± 56.823
2025-08-07 07:43:12,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [135.69456, 95.863945, 95.847626, 124.197334, 117.78243, 96.370705, 123.56201, 140.1278, 290.09937, 194.26178]
2025-08-07 07:43:12,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 19.0, 19.0, 24.0, 23.0, 19.0, 24.0, 27.0, 59.0, 38.0]
2025-08-07 07:43:12,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 31 minutes, 39 seconds)
2025-08-07 07:45:12,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:13,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 198.91997 ± 133.410
2025-08-07 07:45:13,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [120.290665, 470.9602, 135.42764, 120.231766, 113.73843, 101.98402, 400.46933, 313.4932, 103.65883, 108.94572]
2025-08-07 07:45:13,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 88.0, 26.0, 23.0, 22.0, 20.0, 81.0, 58.0, 20.0, 21.0]
2025-08-07 07:45:13,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 3 seconds)
2025-08-07 07:47:11,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:47:12,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 231.66458 ± 126.626
2025-08-07 07:47:12,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [134.52686, 102.91343, 106.45453, 455.64423, 366.18597, 367.75598, 215.47432, 107.40281, 140.25665, 320.0309]
2025-08-07 07:47:12,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 20.0, 21.0, 86.0, 68.0, 68.0, 42.0, 21.0, 27.0, 60.0]
2025-08-07 07:47:12,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 27 minutes, 56 seconds)
2025-08-07 07:49:12,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:49:12,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 179.58437 ± 120.381
2025-08-07 07:49:12,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [128.10031, 101.35313, 122.883, 139.26195, 136.6917, 103.44279, 450.73218, 127.35715, 102.70448, 383.31693]
2025-08-07 07:49:12,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 24.0, 27.0, 27.0, 20.0, 84.0, 25.0, 20.0, 76.0]
2025-08-07 07:49:12,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 58 seconds)
2025-08-07 07:51:12,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:51:12,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 152.84033 ± 63.512
2025-08-07 07:51:12,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [147.7653, 102.731445, 137.32623, 184.70703, 114.55491, 330.7451, 108.463684, 149.76854, 122.48423, 129.85683]
2025-08-07 07:51:12,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 20.0, 27.0, 35.0, 22.0, 75.0, 21.0, 29.0, 24.0, 25.0]
2025-08-07 07:51:12,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes, 51 seconds)
2025-08-07 07:53:12,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:53:13,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 295.20651 ± 118.481
2025-08-07 07:53:13,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [342.08203, 169.40617, 207.91222, 466.48386, 372.38242, 112.74862, 377.98395, 415.0319, 343.26508, 144.7687]
2025-08-07 07:53:13,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 33.0, 41.0, 88.0, 72.0, 22.0, 72.0, 77.0, 65.0, 28.0]
2025-08-07 07:53:13,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (295.21) for latency ExtremeClogL1U23
2025-08-07 07:53:13,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 7 seconds)
2025-08-07 07:55:11,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:55:12,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 259.31570 ± 133.623
2025-08-07 07:55:12,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [414.51065, 138.7357, 367.4808, 113.90755, 439.8727, 111.72181, 122.74594, 367.9197, 152.65831, 363.6038]
2025-08-07 07:55:12,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 27.0, 74.0, 22.0, 83.0, 22.0, 24.0, 68.0, 30.0, 67.0]
2025-08-07 07:55:12,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 54 seconds)
2025-08-07 07:57:11,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:12,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 282.72803 ± 178.880
2025-08-07 07:57:12,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [140.96886, 523.4568, 134.45273, 609.60974, 114.11282, 301.8071, 419.15515, 361.77805, 125.75464, 96.18445]
2025-08-07 07:57:12,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 100.0, 26.0, 128.0, 22.0, 63.0, 77.0, 80.0, 24.0, 19.0]
2025-08-07 07:57:12,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 1 second)
2025-08-07 07:59:11,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:12,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 200.09453 ± 138.716
2025-08-07 07:59:12,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.84509, 89.911606, 466.5425, 424.35446, 107.87486, 130.46356, 325.59976, 108.968605, 103.340454, 124.044365]
2025-08-07 07:59:12,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 85.0, 81.0, 21.0, 25.0, 59.0, 21.0, 20.0, 24.0]
2025-08-07 07:59:12,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 54 seconds)
2025-08-07 08:01:11,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:12,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 209.32935 ± 116.665
2025-08-07 08:01:12,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [172.43347, 97.54999, 179.81842, 173.6998, 430.5564, 445.95905, 151.184, 156.6557, 134.14629, 151.2904]
2025-08-07 08:01:12,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 19.0, 35.0, 34.0, 79.0, 88.0, 29.0, 31.0, 26.0, 29.0]
2025-08-07 08:01:12,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 59 seconds)
2025-08-07 08:03:11,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:12,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 224.48965 ± 129.666
2025-08-07 08:03:12,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [424.73108, 145.33481, 102.939186, 113.81197, 107.11285, 419.78854, 411.16553, 165.77673, 172.37918, 181.85658]
2025-08-07 08:03:12,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 28.0, 20.0, 22.0, 21.0, 77.0, 77.0, 32.0, 33.0, 35.0]
2025-08-07 08:03:12,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 50 seconds)
2025-08-07 08:05:11,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:05:12,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 195.02615 ± 108.461
2025-08-07 08:05:12,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [128.49052, 108.97109, 361.40543, 103.89768, 205.56352, 105.84657, 336.07648, 366.03107, 102.20336, 131.7758]
2025-08-07 08:05:12,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 66.0, 20.0, 39.0, 21.0, 62.0, 72.0, 20.0, 26.0]
2025-08-07 08:05:12,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes, 56 seconds)
2025-08-07 08:07:11,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:12,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 268.51178 ± 130.949
2025-08-07 08:07:12,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [371.8397, 122.88177, 361.2428, 107.0557, 391.68115, 108.82982, 102.81089, 327.31824, 417.7671, 373.69058]
2025-08-07 08:07:12,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 24.0, 67.0, 21.0, 72.0, 21.0, 20.0, 65.0, 76.0, 71.0]
2025-08-07 08:07:12,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 1 second)
2025-08-07 08:09:12,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:12,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 247.04179 ± 144.894
2025-08-07 08:09:12,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [157.09401, 132.95027, 132.68237, 96.53117, 509.33975, 162.65024, 319.8949, 319.0534, 487.45862, 152.76306]
2025-08-07 08:09:12,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 26.0, 26.0, 19.0, 107.0, 31.0, 72.0, 59.0, 92.0, 30.0]
2025-08-07 08:09:12,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 6 seconds)
2025-08-07 08:11:12,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:11:13,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 254.38705 ± 135.530
2025-08-07 08:11:13,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [103.37731, 448.75476, 119.442276, 153.53168, 350.69257, 143.33153, 381.8931, 89.92599, 375.93707, 376.98447]
2025-08-07 08:11:13,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 92.0, 23.0, 29.0, 66.0, 28.0, 72.0, 18.0, 69.0, 74.0]
2025-08-07 08:11:13,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 10 seconds)
2025-08-07 08:13:12,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:13:13,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 289.88754 ± 166.454
2025-08-07 08:13:13,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [394.2617, 143.43472, 101.87255, 563.097, 161.81122, 379.7576, 152.36327, 96.676025, 474.35632, 431.245]
2025-08-07 08:13:13,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 28.0, 20.0, 106.0, 31.0, 82.0, 29.0, 19.0, 88.0, 81.0]
2025-08-07 08:13:13,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 10 seconds)
2025-08-07 08:15:11,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:15:12,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 209.37981 ± 189.739
2025-08-07 08:15:12,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [101.385765, 123.981445, 118.47157, 169.23175, 134.32921, 709.09015, 102.23644, 103.44011, 112.90786, 418.72375]
2025-08-07 08:15:12,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 24.0, 23.0, 33.0, 26.0, 140.0, 20.0, 20.0, 22.0, 78.0]
2025-08-07 08:15:12,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 58 seconds)
2025-08-07 08:17:11,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:17:12,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 246.76811 ± 163.222
2025-08-07 08:17:12,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [159.78406, 194.03577, 129.2394, 163.28154, 156.13374, 135.12524, 637.4014, 107.346756, 392.3895, 392.94376]
2025-08-07 08:17:12,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 38.0, 25.0, 31.0, 30.0, 26.0, 123.0, 21.0, 73.0, 74.0]
2025-08-07 08:17:12,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 59 seconds)
2025-08-07 08:19:11,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:19:12,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 189.21494 ± 119.376
2025-08-07 08:19:12,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [151.15309, 135.25705, 123.960976, 438.28577, 138.31126, 102.09187, 411.2408, 124.91422, 165.28468, 101.649666]
2025-08-07 08:19:12,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 26.0, 24.0, 83.0, 27.0, 20.0, 75.0, 24.0, 32.0, 20.0]
2025-08-07 08:19:12,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 55 seconds)
2025-08-07 08:21:12,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:21:12,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 235.33406 ± 161.342
2025-08-07 08:21:12,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [121.63799, 139.85182, 113.0533, 114.006134, 598.1242, 376.76132, 101.59416, 107.0486, 325.46198, 355.801]
2025-08-07 08:21:12,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 27.0, 22.0, 22.0, 125.0, 74.0, 20.0, 21.0, 72.0, 81.0]
2025-08-07 08:21:12,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 54 seconds)
2025-08-07 08:23:12,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:23:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 301.50070 ± 174.162
2025-08-07 08:23:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [140.29884, 112.50165, 408.2457, 130.44193, 330.48743, 464.67026, 108.59566, 256.59845, 650.7289, 412.4383]
2025-08-07 08:23:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 74.0, 25.0, 62.0, 87.0, 21.0, 50.0, 122.0, 75.0]
2025-08-07 08:23:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (301.50) for latency ExtremeClogL1U23
2025-08-07 08:23:12,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 55 seconds)
2025-08-07 08:25:12,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:25:13,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 353.54858 ± 191.827
2025-08-07 08:25:13,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [616.9775, 439.57077, 517.2456, 374.89685, 563.1615, 127.60284, 172.25653, 113.54236, 113.04887, 497.18314]
2025-08-07 08:25:13,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 84.0, 95.0, 71.0, 106.0, 25.0, 33.0, 22.0, 22.0, 105.0]
2025-08-07 08:25:13,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (353.55) for latency ExtremeClogL1U23
2025-08-07 08:25:13,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 8 seconds)
2025-08-07 08:27:12,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:27:13,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 292.37775 ± 181.268
2025-08-07 08:27:13,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [118.63518, 160.16434, 364.26355, 317.88712, 589.38367, 560.8419, 164.49179, 103.327446, 448.33853, 96.44403]
2025-08-07 08:27:13,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 31.0, 68.0, 65.0, 114.0, 118.0, 32.0, 20.0, 85.0, 19.0]
2025-08-07 08:27:13,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 48 minutes, 3 seconds)
2025-08-07 08:29:13,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:29:13,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 235.84927 ± 139.861
2025-08-07 08:29:13,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.686066, 351.26663, 424.52347, 380.1037, 131.07411, 119.38977, 102.3492, 172.78305, 455.3012, 119.015495]
2025-08-07 08:29:13,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 65.0, 77.0, 71.0, 25.0, 23.0, 20.0, 33.0, 84.0, 23.0]
2025-08-07 08:29:13,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 12 seconds)
2025-08-07 08:31:13,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:31:13,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 202.29935 ± 113.028
2025-08-07 08:31:13,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.54987, 430.0393, 121.37113, 397.79425, 119.4799, 224.3063, 216.31813, 102.02238, 132.81012, 159.3021]
2025-08-07 08:31:13,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 82.0, 23.0, 84.0, 23.0, 42.0, 41.0, 20.0, 26.0, 31.0]
2025-08-07 08:31:13,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 10 seconds)
2025-08-07 08:33:13,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:33:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 216.25203 ± 143.966
2025-08-07 08:33:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [89.80568, 139.87288, 162.58614, 133.60262, 494.67392, 119.29201, 443.3802, 118.878654, 113.275894, 347.15216]
2025-08-07 08:33:13,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 27.0, 31.0, 26.0, 91.0, 23.0, 80.0, 23.0, 22.0, 67.0]
2025-08-07 08:33:13,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 42 minutes, 7 seconds)
2025-08-07 08:35:13,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:35:13,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 206.22693 ± 159.401
2025-08-07 08:35:13,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [167.42415, 107.30017, 96.35933, 621.46063, 157.98315, 130.06241, 96.48016, 129.56688, 382.71027, 172.92218]
2025-08-07 08:35:13,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 21.0, 19.0, 116.0, 30.0, 25.0, 19.0, 25.0, 75.0, 33.0]
2025-08-07 08:35:13,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes, 6 seconds)
2025-08-07 08:37:13,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:37:13,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 201.69864 ± 156.896
2025-08-07 08:37:13,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [175.2743, 155.77225, 103.35371, 129.57211, 137.83876, 598.9659, 101.834145, 102.13927, 113.33505, 398.90076]
2025-08-07 08:37:13,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 20.0, 25.0, 27.0, 111.0, 20.0, 20.0, 22.0, 74.0]
2025-08-07 08:37:13,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 6 seconds)
2025-08-07 08:39:14,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:39:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 198.10428 ± 133.830
2025-08-07 08:39:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [130.33452, 489.81503, 167.31177, 149.69415, 435.10828, 123.58494, 106.95463, 107.939995, 139.44176, 130.8576]
2025-08-07 08:39:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 92.0, 32.0, 29.0, 81.0, 24.0, 21.0, 21.0, 28.0, 25.0]
2025-08-07 08:39:14,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 10 seconds)
2025-08-07 08:41:13,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:41:14,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 237.66455 ± 153.291
2025-08-07 08:41:14,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [124.12213, 498.28372, 443.67746, 423.43036, 141.63794, 122.8104, 111.38485, 299.11923, 90.95399, 121.225235]
2025-08-07 08:41:14,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 93.0, 83.0, 77.0, 28.0, 24.0, 22.0, 57.0, 18.0, 24.0]
2025-08-07 08:41:14,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 4 seconds)
2025-08-07 08:43:13,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:43:13,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 150.23547 ± 60.694
2025-08-07 08:43:13,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.78231, 148.01056, 161.96239, 128.84259, 325.50345, 149.14719, 119.84859, 128.74493, 108.216125, 112.29662]
2025-08-07 08:43:13,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 29.0, 31.0, 25.0, 61.0, 29.0, 23.0, 25.0, 21.0, 22.0]
2025-08-07 08:43:13,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 1 second)
2025-08-07 08:45:13,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:45:14,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 165.84349 ± 131.153
2025-08-07 08:45:14,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.02981, 203.19145, 549.88873, 124.10366, 119.453964, 96.60578, 124.09942, 95.95749, 108.31609, 117.78854]
2025-08-07 08:45:14,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 39.0, 112.0, 24.0, 23.0, 19.0, 24.0, 19.0, 21.0, 23.0]
2025-08-07 08:45:14,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 2 seconds)
2025-08-07 08:47:13,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:47:14,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 288.23813 ± 177.303
2025-08-07 08:47:14,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [142.79343, 114.36549, 388.59967, 113.88354, 107.35992, 440.98737, 442.4487, 133.73726, 376.88016, 621.3259]
2025-08-07 08:47:14,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 22.0, 71.0, 22.0, 21.0, 81.0, 97.0, 26.0, 69.0, 121.0]
2025-08-07 08:47:14,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 3 seconds)
2025-08-07 08:49:14,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:49:15,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 241.85225 ± 165.439
2025-08-07 08:49:15,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [116.97811, 156.91977, 108.53339, 551.24866, 436.4992, 166.67088, 482.90015, 140.20288, 123.22717, 135.34222]
2025-08-07 08:49:15,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 30.0, 21.0, 102.0, 81.0, 32.0, 90.0, 27.0, 24.0, 26.0]
2025-08-07 08:49:15,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 6 seconds)
2025-08-07 08:51:15,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:51:15,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 210.19385 ± 128.740
2025-08-07 08:51:15,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [447.08618, 422.99887, 123.45669, 127.8275, 333.08347, 151.65547, 129.50044, 155.66457, 108.8109, 101.85449]
2025-08-07 08:51:15,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 80.0, 24.0, 25.0, 72.0, 29.0, 25.0, 30.0, 21.0, 20.0]
2025-08-07 08:51:15,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 10 seconds)
2025-08-07 08:53:15,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:53:16,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 196.11955 ± 113.740
2025-08-07 08:53:16,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [143.45119, 139.25185, 346.29742, 135.3346, 112.24909, 361.61624, 106.96048, 112.879036, 107.253624, 395.90192]
2025-08-07 08:53:16,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 27.0, 65.0, 27.0, 22.0, 66.0, 21.0, 22.0, 21.0, 74.0]
2025-08-07 08:53:16,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 17 seconds)
2025-08-07 08:55:14,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:55:15,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 218.97903 ± 160.320
2025-08-07 08:55:15,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [155.09811, 128.87555, 108.3071, 473.6638, 546.8005, 96.086655, 119.00497, 133.80884, 332.06992, 96.07491]
2025-08-07 08:55:15,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 25.0, 21.0, 94.0, 103.0, 19.0, 23.0, 26.0, 61.0, 19.0]
2025-08-07 08:55:15,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 7 seconds)
2025-08-07 08:57:15,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:57:15,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 273.97061 ± 178.760
2025-08-07 08:57:15,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [304.32553, 125.11445, 124.89267, 107.909134, 647.0603, 445.51328, 117.869965, 401.12225, 363.61316, 102.28523]
2025-08-07 08:57:15,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 25.0, 25.0, 21.0, 123.0, 83.0, 23.0, 90.0, 68.0, 20.0]
2025-08-07 08:57:15,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 13 seconds)
2025-08-07 08:59:15,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:59:15,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 229.36548 ± 182.975
2025-08-07 08:59:15,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [503.07758, 505.397, 103.53601, 102.09107, 106.79208, 114.53363, 137.92921, 101.503136, 516.67035, 102.12466]
2025-08-07 08:59:15,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 113.0, 20.0, 20.0, 21.0, 22.0, 27.0, 20.0, 97.0, 20.0]
2025-08-07 08:59:15,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 1 second)
2025-08-07 09:01:15,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:01:16,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 261.30521 ± 137.943
2025-08-07 09:01:16,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [176.6889, 370.18683, 450.38617, 449.9209, 162.70827, 161.18112, 128.331, 150.26096, 438.77728, 124.61066]
2025-08-07 09:01:16,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 71.0, 83.0, 84.0, 31.0, 32.0, 25.0, 29.0, 86.0, 24.0]
2025-08-07 09:01:16,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 3 seconds)
2025-08-07 09:03:16,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:03:16,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 204.74216 ± 119.400
2025-08-07 09:03:16,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [376.0827, 124.63782, 119.606804, 162.14912, 117.54057, 369.05582, 139.81036, 119.04779, 108.89243, 410.59827]
2025-08-07 09:03:16,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 24.0, 23.0, 31.0, 23.0, 69.0, 27.0, 23.0, 21.0, 74.0]
2025-08-07 09:03:16,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 6 seconds)
2025-08-07 09:05:16,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:05:16,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 222.34946 ± 118.004
2025-08-07 09:05:16,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [117.064926, 401.88248, 297.2764, 102.56796, 422.45975, 314.04654, 152.16595, 108.30431, 164.00415, 143.72206]
2025-08-07 09:05:16,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 75.0, 57.0, 20.0, 79.0, 57.0, 29.0, 21.0, 32.0, 28.0]
2025-08-07 09:05:16,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 12 seconds)
2025-08-07 09:07:16,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:07:17,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 217.56638 ± 157.976
2025-08-07 09:07:17,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [133.75348, 118.39873, 121.83929, 101.36427, 161.00662, 630.9386, 130.07503, 328.28915, 135.7282, 314.27072]
2025-08-07 09:07:17,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 23.0, 24.0, 20.0, 31.0, 135.0, 25.0, 61.0, 26.0, 60.0]
2025-08-07 09:07:17,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 10 seconds)
2025-08-07 09:09:17,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:09:18,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 365.18723 ± 172.407
2025-08-07 09:09:18,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [120.49884, 362.89462, 430.93625, 668.358, 448.36868, 536.72174, 450.36838, 330.94272, 201.09174, 101.69121]
2025-08-07 09:09:18,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 69.0, 80.0, 126.0, 85.0, 101.0, 89.0, 62.0, 38.0, 20.0]
2025-08-07 09:09:18,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1226 [INFO]: New best (365.19) for latency ExtremeClogL1U23
2025-08-07 09:09:18,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 19 seconds)
2025-08-07 09:11:17,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:11:18,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 178.51254 ± 92.468
2025-08-07 09:11:18,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [114.12617, 107.85405, 111.803856, 113.12662, 147.0901, 138.4965, 336.3643, 376.86188, 187.53564, 151.86626]
2025-08-07 09:11:18,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 22.0, 22.0, 28.0, 27.0, 63.0, 76.0, 36.0, 29.0]
2025-08-07 09:11:18,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 13 seconds)
2025-08-07 09:13:18,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:13:19,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 187.27827 ± 135.441
2025-08-07 09:13:19,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [582.25244, 185.43759, 113.50769, 167.45328, 95.81843, 176.01997, 124.26267, 101.844505, 144.83397, 181.35223]
2025-08-07 09:13:19,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 36.0, 22.0, 32.0, 19.0, 33.0, 24.0, 20.0, 28.0, 35.0]
2025-08-07 09:13:19,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 15 seconds)
2025-08-07 09:15:18,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:15:19,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 271.56750 ± 164.857
2025-08-07 09:15:19,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [101.07522, 90.070885, 467.20538, 430.05948, 108.327484, 520.7042, 308.93658, 130.66783, 414.01685, 144.6108]
2025-08-07 09:15:19,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 18.0, 98.0, 82.0, 21.0, 112.0, 58.0, 25.0, 79.0, 28.0]
2025-08-07 09:15:19,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 15 seconds)
2025-08-07 09:17:19,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:17:19,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 204.04150 ± 115.895
2025-08-07 09:17:19,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [125.579414, 130.67464, 438.16904, 123.924095, 158.49106, 397.63446, 136.36325, 113.361916, 275.76404, 140.45322]
2025-08-07 09:17:19,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 87.0, 24.0, 31.0, 75.0, 27.0, 22.0, 52.0, 27.0]
2025-08-07 09:17:19,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 13 seconds)
2025-08-07 09:19:20,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:19:20,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 146.79953 ± 106.249
2025-08-07 09:19:20,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [158.34926, 124.98878, 85.19302, 96.54726, 107.938576, 118.04767, 460.4087, 107.64381, 95.779045, 113.099304]
2025-08-07 09:19:20,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 24.0, 17.0, 19.0, 21.0, 23.0, 88.0, 21.0, 19.0, 22.0]
2025-08-07 09:19:20,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 11 seconds)
2025-08-07 09:21:19,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:21:20,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 205.21555 ± 126.151
2025-08-07 09:21:20,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [158.14111, 374.11743, 131.37054, 110.321785, 133.56915, 113.91823, 114.3653, 428.07336, 103.568016, 384.71048]
2025-08-07 09:21:20,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 71.0, 25.0, 22.0, 26.0, 22.0, 22.0, 80.0, 20.0, 76.0]
2025-08-07 09:21:20,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 10 seconds)
2025-08-07 09:23:20,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:23:21,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 312.57343 ± 180.037
2025-08-07 09:23:21,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [122.74958, 291.66254, 535.7658, 102.94126, 432.9927, 333.60025, 141.90453, 426.198, 624.4623, 113.45757]
2025-08-07 09:23:21,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 56.0, 105.0, 20.0, 80.0, 63.0, 27.0, 78.0, 132.0, 22.0]
2025-08-07 09:23:21,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 8 seconds)
2025-08-07 09:25:20,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:25:21,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 282.10666 ± 236.921
2025-08-07 09:25:21,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [153.63393, 850.1691, 139.36209, 114.71805, 446.69055, 151.31158, 135.86722, 139.14842, 145.36943, 544.79645]
2025-08-07 09:25:21,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 189.0, 27.0, 22.0, 86.0, 29.0, 26.0, 27.0, 28.0, 102.0]
2025-08-07 09:25:21,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 10 seconds)
2025-08-07 09:27:20,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:27:21,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 173.94157 ± 97.243
2025-08-07 09:27:21,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [134.15434, 96.42833, 143.85439, 133.41852, 110.69559, 172.98997, 354.52148, 102.02281, 372.8516, 118.47873]
2025-08-07 09:27:21,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 19.0, 28.0, 26.0, 22.0, 33.0, 69.0, 20.0, 76.0, 23.0]
2025-08-07 09:27:21,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 7 seconds)
2025-08-07 09:29:19,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:29:20,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 179.10078 ± 133.574
2025-08-07 09:29:20,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [114.12131, 187.40305, 113.2325, 119.303276, 570.91626, 128.8827, 118.69973, 108.61976, 188.94333, 140.88599]
2025-08-07 09:29:20,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 37.0, 22.0, 23.0, 108.0, 25.0, 23.0, 21.0, 36.0, 27.0]
2025-08-07 09:29:20,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 56 seconds)
2025-08-07 09:31:18,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:31:18,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 196.75786 ± 91.432
2025-08-07 09:31:18,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [146.61629, 126.84607, 166.31914, 188.58377, 97.339264, 177.0143, 161.90697, 369.9974, 375.9787, 156.97675]
2025-08-07 09:31:18,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 25.0, 32.0, 36.0, 19.0, 34.0, 31.0, 71.0, 68.0, 30.0]
2025-08-07 09:31:18,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 54 seconds)
2025-08-07 09:33:17,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:33:18,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 212.31332 ± 141.806
2025-08-07 09:33:18,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [520.1127, 113.72283, 459.76898, 125.14853, 186.20665, 159.33035, 123.90463, 161.84682, 171.64937, 101.44241]
2025-08-07 09:33:18,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 22.0, 88.0, 24.0, 36.0, 31.0, 24.0, 31.0, 33.0, 20.0]
2025-08-07 09:33:18,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 49 seconds)
2025-08-07 09:35:18,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:35:18,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 154.98232 ± 72.856
2025-08-07 09:35:18,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [116.622696, 121.61523, 124.48523, 146.8144, 135.72661, 95.65742, 160.97746, 112.99846, 363.6543, 171.27141]
2025-08-07 09:35:18,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 24.0, 28.0, 26.0, 19.0, 31.0, 22.0, 73.0, 33.0]
2025-08-07 09:35:18,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 48 seconds)
2025-08-07 09:37:17,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:37:18,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 263.63962 ± 186.256
2025-08-07 09:37:18,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [102.20076, 471.94376, 137.46596, 375.15216, 102.17856, 486.68585, 114.26074, 597.81683, 140.05219, 108.63952]
2025-08-07 09:37:18,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 90.0, 27.0, 70.0, 20.0, 90.0, 22.0, 112.0, 27.0, 21.0]
2025-08-07 09:37:18,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 48 seconds)
2025-08-07 09:39:17,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:39:18,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 267.89801 ± 205.828
2025-08-07 09:39:18,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [570.6445, 139.07193, 571.53687, 138.96506, 169.70845, 113.04961, 113.48707, 600.80536, 111.78397, 149.92729]
2025-08-07 09:39:18,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 27.0, 104.0, 27.0, 33.0, 22.0, 22.0, 110.0, 22.0, 29.0]
2025-08-07 09:39:18,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 55 seconds)
2025-08-07 09:41:19,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:41:19,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 205.31604 ± 173.790
2025-08-07 09:41:19,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [659.572, 139.81223, 406.62982, 119.90695, 119.58316, 119.01975, 127.94483, 146.04349, 101.51812, 113.13008]
2025-08-07 09:41:19,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 27.0, 77.0, 23.0, 23.0, 23.0, 25.0, 28.0, 20.0, 22.0]
2025-08-07 09:41:19,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 3 seconds)
2025-08-07 09:43:18,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:43:19,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 233.07858 ± 153.747
2025-08-07 09:43:19,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [413.8553, 420.03983, 129.53667, 113.09343, 546.7712, 164.71785, 171.57726, 144.60037, 124.39128, 102.20258]
2025-08-07 09:43:19,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 77.0, 25.0, 22.0, 105.0, 31.0, 33.0, 28.0, 24.0, 20.0]
2025-08-07 09:43:19,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 2 seconds)
2025-08-07 09:45:19,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:45:19,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 206.49532 ± 135.075
2025-08-07 09:45:19,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [119.70757, 108.96638, 124.63219, 128.07968, 382.2125, 436.33102, 146.2507, 414.69366, 107.49366, 96.58594]
2025-08-07 09:45:19,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 21.0, 24.0, 25.0, 72.0, 81.0, 28.0, 90.0, 21.0, 19.0]
2025-08-07 09:45:19,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 3 seconds)
2025-08-07 09:47:18,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:47:19,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 277.94403 ± 164.962
2025-08-07 09:47:19,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [101.901665, 173.64903, 118.121765, 390.36276, 141.66238, 555.42505, 119.33546, 530.3586, 365.1618, 283.46167]
2025-08-07 09:47:19,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 33.0, 23.0, 88.0, 27.0, 110.0, 23.0, 101.0, 68.0, 53.0]
2025-08-07 09:47:19,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 4 seconds)
2025-08-07 09:49:19,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:49:19,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 218.02707 ± 143.904
2025-08-07 09:49:19,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [113.38748, 121.96954, 119.005936, 113.33038, 133.23251, 143.93935, 460.44977, 459.14334, 387.0849, 128.72758]
2025-08-07 09:49:19,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 23.0, 22.0, 26.0, 28.0, 85.0, 85.0, 72.0, 25.0]
2025-08-07 09:49:19,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 2 seconds)
2025-08-07 09:51:20,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:51:20,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 143.20898 ± 62.421
2025-08-07 09:51:20,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [122.36375, 95.519745, 108.30472, 164.48006, 124.01284, 118.47256, 108.693665, 117.990036, 321.30008, 150.95242]
2025-08-07 09:51:20,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 19.0, 21.0, 32.0, 24.0, 23.0, 21.0, 23.0, 59.0, 29.0]
2025-08-07 09:51:20,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 1 second)
2025-08-07 09:53:20,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:53:21,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 242.67703 ± 192.527
2025-08-07 09:53:21,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [172.2731, 417.35138, 525.86957, 101.73105, 126.725876, 123.16747, 107.882034, 634.0201, 97.02304, 120.72683]
2025-08-07 09:53:21,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 86.0, 99.0, 20.0, 25.0, 24.0, 21.0, 116.0, 19.0, 23.0]
2025-08-07 09:53:21,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 3 seconds)
2025-08-07 09:55:19,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:55:20,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 202.72997 ± 132.944
2025-08-07 09:55:20,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [149.55937, 485.8168, 444.30225, 174.13365, 125.7898, 108.347534, 119.88636, 157.39444, 147.99112, 114.07854]
2025-08-07 09:55:20,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 91.0, 102.0, 34.0, 24.0, 21.0, 23.0, 30.0, 29.0, 22.0]
2025-08-07 09:55:20,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 1 second)
2025-08-07 09:57:20,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:57:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 154.67288 ± 105.472
2025-08-07 09:57:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [96.01132, 118.21545, 102.242546, 463.35556, 133.54395, 108.16489, 120.97898, 113.419754, 182.94864, 107.847755]
2025-08-07 09:57:20,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 20.0, 84.0, 26.0, 21.0, 24.0, 22.0, 36.0, 21.0]
2025-08-07 09:57:20,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 1 second)
2025-08-07 09:59:21,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 09:59:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 265.97849 ± 150.519
2025-08-07 09:59:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [103.46852, 457.9871, 134.40613, 444.39505, 327.36246, 442.73047, 140.21378, 125.77264, 387.46417, 95.98461]
2025-08-07 09:59:22,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 95.0, 26.0, 85.0, 60.0, 81.0, 27.0, 24.0, 71.0, 19.0]
2025-08-07 09:59:22,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 3 seconds)
2025-08-07 10:01:21,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:01:21,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 225.93301 ± 198.917
2025-08-07 10:01:21,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [128.82378, 169.70296, 185.16139, 102.29401, 113.880135, 764.0899, 103.103134, 408.247, 114.08118, 169.94667]
2025-08-07 10:01:21,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 33.0, 36.0, 20.0, 22.0, 142.0, 20.0, 84.0, 22.0, 32.0]
2025-08-07 10:01:21,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 1 second)
2025-08-07 10:03:21,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:03:22,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 250.86850 ± 162.585
2025-08-07 10:03:22,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [127.996574, 108.58339, 123.45684, 374.8534, 181.08487, 404.23804, 112.26448, 577.67224, 102.67171, 395.8638]
2025-08-07 10:03:22,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 21.0, 24.0, 71.0, 35.0, 90.0, 22.0, 123.0, 20.0, 73.0]
2025-08-07 10:03:22,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 1 second)
2025-08-07 10:05:21,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:05:22,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 274.48691 ± 199.257
2025-08-07 10:05:22,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [675.48663, 467.5879, 103.61342, 123.85387, 429.97772, 130.43587, 140.0783, 114.73336, 108.54087, 450.5613]
2025-08-07 10:05:22,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 96.0, 20.0, 24.0, 81.0, 25.0, 27.0, 22.0, 21.0, 96.0]
2025-08-07 10:05:22,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 1 second)
2025-08-07 10:07:20,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:07:21,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 303.26944 ± 112.579
2025-08-07 10:07:21,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [341.5525, 175.79085, 128.75589, 125.342834, 309.96536, 369.7792, 393.62598, 471.1838, 355.62366, 361.0743]
2025-08-07 10:07:21,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 34.0, 25.0, 24.0, 60.0, 69.0, 72.0, 87.0, 65.0, 65.0]
2025-08-07 10:07:21,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes)
2025-08-07 10:09:20,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:09:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 181.94017 ± 161.965
2025-08-07 10:09:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [156.34802, 113.32514, 107.58844, 154.9625, 123.931915, 96.636284, 178.37872, 107.20185, 662.04865, 118.98009]
2025-08-07 10:09:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 22.0, 21.0, 30.0, 24.0, 19.0, 34.0, 21.0, 141.0, 23.0]
2025-08-07 10:09:20,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 59 seconds)
2025-08-07 10:11:19,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:11:19,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 230.67122 ± 118.641
2025-08-07 10:11:19,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [207.80048, 133.70831, 123.04393, 113.57901, 341.61258, 457.94577, 360.3548, 152.92386, 301.87708, 113.8664]
2025-08-07 10:11:19,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 26.0, 24.0, 22.0, 64.0, 85.0, 66.0, 29.0, 60.0, 22.0]
2025-08-07 10:11:19,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 59 seconds)
2025-08-07 10:13:18,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:13:19,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 319.18945 ± 181.039
2025-08-07 10:13:19,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [140.76476, 498.3081, 545.0659, 515.9759, 430.82352, 141.15733, 497.85233, 128.14027, 179.64378, 114.16289]
2025-08-07 10:13:19,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 92.0, 102.0, 96.0, 79.0, 27.0, 90.0, 25.0, 35.0, 22.0]
2025-08-07 10:13:19,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-08-07 10:15:17,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 10:15:17,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1221 [DEBUG]: Total Reward: 230.68279 ± 186.782
2025-08-07 10:15:17,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1222 [DEBUG]: All rewards: [128.80434, 114.404106, 435.58713, 102.54752, 112.60791, 95.616806, 626.32416, 124.93504, 108.07497, 457.92596]
2025-08-07 10:15:17,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 22.0, 82.0, 20.0, 22.0, 19.0, 126.0, 24.0, 21.0, 83.0]
2025-08-07 10:15:17,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-humanoid):1251 [DEBUG]: Training session finished
