2025-08-07 03:18:40,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:18:40,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 03:18:40,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x154731a8fe90>}
2025-08-07 03:18:40,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 03:18:40,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 03:18:40,894 baseline-bpql-noiseperc10-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=920, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 03:18:40,894 baseline-bpql-noiseperc10-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 03:18:43,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 03:18:43,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 03:20:36,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 149.45187 ± 56.200
2025-08-07 03:20:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [151.6122, 146.30984, 197.2359, 123.94766, 113.92087, 114.76895, 107.19025, 109.48907, 130.79315, 299.25092]
2025-08-07 03:20:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 28.0, 38.0, 24.0, 22.0, 22.0, 21.0, 21.0, 25.0, 61.0]
2025-08-07 03:20:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (149.45) for latency ExtremeSparseL4U32
2025-08-07 03:20:36,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 6 minutes, 14 seconds)
2025-08-07 03:22:36,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:37,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 301.67566 ± 134.630
2025-08-07 03:22:37,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [418.69812, 479.02225, 154.85722, 386.94598, 390.08057, 161.3641, 161.62598, 486.55896, 204.59488, 173.00822]
2025-08-07 03:22:37,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 94.0, 30.0, 70.0, 78.0, 31.0, 31.0, 91.0, 39.0, 34.0]
2025-08-07 03:22:37,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1226 [INFO]: New best (301.68) for latency ExtremeSparseL4U32
2025-08-07 03:22:37,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 10 minutes, 41 seconds)
2025-08-07 03:24:37,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:24:38,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 147.35168 ± 12.244
2025-08-07 03:24:38,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [169.35646, 159.51674, 134.08467, 143.59608, 148.97237, 160.69608, 151.39676, 135.85333, 139.68805, 130.35622]
2025-08-07 03:24:38,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 26.0, 28.0, 29.0, 31.0, 29.0, 26.0, 27.0, 25.0]
2025-08-07 03:24:38,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 10 minutes, 55 seconds)
2025-08-07 03:26:38,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:38,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 145.28619 ± 21.985
2025-08-07 03:26:38,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [155.31042, 154.5556, 122.71653, 160.18642, 102.35778, 119.3469, 175.62814, 166.3494, 145.14453, 151.2662]
2025-08-07 03:26:38,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 24.0, 31.0, 20.0, 23.0, 34.0, 32.0, 28.0, 29.0]
2025-08-07 03:26:38,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 10 minutes, 1 second)
2025-08-07 03:28:39,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:39,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 195.91402 ± 85.512
2025-08-07 03:28:39,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [134.98645, 130.48624, 180.6493, 153.6552, 327.39658, 143.34732, 169.4498, 158.78212, 164.4899, 395.8972]
2025-08-07 03:28:39,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 35.0, 30.0, 66.0, 28.0, 33.0, 31.0, 32.0, 76.0]
2025-08-07 03:28:39,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 8 minutes, 48 seconds)
2025-08-07 03:30:40,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:41,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 187.48344 ± 108.870
2025-08-07 03:30:41,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [167.55028, 166.2564, 172.83224, 164.60132, 144.56306, 135.43153, 203.03372, 503.26355, 108.829506, 108.47284]
2025-08-07 03:30:41,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 33.0, 32.0, 28.0, 26.0, 39.0, 93.0, 21.0, 21.0]
2025-08-07 03:30:41,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 9 minutes, 27 seconds)
2025-08-07 03:32:41,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:42,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 169.84875 ± 64.050
2025-08-07 03:32:42,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [144.71352, 150.10368, 141.86646, 150.01114, 170.21104, 354.3291, 114.07809, 143.19833, 143.4912, 186.4848]
2025-08-07 03:32:42,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 27.0, 29.0, 33.0, 77.0, 22.0, 28.0, 28.0, 36.0]
2025-08-07 03:32:42,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 7 minutes, 33 seconds)
2025-08-07 03:34:43,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 195.19365 ± 151.948
2025-08-07 03:34:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [123.15745, 120.46924, 125.48948, 172.84743, 167.71413, 647.58356, 144.4789, 154.90726, 130.0124, 165.27658]
2025-08-07 03:34:44,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 24.0, 33.0, 32.0, 134.0, 28.0, 30.0, 25.0, 32.0]
2025-08-07 03:34:44,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 5 minutes, 55 seconds)
2025-08-07 03:36:44,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:44,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 187.88512 ± 107.424
2025-08-07 03:36:44,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [135.76451, 171.33237, 155.34908, 101.83996, 451.82385, 141.58908, 129.84918, 332.89987, 149.86646, 108.5369]
2025-08-07 03:36:44,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 33.0, 30.0, 20.0, 83.0, 27.0, 25.0, 63.0, 29.0, 21.0]
2025-08-07 03:36:44,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 3 minutes, 50 seconds)
2025-08-07 03:38:45,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 232.86475 ± 133.536
2025-08-07 03:38:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [173.58102, 255.9705, 150.6995, 140.52731, 441.6085, 130.46457, 464.22107, 102.82198, 365.7012, 103.05173]
2025-08-07 03:38:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 53.0, 29.0, 27.0, 85.0, 25.0, 86.0, 20.0, 74.0, 20.0]
2025-08-07 03:38:46,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 1 minute, 55 seconds)
2025-08-07 03:40:47,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 160.47198 ± 29.229
2025-08-07 03:40:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [140.97734, 120.02549, 212.65202, 174.3448, 150.79463, 161.57297, 130.17828, 140.12277, 206.7152, 167.33635]
2025-08-07 03:40:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 23.0, 41.0, 34.0, 29.0, 31.0, 25.0, 27.0, 40.0, 32.0]
2025-08-07 03:40:47,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 59 minutes, 54 seconds)
2025-08-07 03:42:48,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 141.03214 ± 22.313
2025-08-07 03:42:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [119.48118, 108.59146, 125.62064, 159.73083, 136.75089, 178.35907, 114.35694, 155.08983, 149.25906, 163.08147]
2025-08-07 03:42:48,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 21.0, 24.0, 31.0, 26.0, 35.0, 22.0, 30.0, 29.0, 31.0]
2025-08-07 03:42:48,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 57 minutes, 57 seconds)
2025-08-07 03:44:50,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:50,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 237.48032 ± 140.188
2025-08-07 03:44:50,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [143.97058, 149.9048, 351.43243, 175.18477, 182.63336, 150.37128, 574.89417, 119.13486, 371.87454, 155.40244]
2025-08-07 03:44:50,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 68.0, 34.0, 35.0, 29.0, 114.0, 23.0, 71.0, 30.0]
2025-08-07 03:44:50,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 55 minutes, 54 seconds)
2025-08-07 03:46:51,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:52,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 200.98491 ± 104.425
2025-08-07 03:46:52,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [435.1023, 170.87097, 144.94739, 138.97989, 376.62994, 167.95027, 123.3709, 152.94846, 169.17506, 129.874]
2025-08-07 03:46:52,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 33.0, 28.0, 27.0, 73.0, 32.0, 24.0, 30.0, 33.0, 25.0]
2025-08-07 03:46:52,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 54 minutes, 13 seconds)
2025-08-07 03:48:52,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:53,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 164.47867 ± 66.665
2025-08-07 03:48:53,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [136.49762, 145.4115, 118.83594, 170.82887, 140.82547, 357.23557, 135.1575, 144.01279, 119.39882, 176.5827]
2025-08-07 03:48:53,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 23.0, 33.0, 27.0, 67.0, 26.0, 28.0, 23.0, 34.0]
2025-08-07 03:48:53,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 51 minutes, 57 seconds)
2025-08-07 03:50:53,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:54,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 177.27432 ± 76.175
2025-08-07 03:50:54,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [199.63226, 149.93224, 190.94456, 102.277336, 388.12137, 164.86649, 175.36766, 135.11475, 151.92734, 114.55926]
2025-08-07 03:50:54,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 29.0, 37.0, 20.0, 74.0, 32.0, 34.0, 26.0, 29.0, 22.0]
2025-08-07 03:50:54,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 49 minutes, 53 seconds)
2025-08-07 03:52:55,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:56,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 164.73601 ± 59.192
2025-08-07 03:52:56,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [125.2049, 141.03712, 124.051575, 322.1691, 141.79324, 119.510345, 146.38069, 135.62813, 215.6253, 175.9597]
2025-08-07 03:52:56,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 27.0, 24.0, 60.0, 27.0, 23.0, 28.0, 26.0, 42.0, 34.0]
2025-08-07 03:52:56,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 47 minutes, 58 seconds)
2025-08-07 03:54:56,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:57,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 258.17810 ± 161.660
2025-08-07 03:54:57,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [166.9882, 130.27892, 479.29257, 571.83685, 150.79861, 166.79971, 443.8334, 149.36101, 114.22402, 208.36769]
2025-08-07 03:54:57,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 94.0, 106.0, 29.0, 32.0, 79.0, 29.0, 22.0, 40.0]
2025-08-07 03:54:57,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 45 minutes, 50 seconds)
2025-08-07 03:56:58,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:58,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 170.16385 ± 86.345
2025-08-07 03:56:58,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [145.2451, 118.95711, 119.3266, 139.33244, 180.643, 114.44372, 164.39296, 150.35788, 422.31784, 146.62186]
2025-08-07 03:56:58,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 23.0, 23.0, 27.0, 35.0, 22.0, 32.0, 29.0, 91.0, 28.0]
2025-08-07 03:56:58,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 43 minutes, 43 seconds)
2025-08-07 03:58:59,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:00,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 249.96758 ± 139.180
2025-08-07 03:59:00,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [146.02252, 146.16374, 124.04569, 405.9493, 526.7, 138.97617, 364.75372, 138.84792, 160.60135, 347.61536]
2025-08-07 03:59:00,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 28.0, 24.0, 85.0, 97.0, 27.0, 80.0, 27.0, 31.0, 73.0]
2025-08-07 03:59:00,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 41 minutes, 53 seconds)
2025-08-07 04:01:01,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:01:02,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 149.96007 ± 21.066
2025-08-07 04:01:02,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [138.88083, 129.83359, 150.96413, 135.92363, 167.22232, 151.13696, 171.32195, 194.15485, 140.39131, 119.771126]
2025-08-07 04:01:02,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 25.0, 29.0, 26.0, 32.0, 29.0, 33.0, 37.0, 27.0, 23.0]
2025-08-07 04:01:02,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 40 minutes)
2025-08-07 04:03:02,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:02,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 192.85597 ± 115.962
2025-08-07 04:03:02,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [307.60748, 117.30206, 500.2388, 152.7625, 151.41354, 118.84264, 191.07208, 114.482994, 140.26521, 134.57227]
2025-08-07 04:03:02,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 23.0, 95.0, 30.0, 29.0, 23.0, 36.0, 22.0, 27.0, 26.0]
2025-08-07 04:03:02,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 37 minutes, 48 seconds)
2025-08-07 04:05:04,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:05:05,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 204.88208 ± 115.716
2025-08-07 04:05:05,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [151.41585, 128.59322, 162.19418, 419.1375, 148.13303, 165.80844, 150.33672, 153.80554, 449.6564, 119.73995]
2025-08-07 04:05:05,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 25.0, 31.0, 95.0, 29.0, 32.0, 29.0, 30.0, 94.0, 23.0]
2025-08-07 04:05:05,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 35 minutes, 57 seconds)
2025-08-07 04:07:05,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:07:05,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 222.12595 ± 135.930
2025-08-07 04:07:05,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [437.8084, 455.92093, 151.54187, 124.29456, 144.612, 140.04163, 119.80709, 387.43353, 157.13539, 102.66407]
2025-08-07 04:07:05,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 85.0, 29.0, 24.0, 28.0, 27.0, 23.0, 87.0, 30.0, 20.0]
2025-08-07 04:07:05,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 33 minutes, 45 seconds)
2025-08-07 04:09:06,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:09:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 171.04713 ± 98.738
2025-08-07 04:09:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [447.33966, 124.76382, 144.90012, 113.21928, 113.4724, 170.42992, 117.55121, 124.161995, 234.45712, 120.17573]
2025-08-07 04:09:07,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 24.0, 28.0, 22.0, 22.0, 33.0, 23.0, 24.0, 44.0, 23.0]
2025-08-07 04:09:07,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 31 minutes, 46 seconds)
2025-08-07 04:11:08,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:11:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 194.84184 ± 79.403
2025-08-07 04:11:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [159.3805, 167.10037, 339.6234, 152.84541, 166.25069, 145.8656, 365.62628, 146.557, 154.73563, 150.43362]
2025-08-07 04:11:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 32.0, 67.0, 29.0, 32.0, 28.0, 69.0, 29.0, 30.0, 29.0]
2025-08-07 04:11:08,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 29 minutes, 40 seconds)
2025-08-07 04:13:09,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:13:10,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 174.26384 ± 120.776
2025-08-07 04:13:10,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [534.4333, 129.04364, 150.53333, 129.69875, 160.58875, 139.4918, 119.75569, 140.5235, 119.78091, 118.788795]
2025-08-07 04:13:10,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 25.0, 29.0, 25.0, 31.0, 27.0, 23.0, 27.0, 23.0, 23.0]
2025-08-07 04:13:10,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 27 minutes, 42 seconds)
2025-08-07 04:15:11,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:15:12,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 232.61450 ± 117.868
2025-08-07 04:15:12,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [178.05429, 434.46973, 176.63786, 366.2899, 177.0235, 151.14241, 206.07732, 415.65866, 96.40019, 124.391365]
2025-08-07 04:15:12,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 88.0, 34.0, 69.0, 34.0, 29.0, 40.0, 82.0, 19.0, 24.0]
2025-08-07 04:15:12,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 25 minutes, 37 seconds)
2025-08-07 04:17:12,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:17:12,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 207.67709 ± 119.265
2025-08-07 04:17:12,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [125.80977, 169.37245, 144.61388, 159.26889, 193.63608, 413.14297, 108.97713, 468.5731, 153.38426, 139.99236]
2025-08-07 04:17:12,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 33.0, 28.0, 31.0, 37.0, 76.0, 21.0, 90.0, 29.0, 27.0]
2025-08-07 04:17:12,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 23 minutes, 40 seconds)
2025-08-07 04:19:13,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:19:14,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 224.82674 ± 125.429
2025-08-07 04:19:14,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [155.25267, 403.15604, 124.53312, 420.6601, 421.8487, 133.4407, 174.2702, 130.01137, 145.52367, 139.57072]
2025-08-07 04:19:14,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 77.0, 24.0, 79.0, 80.0, 26.0, 34.0, 25.0, 28.0, 27.0]
2025-08-07 04:19:14,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 21 minutes, 41 seconds)
2025-08-07 04:21:15,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:21:16,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 139.45943 ± 24.796
2025-08-07 04:21:16,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [123.55579, 102.69515, 165.35997, 115.24812, 175.54555, 175.94513, 135.64973, 118.76929, 130.75053, 151.07513]
2025-08-07 04:21:16,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 20.0, 32.0, 22.0, 34.0, 34.0, 26.0, 23.0, 25.0, 29.0]
2025-08-07 04:21:16,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 19 minutes, 40 seconds)
2025-08-07 04:23:16,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:23:17,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 249.21342 ± 153.436
2025-08-07 04:23:17,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [144.64403, 144.45654, 204.25966, 411.97794, 129.41087, 170.4627, 114.365555, 392.1949, 182.7216, 597.6406]
2025-08-07 04:23:17,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 28.0, 39.0, 81.0, 25.0, 33.0, 22.0, 74.0, 35.0, 125.0]
2025-08-07 04:23:17,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 17 minutes, 45 seconds)
2025-08-07 04:25:18,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:25:19,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 190.72418 ± 135.481
2025-08-07 04:25:19,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.947426, 151.63914, 195.6253, 591.6997, 171.37581, 129.49055, 144.85411, 141.07393, 114.18285, 142.35292]
2025-08-07 04:25:19,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 29.0, 37.0, 114.0, 33.0, 25.0, 28.0, 27.0, 22.0, 28.0]
2025-08-07 04:25:19,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 35 seconds)
2025-08-07 04:27:19,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:27:20,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 243.60437 ± 140.131
2025-08-07 04:27:20,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [156.318, 338.7417, 175.89012, 539.6384, 167.57462, 456.2763, 176.73666, 150.11748, 118.8663, 155.88408]
2025-08-07 04:27:20,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 68.0, 34.0, 109.0, 33.0, 87.0, 34.0, 29.0, 23.0, 30.0]
2025-08-07 04:27:20,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 42 seconds)
2025-08-07 04:29:21,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:29:21,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 165.24217 ± 87.183
2025-08-07 04:29:21,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [407.9759, 119.672646, 152.45085, 196.06305, 119.146454, 108.75562, 138.3562, 96.16256, 118.51118, 195.32726]
2025-08-07 04:29:21,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 23.0, 30.0, 38.0, 23.0, 21.0, 27.0, 19.0, 23.0, 38.0]
2025-08-07 04:29:21,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 30 seconds)
2025-08-07 04:31:22,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:31:22,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 170.79985 ± 88.736
2025-08-07 04:31:22,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [117.10593, 428.45465, 179.26721, 119.72269, 166.35678, 131.0, 166.81339, 146.05048, 144.70218, 108.52542]
2025-08-07 04:31:22,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 93.0, 34.0, 23.0, 32.0, 25.0, 32.0, 28.0, 28.0, 21.0]
2025-08-07 04:31:22,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 27 seconds)
2025-08-07 04:33:23,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:33:24,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 221.34753 ± 118.967
2025-08-07 04:33:24,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [166.3412, 155.30974, 162.74034, 439.5095, 475.21594, 149.61447, 182.75446, 184.73209, 141.03741, 156.22012]
2025-08-07 04:33:24,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 30.0, 32.0, 82.0, 92.0, 29.0, 35.0, 36.0, 27.0, 30.0]
2025-08-07 04:33:24,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 19 seconds)
2025-08-07 04:35:25,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:35:25,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 172.40561 ± 84.100
2025-08-07 04:35:25,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.457344, 130.17686, 161.16325, 139.78401, 181.6472, 160.28491, 418.7351, 154.73831, 123.082344, 129.98672]
2025-08-07 04:35:25,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 31.0, 27.0, 35.0, 31.0, 80.0, 30.0, 24.0, 25.0]
2025-08-07 04:35:25,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 24 seconds)
2025-08-07 04:37:26,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:37:27,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 254.05783 ± 162.355
2025-08-07 04:37:27,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [497.58002, 158.79051, 469.1042, 113.0679, 438.57574, 108.59928, 394.42682, 134.22348, 106.611, 119.59905]
2025-08-07 04:37:27,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 31.0, 89.0, 22.0, 83.0, 21.0, 81.0, 26.0, 21.0, 23.0]
2025-08-07 04:37:27,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 22 seconds)
2025-08-07 04:39:28,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:39:28,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 196.63754 ± 112.017
2025-08-07 04:39:28,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [112.91929, 168.74376, 118.74242, 114.272995, 165.63667, 165.09822, 482.10068, 328.55417, 171.75485, 138.55228]
2025-08-07 04:39:28,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 33.0, 23.0, 22.0, 32.0, 33.0, 104.0, 64.0, 33.0, 27.0]
2025-08-07 04:39:29,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 27 seconds)
2025-08-07 04:41:30,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:41:31,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 206.56303 ± 108.898
2025-08-07 04:41:31,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [364.88727, 159.16336, 160.64064, 174.88487, 156.85863, 113.77802, 208.41731, 134.31798, 463.28494, 129.39725]
2025-08-07 04:41:31,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 31.0, 31.0, 34.0, 30.0, 22.0, 41.0, 26.0, 88.0, 25.0]
2025-08-07 04:41:31,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 59 minutes, 37 seconds)
2025-08-07 04:43:31,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:43:31,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 138.07520 ± 25.130
2025-08-07 04:43:31,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [158.5429, 108.81361, 95.95237, 131.25372, 119.743195, 162.50128, 159.03725, 130.38515, 180.19035, 134.33217]
2025-08-07 04:43:31,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 21.0, 19.0, 25.0, 23.0, 32.0, 31.0, 25.0, 35.0, 26.0]
2025-08-07 04:43:31,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 57 minutes, 28 seconds)
2025-08-07 04:45:32,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:45:33,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 239.27621 ± 153.533
2025-08-07 04:45:33,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [114.405334, 162.43858, 177.85652, 145.77744, 416.54468, 166.34775, 390.01172, 577.1858, 112.91114, 129.28311]
2025-08-07 04:45:33,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 31.0, 35.0, 28.0, 81.0, 32.0, 74.0, 113.0, 22.0, 25.0]
2025-08-07 04:45:33,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 23 seconds)
2025-08-07 04:47:33,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:47:34,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 171.43068 ± 74.675
2025-08-07 04:47:34,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [109.38274, 129.3368, 124.33994, 176.98753, 169.32074, 181.4167, 171.01295, 383.67615, 133.67589, 135.15736]
2025-08-07 04:47:34,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 25.0, 24.0, 34.0, 33.0, 35.0, 33.0, 70.0, 26.0, 26.0]
2025-08-07 04:47:34,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 17 seconds)
2025-08-07 04:49:34,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:49:35,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 148.85435 ± 28.915
2025-08-07 04:49:35,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [159.97308, 136.4792, 119.04361, 182.06276, 166.75497, 160.09071, 108.85197, 139.9914, 201.3086, 113.987206]
2025-08-07 04:49:35,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 26.0, 23.0, 36.0, 32.0, 31.0, 21.0, 27.0, 39.0, 22.0]
2025-08-07 04:49:35,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 11 seconds)
2025-08-07 04:51:36,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:51:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 193.50853 ± 108.199
2025-08-07 04:51:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [153.301, 108.90997, 130.37668, 125.54023, 171.57555, 113.53006, 192.4556, 433.94577, 134.42467, 371.0259]
2025-08-07 04:51:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 21.0, 25.0, 24.0, 34.0, 22.0, 37.0, 84.0, 26.0, 72.0]
2025-08-07 04:51:37,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 7 seconds)
2025-08-07 04:53:38,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:53:38,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 199.81519 ± 86.405
2025-08-07 04:53:38,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [173.14134, 352.27887, 135.67557, 150.21986, 128.51149, 164.83406, 164.95514, 174.74768, 167.13455, 386.65332]
2025-08-07 04:53:38,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 71.0, 26.0, 29.0, 25.0, 32.0, 32.0, 34.0, 32.0, 75.0]
2025-08-07 04:53:38,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 47 minutes, 17 seconds)
2025-08-07 04:55:40,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:55:40,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 148.39488 ± 13.679
2025-08-07 04:55:40,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.33767, 150.93481, 155.11401, 160.35544, 150.84442, 164.89688, 131.06602, 157.85826, 158.33444, 130.20686]
2025-08-07 04:55:40,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 29.0, 30.0, 31.0, 29.0, 32.0, 25.0, 31.0, 31.0, 25.0]
2025-08-07 04:55:40,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 45 minutes, 17 seconds)
2025-08-07 04:57:40,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:57:41,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 168.83997 ± 88.768
2025-08-07 04:57:41,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [160.71268, 171.57077, 425.3934, 114.02546, 146.66748, 134.7094, 97.5143, 152.98317, 169.87157, 114.95142]
2025-08-07 04:57:41,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 81.0, 22.0, 28.0, 26.0, 19.0, 29.0, 33.0, 22.0]
2025-08-07 04:57:41,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 43 minutes, 12 seconds)
2025-08-07 04:59:42,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:59:43,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 167.25291 ± 15.388
2025-08-07 04:59:43,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [175.24525, 150.34602, 179.05109, 145.74086, 175.5648, 186.5595, 186.83334, 171.27315, 155.63423, 146.28087]
2025-08-07 04:59:43,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 29.0, 35.0, 28.0, 35.0, 36.0, 36.0, 34.0, 30.0, 28.0]
2025-08-07 04:59:43,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 41 minutes, 15 seconds)
2025-08-07 05:01:43,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:01:44,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 207.37183 ± 90.571
2025-08-07 05:01:44,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [173.7924, 177.51611, 400.51868, 135.1099, 158.23494, 265.7994, 339.55426, 165.87373, 122.738495, 134.58026]
2025-08-07 05:01:44,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 34.0, 89.0, 26.0, 31.0, 50.0, 65.0, 32.0, 24.0, 26.0]
2025-08-07 05:01:44,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 39 minutes, 10 seconds)
2025-08-07 05:03:44,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:03:45,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 163.45601 ± 84.193
2025-08-07 05:03:45,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [130.08247, 129.31488, 114.11603, 171.03328, 410.62558, 123.99829, 113.72918, 144.98848, 158.14758, 138.52428]
2025-08-07 05:03:45,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 25.0, 22.0, 33.0, 79.0, 24.0, 22.0, 28.0, 31.0, 27.0]
2025-08-07 05:03:45,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 37 minutes, 2 seconds)
2025-08-07 05:05:46,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:47,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 171.77588 ± 105.347
2025-08-07 05:05:47,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [164.57257, 482.50616, 155.30219, 130.69379, 167.12762, 135.13309, 102.12533, 135.31902, 125.0684, 119.910706]
2025-08-07 05:05:47,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 103.0, 30.0, 25.0, 33.0, 26.0, 20.0, 26.0, 24.0, 23.0]
2025-08-07 05:05:47,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 35 minutes)
2025-08-07 05:07:47,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:07:48,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 237.23807 ± 148.137
2025-08-07 05:07:48,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [169.17624, 161.89633, 144.22604, 125.037254, 195.23862, 193.16592, 601.91205, 161.57614, 442.11523, 178.0368]
2025-08-07 05:07:48,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 31.0, 28.0, 24.0, 37.0, 37.0, 110.0, 31.0, 83.0, 35.0]
2025-08-07 05:07:48,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 33 minutes, 6 seconds)
2025-08-07 05:09:48,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:09:49,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 157.58060 ± 25.376
2025-08-07 05:09:49,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [140.01653, 140.23433, 172.95908, 118.86465, 182.14803, 165.17395, 208.80984, 143.76706, 135.2717, 168.56091]
2025-08-07 05:09:49,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 27.0, 33.0, 23.0, 35.0, 32.0, 41.0, 28.0, 26.0, 33.0]
2025-08-07 05:09:49,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 58 seconds)
2025-08-07 05:11:50,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:11:51,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 154.64568 ± 27.318
2025-08-07 05:11:51,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [161.14998, 186.43239, 140.61409, 197.4519, 129.19325, 108.75, 190.27402, 145.84561, 139.77957, 146.96605]
2025-08-07 05:11:51,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 36.0, 27.0, 38.0, 25.0, 21.0, 37.0, 28.0, 27.0, 28.0]
2025-08-07 05:11:51,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 56 seconds)
2025-08-07 05:13:52,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:13:52,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 173.96011 ± 80.274
2025-08-07 05:13:52,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [203.03822, 149.05487, 145.1872, 194.45248, 150.0525, 124.269936, 398.7412, 103.30215, 124.64782, 146.85466]
2025-08-07 05:13:52,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 29.0, 28.0, 38.0, 29.0, 24.0, 73.0, 20.0, 24.0, 28.0]
2025-08-07 05:13:52,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 4 seconds)
2025-08-07 05:15:53,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:15:53,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 169.10091 ± 60.893
2025-08-07 05:15:53,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.75877, 130.11465, 166.49614, 142.31242, 172.25598, 119.14474, 340.7838, 145.88278, 162.61954, 186.6403]
2025-08-07 05:15:53,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 32.0, 27.0, 33.0, 23.0, 67.0, 28.0, 31.0, 37.0]
2025-08-07 05:15:53,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 57 seconds)
2025-08-07 05:17:54,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:17:55,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 221.00256 ± 131.704
2025-08-07 05:17:55,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [178.194, 186.9191, 539.9408, 144.80818, 129.39693, 139.67105, 178.17162, 150.54079, 151.0239, 411.3593]
2025-08-07 05:17:55,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 36.0, 110.0, 28.0, 25.0, 27.0, 34.0, 29.0, 29.0, 94.0]
2025-08-07 05:17:55,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 54 seconds)
2025-08-07 05:19:55,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:19:56,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 151.44524 ± 25.567
2025-08-07 05:19:56,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [164.71825, 208.79616, 129.2302, 157.18579, 124.10466, 169.6978, 158.27896, 130.4458, 120.19456, 151.80011]
2025-08-07 05:19:56,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 41.0, 25.0, 30.0, 24.0, 33.0, 31.0, 25.0, 23.0, 29.0]
2025-08-07 05:19:56,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 55 seconds)
2025-08-07 05:21:57,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:21:58,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 246.27861 ± 130.855
2025-08-07 05:21:58,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [344.8143, 129.52654, 355.4231, 140.08998, 114.72587, 166.98535, 496.1017, 398.2744, 149.59169, 167.25322]
2025-08-07 05:21:58,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 25.0, 70.0, 27.0, 22.0, 32.0, 101.0, 77.0, 29.0, 32.0]
2025-08-07 05:21:58,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 55 seconds)
2025-08-07 05:23:59,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:23:59,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 183.34500 ± 88.561
2025-08-07 05:23:59,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.300385, 197.50308, 438.9181, 176.11801, 133.96294, 144.33371, 189.98215, 159.6903, 135.4143, 133.22691]
2025-08-07 05:23:59,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 38.0, 96.0, 34.0, 26.0, 28.0, 37.0, 31.0, 26.0, 26.0]
2025-08-07 05:23:59,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 51 seconds)
2025-08-07 05:26:00,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:26:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 145.42371 ± 23.527
2025-08-07 05:26:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [168.86263, 133.12567, 146.80669, 113.68442, 114.085075, 170.9371, 123.93793, 168.03798, 135.77527, 178.98434]
2025-08-07 05:26:01,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 26.0, 28.0, 22.0, 22.0, 33.0, 24.0, 32.0, 26.0, 35.0]
2025-08-07 05:26:01,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 52 seconds)
2025-08-07 05:28:01,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:28:02,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 248.07730 ± 165.906
2025-08-07 05:28:02,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [151.43977, 136.27379, 150.70967, 149.65388, 431.6812, 636.90106, 382.07935, 129.74593, 140.586, 171.70251]
2025-08-07 05:28:02,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 26.0, 29.0, 29.0, 79.0, 119.0, 72.0, 25.0, 27.0, 33.0]
2025-08-07 05:28:02,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 54 seconds)
2025-08-07 05:30:03,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:30:03,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 216.23349 ± 135.471
2025-08-07 05:30:03,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [163.79597, 168.26958, 150.40709, 153.7707, 147.61485, 140.99817, 356.5598, 140.14064, 161.99074, 578.78723]
2025-08-07 05:30:03,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 29.0, 30.0, 28.0, 27.0, 76.0, 27.0, 31.0, 106.0]
2025-08-07 05:30:03,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 52 seconds)
2025-08-07 05:32:04,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:32:05,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 175.80368 ± 70.980
2025-08-07 05:32:05,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [136.7546, 385.44684, 146.16383, 161.95892, 148.77542, 179.50844, 140.17978, 166.58159, 146.73203, 145.93527]
2025-08-07 05:32:05,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 74.0, 28.0, 31.0, 29.0, 35.0, 27.0, 32.0, 28.0, 28.0]
2025-08-07 05:32:05,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 48 seconds)
2025-08-07 05:34:06,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:34:07,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 210.68855 ± 139.087
2025-08-07 05:34:07,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [547.2647, 147.50854, 140.25502, 119.06266, 173.7645, 119.878265, 160.69928, 160.3743, 125.2136, 412.86462]
2025-08-07 05:34:07,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 29.0, 27.0, 23.0, 34.0, 23.0, 31.0, 31.0, 24.0, 79.0]
2025-08-07 05:34:07,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 48 seconds)
2025-08-07 05:36:07,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:36:08,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 146.29245 ± 26.829
2025-08-07 05:36:08,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [165.5512, 175.78127, 108.37895, 139.72295, 156.89555, 103.20748, 113.5547, 176.67851, 156.80159, 166.35231]
2025-08-07 05:36:08,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 34.0, 21.0, 27.0, 30.0, 20.0, 22.0, 34.0, 30.0, 32.0]
2025-08-07 05:36:08,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 45 seconds)
2025-08-07 05:38:09,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:38:09,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 221.49582 ± 136.700
2025-08-07 05:38:09,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [113.98215, 140.41257, 190.7772, 502.7224, 155.48183, 379.55447, 107.21401, 382.9331, 118.046036, 123.83421]
2025-08-07 05:38:09,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 27.0, 37.0, 98.0, 30.0, 74.0, 21.0, 73.0, 23.0, 24.0]
2025-08-07 05:38:09,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 43 seconds)
2025-08-07 05:40:10,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:40:11,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 178.87793 ± 73.102
2025-08-07 05:40:11,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [138.5574, 157.55554, 167.77257, 139.63274, 176.36646, 180.8486, 177.77426, 390.59735, 134.79816, 124.87627]
2025-08-07 05:40:11,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 30.0, 33.0, 27.0, 34.0, 35.0, 35.0, 72.0, 26.0, 24.0]
2025-08-07 05:40:11,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 42 seconds)
2025-08-07 05:42:12,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:42:13,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 187.57312 ± 143.056
2025-08-07 05:42:13,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [108.9115, 149.8917, 109.13184, 108.41001, 150.87048, 139.72566, 165.10542, 176.8636, 611.07416, 155.74689]
2025-08-07 05:42:13,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 29.0, 21.0, 21.0, 29.0, 27.0, 32.0, 35.0, 117.0, 30.0]
2025-08-07 05:42:13,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 45 seconds)
2025-08-07 05:44:14,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:44:14,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 176.04753 ± 66.943
2025-08-07 05:44:14,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [176.10239, 138.87195, 146.59918, 145.40897, 138.76347, 161.5219, 152.83327, 150.7162, 176.52298, 373.13498]
2025-08-07 05:44:14,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 27.0, 28.0, 28.0, 27.0, 31.0, 29.0, 29.0, 34.0, 69.0]
2025-08-07 05:44:14,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 43 seconds)
2025-08-07 05:46:15,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:46:16,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 255.14658 ± 144.106
2025-08-07 05:46:16,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [124.61971, 129.42674, 114.24767, 420.43814, 398.4947, 508.36484, 208.22806, 152.3122, 125.00321, 370.3306]
2025-08-07 05:46:16,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 22.0, 78.0, 84.0, 96.0, 40.0, 29.0, 24.0, 72.0]
2025-08-07 05:46:16,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 45 seconds)
2025-08-07 05:48:17,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:48:18,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 172.03836 ± 97.980
2025-08-07 05:48:18,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [102.2008, 135.0063, 147.23506, 460.5552, 149.72066, 151.40636, 133.8662, 167.04637, 114.44088, 158.90573]
2025-08-07 05:48:18,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 26.0, 28.0, 87.0, 29.0, 29.0, 26.0, 32.0, 22.0, 31.0]
2025-08-07 05:48:18,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 42 seconds)
2025-08-07 05:50:19,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:50:19,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 190.25107 ± 91.220
2025-08-07 05:50:19,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [183.342, 160.99838, 177.2981, 458.25464, 149.74774, 135.25488, 188.15862, 140.33327, 137.1669, 171.95616]
2025-08-07 05:50:19,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 31.0, 34.0, 86.0, 29.0, 26.0, 37.0, 27.0, 26.0, 33.0]
2025-08-07 05:50:19,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 44 seconds)
2025-08-07 05:52:20,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:52:21,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 206.83955 ± 112.091
2025-08-07 05:52:21,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [417.2217, 135.959, 179.24208, 119.639114, 114.81697, 118.66685, 398.2275, 293.09476, 156.27852, 135.24911]
2025-08-07 05:52:21,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 26.0, 34.0, 23.0, 22.0, 23.0, 77.0, 53.0, 30.0, 26.0]
2025-08-07 05:52:21,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 38 seconds)
2025-08-07 05:54:22,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:54:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 220.08493 ± 127.002
2025-08-07 05:54:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [119.55187, 166.00873, 171.53824, 176.92793, 524.2851, 135.77373, 167.8617, 407.21457, 185.454, 146.2334]
2025-08-07 05:54:22,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 32.0, 33.0, 34.0, 98.0, 26.0, 33.0, 76.0, 36.0, 28.0]
2025-08-07 05:54:22,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 36 seconds)
2025-08-07 05:56:23,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:56:24,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 209.90495 ± 141.007
2025-08-07 05:56:24,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [155.61322, 134.69533, 575.8028, 125.48976, 124.94161, 376.63312, 163.98543, 161.08746, 135.55545, 145.24556]
2025-08-07 05:56:24,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 26.0, 108.0, 24.0, 24.0, 69.0, 32.0, 31.0, 26.0, 28.0]
2025-08-07 05:56:24,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 35 seconds)
2025-08-07 05:58:25,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:58:26,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 183.10892 ± 69.531
2025-08-07 05:58:26,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [163.2974, 169.70132, 130.0824, 223.8423, 378.4318, 152.80241, 172.07971, 153.32703, 135.754, 151.7708]
2025-08-07 05:58:26,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 33.0, 25.0, 43.0, 71.0, 29.0, 33.0, 29.0, 26.0, 29.0]
2025-08-07 05:58:26,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 34 seconds)
2025-08-07 06:00:27,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:00:27,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 210.70840 ± 130.935
2025-08-07 06:00:27,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [156.39445, 153.43051, 147.5056, 570.90564, 166.90422, 134.42613, 145.1449, 130.73795, 320.89792, 180.73662]
2025-08-07 06:00:27,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 29.0, 115.0, 32.0, 26.0, 28.0, 25.0, 63.0, 35.0]
2025-08-07 06:00:27,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 32 seconds)
2025-08-07 06:02:28,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:02:29,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 245.76944 ± 123.385
2025-08-07 06:02:29,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [133.8714, 195.65137, 128.50508, 412.93677, 129.90593, 130.9135, 168.4851, 354.84964, 364.88837, 437.68723]
2025-08-07 06:02:29,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 38.0, 25.0, 79.0, 25.0, 25.0, 32.0, 74.0, 66.0, 83.0]
2025-08-07 06:02:29,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 33 seconds)
2025-08-07 06:04:31,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:04:31,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 138.25656 ± 15.893
2025-08-07 06:04:31,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [129.83232, 148.89642, 124.25643, 161.35672, 119.54928, 129.38742, 146.36052, 124.734344, 167.80338, 130.38882]
2025-08-07 06:04:31,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 29.0, 24.0, 31.0, 23.0, 25.0, 28.0, 24.0, 32.0, 25.0]
2025-08-07 06:04:31,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 31 seconds)
2025-08-07 06:06:31,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:06:32,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 254.00301 ± 140.688
2025-08-07 06:06:32,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [471.73914, 124.46968, 153.52008, 442.4952, 182.8332, 145.53145, 369.47845, 120.01085, 123.9852, 405.96674]
2025-08-07 06:06:32,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 24.0, 30.0, 86.0, 35.0, 28.0, 70.0, 23.0, 24.0, 82.0]
2025-08-07 06:06:32,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 27 seconds)
2025-08-07 06:08:33,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:08:34,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 178.41690 ± 84.682
2025-08-07 06:08:34,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [178.313, 114.0927, 120.116394, 405.32294, 243.52373, 163.83151, 121.09184, 118.52123, 177.8444, 141.51141]
2025-08-07 06:08:34,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 22.0, 23.0, 78.0, 47.0, 32.0, 23.0, 23.0, 34.0, 27.0]
2025-08-07 06:08:34,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 26 seconds)
2025-08-07 06:10:35,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:10:36,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 207.16057 ± 99.559
2025-08-07 06:10:36,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [330.63327, 159.51108, 159.07442, 455.50052, 130.15517, 125.46214, 167.90932, 152.38747, 185.47493, 205.49734]
2025-08-07 06:10:36,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 31.0, 31.0, 89.0, 25.0, 24.0, 33.0, 29.0, 36.0, 40.0]
2025-08-07 06:10:36,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 25 seconds)
2025-08-07 06:12:37,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:12:38,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 190.93393 ± 128.600
2025-08-07 06:12:38,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [129.52943, 134.78983, 571.2849, 162.82544, 157.13242, 141.82942, 118.815704, 166.86737, 130.57535, 195.68959]
2025-08-07 06:12:38,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 26.0, 107.0, 31.0, 30.0, 27.0, 23.0, 32.0, 25.0, 38.0]
2025-08-07 06:12:38,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 22 seconds)
2025-08-07 06:14:38,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:14:39,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 220.18845 ± 116.127
2025-08-07 06:14:39,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [169.68704, 222.98743, 155.5931, 152.9608, 464.44482, 167.90956, 159.1161, 130.39642, 148.25706, 430.53232]
2025-08-07 06:14:39,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 43.0, 30.0, 29.0, 87.0, 33.0, 31.0, 25.0, 29.0, 94.0]
2025-08-07 06:14:39,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 19 seconds)
2025-08-07 06:16:39,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:16:40,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 194.92004 ± 84.993
2025-08-07 06:16:40,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [182.24898, 364.66486, 139.98978, 177.87169, 165.46953, 113.45702, 128.19423, 160.8965, 161.16658, 355.2414]
2025-08-07 06:16:40,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 70.0, 27.0, 35.0, 32.0, 22.0, 25.0, 31.0, 31.0, 65.0]
2025-08-07 06:16:40,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 18 seconds)
2025-08-07 06:18:41,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:18:41,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 192.48679 ± 126.034
2025-08-07 06:18:41,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [171.6813, 166.0321, 563.07336, 107.94176, 123.885605, 134.59741, 181.22562, 159.19742, 187.2865, 129.9469]
2025-08-07 06:18:41,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 32.0, 113.0, 21.0, 24.0, 26.0, 35.0, 31.0, 36.0, 25.0]
2025-08-07 06:18:41,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 15 seconds)
2025-08-07 06:20:41,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:20:41,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 177.49170 ± 119.803
2025-08-07 06:20:41,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [151.76117, 206.56444, 135.49153, 108.69986, 113.915276, 108.13698, 527.1924, 125.12573, 146.11853, 151.91107]
2025-08-07 06:20:41,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 40.0, 26.0, 21.0, 22.0, 21.0, 105.0, 24.0, 28.0, 29.0]
2025-08-07 06:20:41,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 10 seconds)
2025-08-07 06:22:41,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:22:42,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 210.93359 ± 122.684
2025-08-07 06:22:42,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [157.69537, 443.3207, 463.7185, 124.14161, 135.62491, 184.71101, 134.27852, 133.43394, 166.7421, 165.66922]
2025-08-07 06:22:42,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 93.0, 90.0, 24.0, 26.0, 35.0, 26.0, 26.0, 32.0, 32.0]
2025-08-07 06:22:42,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 8 seconds)
2025-08-07 06:24:41,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:24:42,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 207.56221 ± 113.759
2025-08-07 06:24:42,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [120.35232, 162.71686, 355.51865, 152.88364, 124.66586, 489.25528, 195.05833, 179.83215, 130.4836, 164.85542]
2025-08-07 06:24:42,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 31.0, 68.0, 29.0, 24.0, 92.0, 38.0, 35.0, 25.0, 32.0]
2025-08-07 06:24:42,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 4 seconds)
2025-08-07 06:26:41,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:26:42,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 181.82292 ± 104.427
2025-08-07 06:26:42,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [156.63757, 108.19585, 165.36542, 96.81609, 395.8235, 114.60871, 124.22301, 166.98685, 374.62753, 114.94467]
2025-08-07 06:26:42,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 21.0, 32.0, 19.0, 79.0, 22.0, 24.0, 33.0, 71.0, 22.0]
2025-08-07 06:26:42,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 2 seconds)
2025-08-07 06:28:40,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:28:41,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 238.04636 ± 125.868
2025-08-07 06:28:41,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [178.73177, 176.02472, 396.67145, 155.83905, 171.54149, 448.19104, 134.0098, 155.67194, 438.58252, 125.20003]
2025-08-07 06:28:41,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 34.0, 73.0, 30.0, 33.0, 83.0, 26.0, 30.0, 84.0, 24.0]
2025-08-07 06:28:41,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes)
2025-08-07 06:30:41,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:30:41,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 156.15782 ± 27.215
2025-08-07 06:30:41,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [179.19807, 128.72385, 125.988846, 177.01936, 125.19812, 204.30031, 144.5287, 129.31116, 169.593, 177.71677]
2025-08-07 06:30:41,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 25.0, 24.0, 34.0, 24.0, 39.0, 28.0, 25.0, 33.0, 34.0]
2025-08-07 06:30:41,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 59 seconds)
2025-08-07 06:32:40,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:32:41,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 182.25388 ± 114.931
2025-08-07 06:32:41,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [108.54162, 186.5091, 159.86212, 153.42622, 517.7226, 133.85597, 185.35654, 108.8005, 119.95844, 148.50552]
2025-08-07 06:32:41,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 36.0, 31.0, 30.0, 98.0, 26.0, 36.0, 21.0, 23.0, 29.0]
2025-08-07 06:32:41,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 58 seconds)
2025-08-07 06:34:40,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:34:41,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 133.15508 ± 17.724
2025-08-07 06:34:41,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [118.55276, 114.76362, 164.30688, 139.35927, 108.19406, 124.437965, 155.5945, 145.83168, 120.23196, 140.27812]
2025-08-07 06:34:41,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 22.0, 32.0, 27.0, 21.0, 24.0, 30.0, 28.0, 23.0, 27.0]
2025-08-07 06:34:41,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 59 seconds)
2025-08-07 06:36:40,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:36:41,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 226.52534 ± 120.973
2025-08-07 06:36:41,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [501.74814, 158.5045, 165.95547, 165.27516, 135.3613, 357.29468, 135.09561, 157.12097, 342.73056, 146.16695]
2025-08-07 06:36:41,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 31.0, 32.0, 33.0, 26.0, 69.0, 26.0, 30.0, 71.0, 28.0]
2025-08-07 06:36:41,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 59 seconds)
2025-08-07 06:38:40,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:38:40,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 152.04469 ± 15.791
2025-08-07 06:38:40,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [145.41559, 140.0068, 178.65775, 140.5719, 169.02382, 128.84233, 165.46513, 139.01573, 145.66833, 167.77953]
2025-08-07 06:38:40,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 27.0, 34.0, 27.0, 32.0, 25.0, 32.0, 27.0, 28.0, 32.0]
2025-08-07 06:38:40,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-08-07 06:40:39,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:40:40,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1221 [DEBUG]: Total Reward: 160.24828 ± 94.942
2025-08-07 06:40:40,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1222 [DEBUG]: All rewards: [102.623375, 113.79555, 114.457726, 439.56235, 168.34717, 131.42072, 114.63393, 130.97165, 151.77594, 134.89438]
2025-08-07 06:40:40,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 22.0, 22.0, 83.0, 33.0, 26.0, 22.0, 25.0, 29.0, 26.0]
2025-08-07 06:40:40,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-humanoid):1251 [DEBUG]: Training session finished
