2025-08-07 02:51:49,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 02:51:49,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-humanoid/ExtremeSparseL4U32-bpql-mem32
2025-08-07 02:51:49,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14f6b62c3c10>}
2025-08-07 02:51:49,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1111 [DEBUG]: using device: cuda
2025-08-07 02:51:49,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1133 [INFO]: Creating new trainer
2025-08-07 02:51:49,913 baseline-bpql-noiseperc5-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=920, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-08-07 02:51:49,913 baseline-bpql-noiseperc5-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 02:51:52,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1194 [DEBUG]: Starting training session...
2025-08-07 02:51:52,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 1/100
2025-08-07 02:53:48,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:49,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 173.43822 ± 30.775
2025-08-07 02:53:49,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [182.1711, 139.44595, 130.31963, 172.4179, 191.366, 149.32176, 157.28375, 173.76498, 196.37883, 241.91238]
2025-08-07 02:53:49,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 27.0, 25.0, 33.0, 37.0, 29.0, 30.0, 34.0, 38.0, 49.0]
2025-08-07 02:53:49,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (173.44) for latency ExtremeSparseL4U32
2025-08-07 02:53:49,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 12 minutes, 4 seconds)
2025-08-07 02:55:52,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 170.29176 ± 29.473
2025-08-07 02:55:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [155.40657, 166.4197, 144.19293, 141.13846, 161.5741, 155.70181, 155.83136, 205.05939, 175.2765, 242.31694]
2025-08-07 02:55:53,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 32.0, 28.0, 27.0, 31.0, 30.0, 30.0, 40.0, 34.0, 47.0]
2025-08-07 02:55:53,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 16 minutes, 10 seconds)
2025-08-07 02:57:56,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 227.20480 ± 151.323
2025-08-07 02:57:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [156.5017, 165.14294, 151.72719, 140.84262, 160.82144, 145.859, 156.91792, 607.363, 430.7721, 156.1003]
2025-08-07 02:57:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 32.0, 29.0, 27.0, 31.0, 28.0, 30.0, 116.0, 89.0, 30.0]
2025-08-07 02:57:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (227.20) for latency ExtremeSparseL4U32
2025-08-07 02:57:56,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 16 minutes, 10 seconds)
2025-08-07 02:59:59,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:00:00,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 170.18144 ± 30.733
2025-08-07 03:00:00,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [166.56772, 167.08064, 135.315, 189.11327, 151.28523, 207.88173, 163.82196, 234.94586, 129.72847, 156.07451]
2025-08-07 03:00:00,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 26.0, 37.0, 29.0, 41.0, 32.0, 47.0, 25.0, 30.0]
2025-08-07 03:00:00,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 14 minutes, 52 seconds)
2025-08-07 03:02:03,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:02:03,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 210.04582 ± 104.383
2025-08-07 03:02:03,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [516.823, 180.91064, 151.64294, 146.15851, 165.5281, 192.42088, 182.13026, 166.23279, 224.62076, 173.99036]
2025-08-07 03:02:03,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 35.0, 29.0, 28.0, 32.0, 37.0, 35.0, 32.0, 44.0, 34.0]
2025-08-07 03:02:03,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 13 minutes, 31 seconds)
2025-08-07 03:04:07,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:04:07,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 178.32870 ± 65.405
2025-08-07 03:04:07,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [135.29247, 182.16154, 368.76196, 145.40479, 171.75206, 172.04462, 151.32884, 161.20926, 130.6675, 164.66411]
2025-08-07 03:04:07,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 35.0, 72.0, 28.0, 33.0, 33.0, 29.0, 31.0, 25.0, 32.0]
2025-08-07 03:04:07,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 13 minutes, 47 seconds)
2025-08-07 03:06:10,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:06:11,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 194.38435 ± 52.207
2025-08-07 03:06:11,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [340.55115, 177.91084, 186.06897, 217.31633, 188.74472, 184.767, 149.22444, 183.14786, 164.96347, 151.14862]
2025-08-07 03:06:11,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 34.0, 36.0, 42.0, 36.0, 36.0, 29.0, 35.0, 32.0, 29.0]
2025-08-07 03:06:11,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 11 minutes, 34 seconds)
2025-08-07 03:08:13,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:14,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 195.37332 ± 109.972
2025-08-07 03:08:14,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [150.09016, 157.73357, 170.85089, 176.70366, 156.3962, 139.5606, 165.4166, 523.67444, 166.8943, 146.41266]
2025-08-07 03:08:14,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 33.0, 34.0, 30.0, 27.0, 32.0, 103.0, 32.0, 28.0]
2025-08-07 03:08:14,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 9 minutes, 23 seconds)
2025-08-07 03:10:17,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:18,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 210.80688 ± 83.948
2025-08-07 03:10:18,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [124.69174, 170.73402, 192.20259, 254.42848, 139.85039, 185.43585, 146.03522, 166.43222, 377.23288, 351.02542]
2025-08-07 03:10:18,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 33.0, 37.0, 49.0, 27.0, 36.0, 28.0, 32.0, 79.0, 69.0]
2025-08-07 03:10:18,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 7 minutes, 28 seconds)
2025-08-07 03:12:20,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:12:21,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 166.19476 ± 14.967
2025-08-07 03:12:21,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [146.44923, 160.57133, 173.62381, 166.69305, 147.34018, 171.67209, 160.48921, 182.59012, 197.15872, 155.36]
2025-08-07 03:12:21,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 31.0, 34.0, 32.0, 28.0, 33.0, 31.0, 35.0, 38.0, 30.0]
2025-08-07 03:12:21,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 5 minutes, 14 seconds)
2025-08-07 03:14:24,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:14:25,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 208.47620 ± 113.730
2025-08-07 03:14:25,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [291.06125, 170.0373, 170.78561, 524.5903, 175.4289, 130.53151, 129.5805, 181.3439, 155.95789, 155.44489]
2025-08-07 03:14:25,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 33.0, 33.0, 112.0, 34.0, 25.0, 25.0, 35.0, 30.0, 30.0]
2025-08-07 03:14:25,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 3 minutes, 12 seconds)
2025-08-07 03:16:28,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:16:28,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 157.49704 ± 19.248
2025-08-07 03:16:28,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [125.04745, 145.3312, 171.05287, 175.22928, 134.84134, 174.85422, 160.36223, 161.64351, 139.86876, 186.73946]
2025-08-07 03:16:28,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 28.0, 33.0, 34.0, 26.0, 34.0, 31.0, 31.0, 27.0, 36.0]
2025-08-07 03:16:28,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 1 minute, 11 seconds)
2025-08-07 03:18:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:18:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 191.31787 ± 84.904
2025-08-07 03:18:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [177.5196, 154.60104, 176.68501, 176.27652, 154.9873, 201.40106, 135.68742, 161.29057, 135.31255, 439.4177]
2025-08-07 03:18:32,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 34.0, 34.0, 30.0, 39.0, 26.0, 31.0, 26.0, 91.0]
2025-08-07 03:18:32,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 59 minutes, 17 seconds)
2025-08-07 03:20:35,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:36,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 183.33920 ± 90.145
2025-08-07 03:20:36,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [450.2548, 186.2446, 150.29112, 142.06712, 139.61064, 155.2244, 161.93498, 130.02638, 161.29575, 156.44215]
2025-08-07 03:20:36,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 36.0, 29.0, 27.0, 27.0, 30.0, 31.0, 25.0, 31.0, 30.0]
2025-08-07 03:20:36,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 57 minutes, 10 seconds)
2025-08-07 03:22:38,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:39,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 211.97733 ± 72.131
2025-08-07 03:22:39,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [161.01, 204.2502, 160.26274, 357.9652, 160.30412, 345.99594, 182.11673, 165.35344, 170.89397, 211.62094]
2025-08-07 03:22:39,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 39.0, 31.0, 71.0, 31.0, 68.0, 35.0, 32.0, 33.0, 41.0]
2025-08-07 03:22:39,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 55 minutes, 12 seconds)
2025-08-07 03:24:42,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:24:43,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 172.27600 ± 38.357
2025-08-07 03:24:43,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [161.6616, 196.10919, 171.93013, 272.85046, 135.05495, 151.24368, 149.95631, 188.3369, 139.57231, 156.0444]
2025-08-07 03:24:43,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 38.0, 33.0, 55.0, 26.0, 29.0, 29.0, 37.0, 27.0, 30.0]
2025-08-07 03:24:43,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 53 minutes, 4 seconds)
2025-08-07 03:26:45,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 192.47461 ± 61.925
2025-08-07 03:26:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [144.14359, 146.21887, 187.85616, 243.7629, 165.27542, 183.31377, 182.9355, 155.27933, 358.8263, 157.13434]
2025-08-07 03:26:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 28.0, 36.0, 48.0, 32.0, 35.0, 35.0, 30.0, 72.0, 30.0]
2025-08-07 03:26:46,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 50 minutes, 56 seconds)
2025-08-07 03:28:49,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 199.37064 ± 55.464
2025-08-07 03:28:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [196.07669, 342.1012, 206.71825, 167.3657, 246.09212, 179.96202, 150.5463, 140.39117, 194.29265, 170.16023]
2025-08-07 03:28:50,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 70.0, 40.0, 32.0, 50.0, 35.0, 29.0, 27.0, 38.0, 33.0]
2025-08-07 03:28:50,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 48 minutes, 48 seconds)
2025-08-07 03:30:52,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:53,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 184.36079 ± 74.242
2025-08-07 03:30:53,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [169.84, 140.20044, 151.71313, 197.3098, 150.66711, 150.13547, 402.47247, 166.10446, 151.36859, 163.79636]
2025-08-07 03:30:53,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 27.0, 29.0, 39.0, 29.0, 29.0, 85.0, 32.0, 29.0, 32.0]
2025-08-07 03:30:53,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 46 minutes, 42 seconds)
2025-08-07 03:32:56,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:56,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 203.67781 ± 60.373
2025-08-07 03:32:56,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [181.72906, 165.53731, 209.69514, 180.11208, 263.98685, 155.8561, 355.8736, 174.58295, 208.67326, 140.73184]
2025-08-07 03:32:56,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 32.0, 41.0, 35.0, 51.0, 30.0, 73.0, 34.0, 41.0, 27.0]
2025-08-07 03:32:56,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 44 minutes, 35 seconds)
2025-08-07 03:34:59,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:00,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 183.85922 ± 10.317
2025-08-07 03:35:00,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [180.93602, 203.06528, 165.68205, 183.99237, 186.15927, 178.69208, 199.46626, 180.77663, 184.40283, 175.4193]
2025-08-07 03:35:00,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 39.0, 32.0, 36.0, 36.0, 34.0, 39.0, 35.0, 36.0, 34.0]
2025-08-07 03:35:00,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 42 minutes, 29 seconds)
2025-08-07 03:37:03,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 192.84787 ± 41.798
2025-08-07 03:37:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [176.90843, 167.3243, 161.02798, 216.04704, 175.30832, 206.4058, 187.9418, 247.31616, 119.19892, 271.00003]
2025-08-07 03:37:03,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 32.0, 31.0, 43.0, 34.0, 40.0, 36.0, 48.0, 23.0, 53.0]
2025-08-07 03:37:03,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 40 minutes, 27 seconds)
2025-08-07 03:39:06,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:39:06,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 185.97629 ± 47.824
2025-08-07 03:39:06,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [156.69594, 160.45317, 180.37183, 165.45999, 145.97539, 322.85794, 186.86325, 177.86003, 196.9269, 166.29831]
2025-08-07 03:39:06,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 31.0, 35.0, 32.0, 28.0, 65.0, 36.0, 34.0, 38.0, 32.0]
2025-08-07 03:39:06,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 38 minutes, 16 seconds)
2025-08-07 03:41:10,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:11,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 242.21085 ± 103.536
2025-08-07 03:41:11,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [209.82079, 149.75987, 323.79202, 189.21278, 216.74037, 134.88002, 171.53609, 178.24684, 404.76746, 443.35217]
2025-08-07 03:41:11,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 29.0, 69.0, 37.0, 43.0, 26.0, 33.0, 34.0, 82.0, 89.0]
2025-08-07 03:41:11,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1226 [INFO]: New best (242.21) for latency ExtremeSparseL4U32
2025-08-07 03:41:11,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-08-07 03:43:13,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:14,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 164.11075 ± 24.268
2025-08-07 03:43:14,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [186.38737, 156.95174, 185.13293, 155.12419, 207.94687, 162.65831, 135.20973, 179.66724, 125.071556, 146.95747]
2025-08-07 03:43:14,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 30.0, 36.0, 30.0, 40.0, 31.0, 26.0, 35.0, 24.0, 28.0]
2025-08-07 03:43:14,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 34 minutes, 19 seconds)
2025-08-07 03:45:16,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:45:17,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 156.79433 ± 18.195
2025-08-07 03:45:17,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [140.47334, 164.60803, 186.18602, 145.22224, 160.5657, 185.75258, 140.05444, 150.66959, 164.9673, 129.44398]
2025-08-07 03:45:17,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 32.0, 36.0, 28.0, 31.0, 36.0, 27.0, 29.0, 32.0, 25.0]
2025-08-07 03:45:17,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 32 minutes, 9 seconds)
2025-08-07 03:47:20,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:47:20,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 178.35910 ± 27.463
2025-08-07 03:47:20,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [176.06088, 182.59906, 155.2427, 143.7038, 172.28043, 176.37045, 237.32843, 145.72054, 181.41678, 212.8681]
2025-08-07 03:47:20,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 35.0, 30.0, 28.0, 33.0, 34.0, 49.0, 28.0, 35.0, 41.0]
2025-08-07 03:47:20,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 30 minutes, 9 seconds)
2025-08-07 03:49:23,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:49:23,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 179.72206 ± 26.913
2025-08-07 03:49:23,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [220.1256, 161.10161, 181.21248, 169.18518, 165.46289, 155.53802, 218.63022, 214.84175, 140.59836, 170.52437]
2025-08-07 03:49:23,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 31.0, 35.0, 33.0, 32.0, 30.0, 43.0, 42.0, 27.0, 33.0]
2025-08-07 03:49:23,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 28 minutes, 3 seconds)
2025-08-07 03:51:26,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:27,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 219.53291 ± 98.781
2025-08-07 03:51:27,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [425.03354, 181.2061, 141.27875, 400.0553, 176.98668, 150.76581, 155.0937, 165.77634, 219.39845, 179.73436]
2025-08-07 03:51:27,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 35.0, 27.0, 84.0, 34.0, 29.0, 30.0, 32.0, 42.0, 35.0]
2025-08-07 03:51:27,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 25 minutes, 58 seconds)
2025-08-07 03:53:30,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:31,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 174.93202 ± 21.702
2025-08-07 03:53:31,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [196.03082, 185.11479, 160.30322, 159.49692, 172.19171, 182.11388, 144.77875, 224.18936, 160.60915, 164.49158]
2025-08-07 03:53:31,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 36.0, 31.0, 31.0, 33.0, 35.0, 28.0, 43.0, 31.0, 32.0]
2025-08-07 03:53:31,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 23 minutes, 55 seconds)
2025-08-07 03:55:33,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:55:34,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 151.70784 ± 11.975
2025-08-07 03:55:34,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [146.43535, 150.87863, 144.28714, 165.65977, 135.59721, 150.43436, 135.65392, 161.83717, 174.81985, 151.47516]
2025-08-07 03:55:34,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 28.0, 32.0, 26.0, 29.0, 26.0, 31.0, 34.0, 29.0]
2025-08-07 03:55:34,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 21 minutes, 57 seconds)
2025-08-07 03:57:37,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:57:38,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 190.94904 ± 49.989
2025-08-07 03:57:38,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [170.29555, 182.0747, 167.4835, 156.09128, 186.16592, 186.05667, 180.92923, 172.34845, 338.4861, 169.55893]
2025-08-07 03:57:38,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 35.0, 32.0, 30.0, 36.0, 36.0, 35.0, 33.0, 72.0, 33.0]
2025-08-07 03:57:38,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 19 minutes, 57 seconds)
2025-08-07 03:59:40,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:41,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 200.23427 ± 70.164
2025-08-07 03:59:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [150.74391, 212.85027, 209.03441, 150.84395, 181.67836, 398.4019, 159.674, 211.13947, 170.99075, 156.98572]
2025-08-07 03:59:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 41.0, 40.0, 29.0, 35.0, 78.0, 31.0, 41.0, 33.0, 30.0]
2025-08-07 03:59:41,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 17 minutes, 55 seconds)
2025-08-07 04:01:44,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:01:44,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 191.48228 ± 64.240
2025-08-07 04:01:44,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [156.895, 144.82362, 145.6741, 350.434, 172.31255, 140.68797, 236.12437, 245.64014, 140.95137, 181.27966]
2025-08-07 04:01:44,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 28.0, 28.0, 69.0, 33.0, 27.0, 46.0, 49.0, 27.0, 35.0]
2025-08-07 04:01:44,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 15 minutes, 45 seconds)
2025-08-07 04:03:47,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:48,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 220.50041 ± 93.532
2025-08-07 04:03:48,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [212.3562, 173.24504, 145.5278, 458.89044, 165.88127, 151.42706, 140.32759, 278.20706, 286.97125, 192.17033]
2025-08-07 04:03:48,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 33.0, 28.0, 92.0, 32.0, 29.0, 27.0, 54.0, 58.0, 37.0]
2025-08-07 04:03:48,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 13 minutes, 43 seconds)
2025-08-07 04:05:51,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:05:51,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 208.74788 ± 90.250
2025-08-07 04:05:51,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [164.28297, 165.0125, 170.90698, 462.5185, 140.31961, 170.76974, 196.39651, 201.65744, 259.7692, 155.84538]
2025-08-07 04:05:51,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 33.0, 101.0, 27.0, 33.0, 38.0, 39.0, 50.0, 30.0]
2025-08-07 04:05:51,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 11 minutes, 41 seconds)
2025-08-07 04:07:54,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:07:55,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 161.75513 ± 9.620
2025-08-07 04:07:55,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [172.35928, 145.48325, 155.75499, 159.60712, 160.62491, 164.91322, 151.00694, 180.64621, 166.52568, 160.62968]
2025-08-07 04:07:55,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 28.0, 30.0, 31.0, 31.0, 32.0, 29.0, 35.0, 32.0, 31.0]
2025-08-07 04:07:55,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 9 minutes, 33 seconds)
2025-08-07 04:09:57,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:09:58,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 179.14700 ± 46.953
2025-08-07 04:09:58,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [304.8422, 140.56775, 207.98601, 145.82281, 166.7296, 165.91136, 191.95576, 135.27715, 171.77551, 160.60178]
2025-08-07 04:09:58,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 27.0, 40.0, 28.0, 32.0, 32.0, 37.0, 26.0, 33.0, 31.0]
2025-08-07 04:09:58,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 7 minutes, 27 seconds)
2025-08-07 04:12:00,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:12:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 171.56686 ± 21.415
2025-08-07 04:12:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [226.65466, 179.93732, 170.0729, 160.7706, 181.68735, 145.55942, 164.77502, 171.44614, 149.16489, 165.60039]
2025-08-07 04:12:01,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 35.0, 33.0, 31.0, 35.0, 28.0, 32.0, 33.0, 29.0, 32.0]
2025-08-07 04:12:01,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 5 minutes, 18 seconds)
2025-08-07 04:14:03,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:14:03,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 189.94864 ± 75.487
2025-08-07 04:14:03,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [166.87526, 149.22842, 146.01646, 209.04619, 150.91353, 156.13367, 129.7969, 199.18472, 187.58394, 404.7071]
2025-08-07 04:14:03,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 29.0, 28.0, 41.0, 29.0, 30.0, 25.0, 39.0, 36.0, 80.0]
2025-08-07 04:14:03,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 3 minutes, 8 seconds)
2025-08-07 04:16:06,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:16:07,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 186.31920 ± 29.568
2025-08-07 04:16:07,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [161.32652, 176.1399, 203.55444, 196.16687, 193.7337, 165.50116, 190.82286, 150.38951, 259.7918, 165.76526]
2025-08-07 04:16:07,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 34.0, 39.0, 38.0, 38.0, 32.0, 37.0, 29.0, 50.0, 32.0]
2025-08-07 04:16:07,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 1 minute, 6 seconds)
2025-08-07 04:18:10,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:18:10,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 163.78601 ± 21.622
2025-08-07 04:18:10,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [150.50575, 156.3368, 139.74634, 165.87234, 175.16287, 150.1866, 178.765, 157.13965, 145.95126, 218.19348]
2025-08-07 04:18:10,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 30.0, 27.0, 32.0, 34.0, 29.0, 35.0, 30.0, 28.0, 42.0]
2025-08-07 04:18:10,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes, 58 seconds)
2025-08-07 04:20:13,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:20:14,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 188.55412 ± 60.483
2025-08-07 04:20:14,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [124.528465, 176.74937, 180.83192, 216.97206, 140.27898, 160.94275, 215.31506, 344.10532, 130.07774, 195.73958]
2025-08-07 04:20:14,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 34.0, 35.0, 44.0, 27.0, 31.0, 42.0, 67.0, 25.0, 38.0]
2025-08-07 04:20:14,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 57 minutes, 1 second)
2025-08-07 04:22:16,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:22:17,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 164.24184 ± 22.164
2025-08-07 04:22:17,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [154.45688, 208.39696, 161.5836, 171.19469, 163.43752, 167.00294, 174.2512, 182.02527, 134.85323, 125.21606]
2025-08-07 04:22:17,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 41.0, 31.0, 33.0, 32.0, 32.0, 34.0, 35.0, 26.0, 24.0]
2025-08-07 04:22:17,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 55 minutes, 2 seconds)
2025-08-07 04:24:19,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:24:20,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 212.63998 ± 95.406
2025-08-07 04:24:20,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [184.4228, 166.396, 151.7817, 134.82962, 156.2632, 269.0977, 348.61597, 150.8427, 423.5213, 140.62889]
2025-08-07 04:24:20,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 32.0, 29.0, 26.0, 30.0, 53.0, 70.0, 29.0, 92.0, 27.0]
2025-08-07 04:24:20,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 53 minutes, 1 second)
2025-08-07 04:26:22,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:26:23,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 169.12645 ± 20.321
2025-08-07 04:26:23,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [169.36147, 204.61293, 150.82388, 140.95552, 200.2226, 165.30173, 156.07367, 167.1655, 151.27798, 185.46922]
2025-08-07 04:26:23,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 40.0, 29.0, 27.0, 39.0, 32.0, 30.0, 32.0, 29.0, 36.0]
2025-08-07 04:26:23,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 50 minutes, 49 seconds)
2025-08-07 04:28:26,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:28:26,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 172.92813 ± 18.681
2025-08-07 04:28:26,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [194.6469, 161.38637, 210.5886, 186.8649, 174.47624, 156.5321, 150.68062, 155.21936, 160.27391, 178.61232]
2025-08-07 04:28:26,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 31.0, 41.0, 36.0, 34.0, 30.0, 29.0, 30.0, 31.0, 35.0]
2025-08-07 04:28:26,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 48 minutes, 51 seconds)
2025-08-07 04:30:28,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:30:29,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 166.12367 ± 23.964
2025-08-07 04:30:29,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [205.23924, 156.8777, 145.38774, 156.68452, 140.87119, 206.58292, 191.79442, 146.42256, 149.84227, 161.53406]
2025-08-07 04:30:29,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 30.0, 28.0, 30.0, 27.0, 41.0, 37.0, 28.0, 29.0, 31.0]
2025-08-07 04:30:29,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 46 minutes, 40 seconds)
2025-08-07 04:32:32,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:32:33,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 167.80103 ± 34.888
2025-08-07 04:32:33,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [150.97234, 182.15782, 264.18857, 167.50537, 171.57413, 146.37212, 135.51244, 156.61684, 162.11005, 141.00055]
2025-08-07 04:32:33,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 35.0, 51.0, 32.0, 33.0, 28.0, 26.0, 30.0, 31.0, 27.0]
2025-08-07 04:32:33,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 44 minutes, 42 seconds)
2025-08-07 04:34:35,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:34:36,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 228.99185 ± 132.284
2025-08-07 04:34:36,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [155.98505, 391.42175, 190.93617, 567.2565, 160.57977, 135.94624, 181.38467, 151.78046, 193.12552, 161.5025]
2025-08-07 04:34:36,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 76.0, 37.0, 114.0, 31.0, 26.0, 35.0, 29.0, 38.0, 31.0]
2025-08-07 04:34:36,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 42 minutes, 36 seconds)
2025-08-07 04:36:38,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:36:38,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 193.52571 ± 61.376
2025-08-07 04:36:38,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [140.6988, 206.73962, 177.87634, 156.02232, 175.05339, 134.30966, 191.46434, 364.1521, 187.89339, 201.04704]
2025-08-07 04:36:38,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 40.0, 34.0, 30.0, 34.0, 26.0, 37.0, 73.0, 36.0, 39.0]
2025-08-07 04:36:38,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 40 minutes, 32 seconds)
2025-08-07 04:38:41,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:38:41,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 172.97104 ± 23.049
2025-08-07 04:38:41,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [129.94202, 129.99844, 176.52097, 190.57578, 165.47058, 189.99582, 181.02225, 192.369, 194.53784, 179.2776]
2025-08-07 04:38:41,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 25.0, 34.0, 37.0, 32.0, 37.0, 35.0, 37.0, 38.0, 35.0]
2025-08-07 04:38:41,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 38 minutes, 25 seconds)
2025-08-07 04:40:44,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:40:45,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 165.39603 ± 29.683
2025-08-07 04:40:45,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [141.64973, 146.10922, 156.6605, 180.77414, 230.64801, 161.4099, 114.6969, 170.20073, 191.30118, 160.50998]
2025-08-07 04:40:45,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 28.0, 30.0, 35.0, 45.0, 31.0, 22.0, 33.0, 37.0, 31.0]
2025-08-07 04:40:45,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 36 minutes, 26 seconds)
2025-08-07 04:42:47,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:42:48,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 170.60602 ± 20.450
2025-08-07 04:42:48,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [134.85423, 160.9468, 198.2445, 194.7723, 170.36496, 191.58598, 151.91719, 160.7732, 154.33907, 188.262]
2025-08-07 04:42:48,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 31.0, 39.0, 38.0, 33.0, 37.0, 29.0, 31.0, 30.0, 37.0]
2025-08-07 04:42:48,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 34 minutes, 16 seconds)
2025-08-07 04:44:50,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:44:51,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 161.39035 ± 21.671
2025-08-07 04:44:51,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [170.28075, 202.81717, 165.85088, 167.25034, 150.77623, 160.90033, 150.41003, 134.91261, 125.024536, 185.68063]
2025-08-07 04:44:51,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 40.0, 32.0, 32.0, 29.0, 31.0, 29.0, 26.0, 24.0, 36.0]
2025-08-07 04:44:51,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 32 minutes, 16 seconds)
2025-08-07 04:46:53,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:46:54,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 216.76408 ± 118.649
2025-08-07 04:46:54,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [151.37833, 176.58928, 182.35788, 293.9823, 166.68123, 134.9518, 151.14047, 549.6217, 160.4273, 200.51048]
2025-08-07 04:46:54,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 34.0, 35.0, 59.0, 32.0, 26.0, 29.0, 109.0, 31.0, 39.0]
2025-08-07 04:46:54,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 30 minutes, 13 seconds)
2025-08-07 04:48:56,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:48:57,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 182.82730 ± 36.806
2025-08-07 04:48:57,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [283.76, 160.45999, 152.14041, 189.85345, 180.14662, 155.15822, 189.40663, 185.9694, 180.76173, 150.61641]
2025-08-07 04:48:57,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 31.0, 29.0, 37.0, 35.0, 30.0, 36.0, 36.0, 35.0, 29.0]
2025-08-07 04:48:57,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 28 minutes, 10 seconds)
2025-08-07 04:50:59,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:51:00,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 186.07480 ± 57.387
2025-08-07 04:51:00,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [160.7697, 171.579, 169.37723, 165.35962, 346.5995, 180.50575, 146.48064, 218.61504, 135.73221, 165.7294]
2025-08-07 04:51:00,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 33.0, 32.0, 68.0, 35.0, 28.0, 43.0, 26.0, 32.0]
2025-08-07 04:51:00,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 26 minutes, 8 seconds)
2025-08-07 04:53:03,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:53:03,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 175.58823 ± 14.480
2025-08-07 04:53:03,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [176.84044, 160.65714, 192.34773, 190.63335, 181.24274, 170.38655, 197.42723, 154.87248, 155.59523, 175.87941]
2025-08-07 04:53:03,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 31.0, 37.0, 37.0, 35.0, 33.0, 39.0, 30.0, 30.0, 34.0]
2025-08-07 04:53:03,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 24 minutes, 7 seconds)
2025-08-07 04:55:05,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:55:06,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 182.49469 ± 46.065
2025-08-07 04:55:06,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [182.88905, 156.91609, 155.77339, 166.11165, 140.95775, 178.2494, 185.9483, 160.24069, 183.71939, 314.14124]
2025-08-07 04:55:06,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 30.0, 30.0, 32.0, 27.0, 35.0, 36.0, 31.0, 36.0, 63.0]
2025-08-07 04:55:06,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 22 minutes, 3 seconds)
2025-08-07 04:57:08,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:57:09,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 174.20247 ± 23.401
2025-08-07 04:57:09,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [172.33305, 139.89311, 171.71631, 196.59926, 155.41472, 161.98218, 159.04195, 186.60918, 170.49203, 227.94281]
2025-08-07 04:57:09,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 27.0, 33.0, 38.0, 30.0, 31.0, 31.0, 36.0, 33.0, 45.0]
2025-08-07 04:57:09,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 20 minutes, 1 second)
2025-08-07 04:59:11,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:59:12,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 163.72299 ± 14.215
2025-08-07 04:59:12,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [177.03737, 185.8036, 157.04991, 176.95851, 161.01971, 141.22493, 171.26814, 140.24648, 160.63551, 165.98586]
2025-08-07 04:59:12,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 36.0, 30.0, 34.0, 31.0, 27.0, 33.0, 27.0, 31.0, 32.0]
2025-08-07 04:59:12,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 17 minutes, 56 seconds)
2025-08-07 05:01:15,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:01:15,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 145.03813 ± 13.115
2025-08-07 05:01:15,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [146.54689, 124.894005, 145.6702, 129.71304, 166.42267, 156.00974, 134.29057, 160.90546, 150.81378, 135.11494]
2025-08-07 05:01:15,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 24.0, 28.0, 25.0, 32.0, 30.0, 26.0, 31.0, 29.0, 26.0]
2025-08-07 05:01:15,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes, 53 seconds)
2025-08-07 05:03:18,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:03:19,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 183.42097 ± 69.233
2025-08-07 05:03:19,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [125.25784, 186.56169, 196.2223, 164.65065, 135.48582, 186.14667, 140.6631, 379.86218, 150.39238, 168.967]
2025-08-07 05:03:19,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 36.0, 38.0, 32.0, 26.0, 36.0, 27.0, 78.0, 29.0, 33.0]
2025-08-07 05:03:19,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 13 minutes, 50 seconds)
2025-08-07 05:05:21,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:05:22,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 189.67448 ± 39.924
2025-08-07 05:05:22,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [181.12672, 146.2956, 248.0288, 177.1868, 180.92505, 215.09776, 271.3291, 170.20943, 151.94199, 154.6037]
2025-08-07 05:05:22,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 28.0, 48.0, 34.0, 35.0, 42.0, 53.0, 33.0, 29.0, 30.0]
2025-08-07 05:05:22,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 11 minutes, 49 seconds)
2025-08-07 05:07:24,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:07:25,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 177.41333 ± 24.048
2025-08-07 05:07:25,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [172.14432, 215.49611, 156.11592, 204.17278, 152.03035, 190.3075, 165.79105, 135.40735, 186.992, 195.67606]
2025-08-07 05:07:25,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 42.0, 30.0, 39.0, 29.0, 37.0, 32.0, 26.0, 36.0, 38.0]
2025-08-07 05:07:25,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 9 minutes, 48 seconds)
2025-08-07 05:09:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:09:28,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 170.87862 ± 21.931
2025-08-07 05:09:28,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [173.0569, 155.4559, 166.17001, 181.52406, 151.13301, 216.04498, 201.8632, 166.862, 146.19342, 150.4827]
2025-08-07 05:09:28,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 30.0, 32.0, 35.0, 29.0, 42.0, 39.0, 32.0, 28.0, 29.0]
2025-08-07 05:09:28,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 7 minutes, 44 seconds)
2025-08-07 05:11:29,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:11:30,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 154.28535 ± 13.561
2025-08-07 05:11:30,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [151.15858, 175.22585, 165.37886, 151.0579, 135.40886, 155.63559, 130.21974, 150.90926, 171.75378, 156.10515]
2025-08-07 05:11:30,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 34.0, 32.0, 29.0, 26.0, 30.0, 25.0, 29.0, 33.0, 30.0]
2025-08-07 05:11:30,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 5 minutes, 32 seconds)
2025-08-07 05:13:31,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:13:32,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 171.19832 ± 27.928
2025-08-07 05:13:32,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [161.93622, 160.48604, 164.62491, 178.74171, 140.7724, 150.96587, 247.15707, 160.72319, 185.03334, 161.54231]
2025-08-07 05:13:32,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 31.0, 32.0, 35.0, 27.0, 29.0, 49.0, 31.0, 36.0, 31.0]
2025-08-07 05:13:32,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 3 minutes, 22 seconds)
2025-08-07 05:15:33,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:15:34,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 187.51743 ± 54.717
2025-08-07 05:15:34,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [130.12283, 150.83289, 181.45059, 135.53056, 330.34998, 190.19467, 227.05203, 187.12906, 176.70137, 165.81036]
2025-08-07 05:15:34,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 29.0, 35.0, 26.0, 65.0, 37.0, 45.0, 36.0, 34.0, 32.0]
2025-08-07 05:15:34,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 1 minute, 10 seconds)
2025-08-07 05:17:34,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:17:35,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 206.68230 ± 79.159
2025-08-07 05:17:35,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [215.7984, 199.37273, 150.821, 382.18927, 170.2826, 202.20354, 139.66583, 324.594, 130.46101, 151.43446]
2025-08-07 05:17:35,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [43.0, 39.0, 29.0, 76.0, 33.0, 39.0, 27.0, 66.0, 25.0, 29.0]
2025-08-07 05:17:35,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 57 seconds)
2025-08-07 05:19:36,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:19:36,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 165.10362 ± 25.727
2025-08-07 05:19:36,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [199.7693, 124.909515, 140.27562, 207.30534, 145.0426, 172.38234, 150.424, 157.16466, 162.86769, 190.8953]
2025-08-07 05:19:36,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 24.0, 27.0, 40.0, 28.0, 33.0, 29.0, 30.0, 31.0, 37.0]
2025-08-07 05:19:36,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 46 seconds)
2025-08-07 05:21:37,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:21:37,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 193.62869 ± 67.329
2025-08-07 05:21:37,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [231.96599, 150.84123, 206.14282, 155.493, 135.05688, 377.50552, 186.41048, 175.7339, 171.76907, 145.36818]
2025-08-07 05:21:37,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 29.0, 41.0, 30.0, 26.0, 79.0, 36.0, 34.0, 33.0, 28.0]
2025-08-07 05:21:37,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 41 seconds)
2025-08-07 05:23:38,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:23:38,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 156.23148 ± 15.930
2025-08-07 05:23:38,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [135.0535, 167.11397, 140.3932, 140.89262, 152.18585, 190.7351, 150.70172, 151.55399, 166.63414, 167.0506]
2025-08-07 05:23:38,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 32.0, 27.0, 27.0, 29.0, 37.0, 29.0, 29.0, 32.0, 32.0]
2025-08-07 05:23:38,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 33 seconds)
2025-08-07 05:25:39,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:25:40,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 180.69711 ± 39.422
2025-08-07 05:25:40,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [169.757, 261.81195, 224.13638, 124.582184, 135.44777, 170.85995, 211.36418, 182.39323, 170.85257, 155.76581]
2025-08-07 05:25:40,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 51.0, 44.0, 24.0, 26.0, 33.0, 41.0, 35.0, 33.0, 30.0]
2025-08-07 05:25:40,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 30 seconds)
2025-08-07 05:27:40,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:27:41,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 174.38193 ± 25.380
2025-08-07 05:27:41,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [209.41061, 165.67847, 135.52794, 150.57796, 171.6713, 151.31805, 198.4782, 187.30428, 159.89854, 213.95393]
2025-08-07 05:27:41,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 32.0, 26.0, 29.0, 33.0, 29.0, 39.0, 36.0, 31.0, 41.0]
2025-08-07 05:27:41,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 28 seconds)
2025-08-07 05:29:42,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:29:42,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 158.25439 ± 16.928
2025-08-07 05:29:42,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [176.15382, 155.11568, 166.02861, 159.75795, 175.28073, 156.51602, 172.34547, 166.48936, 124.91091, 129.94545]
2025-08-07 05:29:42,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 30.0, 32.0, 31.0, 34.0, 30.0, 33.0, 32.0, 24.0, 25.0]
2025-08-07 05:29:42,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 27 seconds)
2025-08-07 05:31:42,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:31:43,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 161.41989 ± 14.969
2025-08-07 05:31:43,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [171.56267, 179.41501, 155.1721, 140.56012, 165.71132, 174.37146, 140.1386, 182.15913, 145.02121, 160.08723]
2025-08-07 05:31:43,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 35.0, 30.0, 27.0, 32.0, 34.0, 27.0, 35.0, 28.0, 31.0]
2025-08-07 05:31:43,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 24 seconds)
2025-08-07 05:33:44,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:33:44,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 199.88474 ± 63.260
2025-08-07 05:33:44,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [170.90118, 191.95403, 166.98117, 195.32213, 140.93153, 165.66473, 172.77072, 262.39212, 365.93887, 165.99094]
2025-08-07 05:33:44,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 37.0, 32.0, 38.0, 27.0, 32.0, 33.0, 52.0, 76.0, 32.0]
2025-08-07 05:33:44,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 25 seconds)
2025-08-07 05:35:45,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:35:45,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 164.03763 ± 26.258
2025-08-07 05:35:45,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [232.58183, 154.40874, 162.0623, 151.32336, 185.57901, 165.7023, 135.8096, 146.0943, 161.3676, 145.44736]
2025-08-07 05:35:45,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [46.0, 30.0, 31.0, 29.0, 36.0, 32.0, 26.0, 28.0, 31.0, 28.0]
2025-08-07 05:35:45,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 21 seconds)
2025-08-07 05:37:45,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:37:46,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 178.01381 ± 38.279
2025-08-07 05:37:46,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [165.55075, 197.22289, 135.4485, 150.81761, 180.4008, 155.99097, 140.04073, 274.6718, 192.71321, 187.28075]
2025-08-07 05:37:46,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 38.0, 26.0, 29.0, 35.0, 30.0, 27.0, 54.0, 37.0, 36.0]
2025-08-07 05:37:46,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 18 seconds)
2025-08-07 05:39:47,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:39:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 175.58125 ± 32.689
2025-08-07 05:39:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [162.01471, 176.49597, 140.39096, 255.75545, 171.77744, 130.90508, 185.06145, 190.5556, 186.84294, 156.01288]
2025-08-07 05:39:47,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 34.0, 27.0, 52.0, 33.0, 25.0, 36.0, 37.0, 36.0, 30.0]
2025-08-07 05:39:47,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 18 seconds)
2025-08-07 05:41:47,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:41:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 160.14694 ± 15.575
2025-08-07 05:41:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [180.47853, 140.68369, 170.66391, 165.93661, 172.2033, 134.14171, 174.35504, 167.34001, 155.2785, 140.38809]
2025-08-07 05:41:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 27.0, 33.0, 32.0, 33.0, 26.0, 34.0, 32.0, 30.0, 27.0]
2025-08-07 05:41:48,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 15 seconds)
2025-08-07 05:43:47,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:43:48,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 173.79092 ± 39.109
2025-08-07 05:43:48,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [263.6641, 187.69917, 135.50417, 140.36804, 161.22775, 140.0327, 223.38522, 173.79411, 150.63661, 161.59724]
2025-08-07 05:43:48,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 36.0, 26.0, 27.0, 31.0, 27.0, 44.0, 34.0, 29.0, 31.0]
2025-08-07 05:43:48,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 11 seconds)
2025-08-07 05:45:48,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:45:49,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 158.81345 ± 15.459
2025-08-07 05:45:49,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [135.1207, 151.02325, 177.50023, 130.36617, 170.06947, 171.49089, 166.5465, 150.307, 171.4825, 164.22772]
2025-08-07 05:45:49,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 29.0, 34.0, 25.0, 33.0, 33.0, 32.0, 29.0, 33.0, 32.0]
2025-08-07 05:45:49,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 10 seconds)
2025-08-07 05:47:48,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:47:49,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 178.98056 ± 17.346
2025-08-07 05:47:49,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [170.80788, 213.46422, 170.96, 202.98885, 161.48035, 181.20615, 155.72763, 186.95992, 165.12236, 181.08832]
2025-08-07 05:47:49,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 42.0, 33.0, 39.0, 31.0, 35.0, 30.0, 36.0, 32.0, 35.0]
2025-08-07 05:47:49,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 8 seconds)
2025-08-07 05:49:48,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:49:49,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 171.86722 ± 27.181
2025-08-07 05:49:49,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [164.66151, 130.64178, 229.62102, 166.05931, 155.94643, 167.04857, 165.63695, 177.19724, 210.58847, 151.27098]
2025-08-07 05:49:49,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 25.0, 45.0, 32.0, 30.0, 32.0, 32.0, 34.0, 41.0, 29.0]
2025-08-07 05:49:49,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 4 seconds)
2025-08-07 05:51:49,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:51:49,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 168.64969 ± 31.921
2025-08-07 05:51:49,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [195.15134, 130.12924, 186.23677, 145.63383, 194.00891, 193.38945, 146.3275, 224.78394, 140.4856, 130.35025]
2025-08-07 05:51:49,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 25.0, 36.0, 28.0, 38.0, 38.0, 28.0, 44.0, 27.0, 25.0]
2025-08-07 05:51:49,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 4 seconds)
2025-08-07 05:53:49,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:53:49,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 174.69920 ± 24.946
2025-08-07 05:53:49,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [187.51988, 186.41043, 197.23502, 162.10274, 195.78548, 195.13354, 119.27325, 156.64084, 151.17752, 195.71333]
2025-08-07 05:53:49,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 36.0, 38.0, 31.0, 38.0, 38.0, 23.0, 30.0, 29.0, 38.0]
2025-08-07 05:53:50,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 3 seconds)
2025-08-07 05:55:49,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:55:50,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 167.48285 ± 14.912
2025-08-07 05:55:50,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [135.5772, 171.66217, 169.17664, 166.4377, 155.35655, 183.37907, 160.91165, 166.44688, 194.39719, 171.4835]
2025-08-07 05:55:50,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 33.0, 33.0, 32.0, 30.0, 36.0, 31.0, 32.0, 38.0, 33.0]
2025-08-07 05:55:50,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 2 seconds)
2025-08-07 05:57:49,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:57:50,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 182.69339 ± 34.375
2025-08-07 05:57:50,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [155.41936, 192.66185, 180.2979, 135.49197, 189.55168, 255.82097, 215.75491, 184.95744, 181.67226, 135.30553]
2025-08-07 05:57:50,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 37.0, 35.0, 26.0, 37.0, 52.0, 42.0, 36.0, 35.0, 26.0]
2025-08-07 05:57:50,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 1 second)
2025-08-07 05:59:49,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 05:59:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 185.86978 ± 33.735
2025-08-07 05:59:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [150.2752, 238.90517, 209.55315, 188.74756, 177.4662, 140.89186, 164.73354, 180.9367, 245.65628, 161.53223]
2025-08-07 05:59:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 46.0, 41.0, 37.0, 34.0, 27.0, 32.0, 35.0, 49.0, 31.0]
2025-08-07 05:59:50,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 1 second)
2025-08-07 06:01:50,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:01:50,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 172.90855 ± 31.184
2025-08-07 06:01:50,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [182.52744, 199.71605, 128.75298, 180.5315, 184.77214, 130.13219, 183.00682, 165.6929, 140.15399, 233.79973]
2025-08-07 06:01:50,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 39.0, 25.0, 35.0, 36.0, 25.0, 35.0, 32.0, 27.0, 46.0]
2025-08-07 06:01:50,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 1 second)
2025-08-07 06:03:50,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:03:50,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 177.56204 ± 23.269
2025-08-07 06:03:50,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [200.75409, 167.4236, 203.74579, 146.67543, 196.8016, 140.71025, 184.16177, 151.69296, 203.79271, 179.86226]
2025-08-07 06:03:50,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 32.0, 40.0, 28.0, 38.0, 27.0, 36.0, 29.0, 40.0, 35.0]
2025-08-07 06:03:51,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 1 second)
2025-08-07 06:05:50,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:05:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 172.09869 ± 36.113
2025-08-07 06:05:51,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [171.26169, 173.98178, 141.04588, 141.20721, 268.3081, 166.3494, 178.43599, 180.20131, 169.53372, 130.66185]
2025-08-07 06:05:51,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 34.0, 27.0, 27.0, 53.0, 32.0, 35.0, 35.0, 33.0, 25.0]
2025-08-07 06:05:51,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 1 second)
2025-08-07 06:07:50,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:07:51,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 162.59476 ± 21.004
2025-08-07 06:07:51,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [151.92944, 140.43178, 166.84949, 176.13367, 196.21281, 194.71637, 151.08089, 140.08183, 172.48587, 136.02542]
2025-08-07 06:07:51,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 27.0, 32.0, 34.0, 38.0, 38.0, 29.0, 27.0, 33.0, 26.0]
2025-08-07 06:07:51,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes)
2025-08-07 06:09:50,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:09:51,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 172.87296 ± 25.108
2025-08-07 06:09:51,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [158.99918, 170.82286, 190.05215, 129.33577, 155.23709, 185.72054, 186.87001, 175.68349, 225.2812, 150.72719]
2025-08-07 06:09:51,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 33.0, 37.0, 25.0, 30.0, 36.0, 37.0, 34.0, 45.0, 29.0]
2025-08-07 06:09:51,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes)
2025-08-07 06:11:50,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:11:51,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 170.72493 ± 37.492
2025-08-07 06:11:51,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [165.81015, 159.90865, 151.02603, 170.76053, 140.53598, 164.19109, 179.79315, 151.68591, 145.58945, 277.94836]
2025-08-07 06:11:51,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 31.0, 29.0, 33.0, 27.0, 32.0, 35.0, 29.0, 28.0, 56.0]
2025-08-07 06:11:51,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes)
2025-08-07 06:13:50,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:13:51,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 189.55299 ± 62.366
2025-08-07 06:13:51,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [140.47551, 161.63858, 313.79605, 170.49045, 155.75114, 179.58153, 150.70987, 310.85947, 145.97562, 166.25183]
2025-08-07 06:13:51,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 31.0, 62.0, 33.0, 30.0, 35.0, 29.0, 61.0, 28.0, 32.0]
2025-08-07 06:13:51,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes)
2025-08-07 06:15:51,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 06:15:52,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1221 [DEBUG]: Total Reward: 165.93747 ± 47.158
2025-08-07 06:15:52,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1222 [DEBUG]: All rewards: [134.75682, 135.42982, 300.33722, 146.60422, 162.04834, 180.64912, 140.87321, 155.78372, 167.50865, 135.38368]
2025-08-07 06:15:52,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 26.0, 64.0, 28.0, 31.0, 35.0, 27.0, 30.0, 32.0, 26.0]
2025-08-07 06:15:52,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-humanoid):1251 [DEBUG]: Training session finished
