2025-05-06 00:34:05,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc3/noisy-humanoid/SparseU15-bpql-mem32
2025-05-06 00:34:05,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc3/noisy-humanoid/SparseU15-bpql-mem32
2025-05-06 00:34:05,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1008 [DEBUG]: args.trainer_eval_latencies: {'SparseU15': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x76a2889cba00>}
2025-05-06 00:34:05,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1009 [DEBUG]: using device: cpu
2025-05-06 00:34:05,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1031 [INFO]: Creating new trainer
2025-05-06 00:34:05,247 baseline-bpql-noisy-humanoid:111 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=920, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-05-06 00:34:05,247 baseline-bpql-noisy-humanoid:112 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 00:34:07,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1092 [DEBUG]: Starting training session...
2025-05-06 00:34:07,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 1/100
2025-05-06 00:37:59,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:38:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 154.96072 ± 14.453
2025-05-06 00:38:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [152.25148, 135.87149, 172.08983, 135.20149, 151.15717, 145.78265, 151.10683, 155.06427, 169.50902, 181.57294]
2025-05-06 00:38:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 26.0, 33.0, 26.0, 29.0, 28.0, 29.0, 30.0, 33.0, 35.0]
2025-05-06 00:38:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (154.96) for latency SparseU15
2025-05-06 00:38:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 00:38:00,799 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-humanoid/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 00:38:00,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 24 minutes, 51 seconds)
2025-05-06 00:42:10,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:42:11,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 190.81400 ± 93.606
2025-05-06 00:42:11,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [212.71738, 157.64124, 145.46234, 463.92316, 169.34361, 167.23839, 156.47913, 129.84271, 135.65833, 169.83377]
2025-05-06 00:42:11,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [41.0, 30.0, 28.0, 91.0, 33.0, 32.0, 30.0, 25.0, 26.0, 33.0]
2025-05-06 00:42:11,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (190.81) for latency SparseU15
2025-05-06 00:42:11,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 00:42:11,354 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-humanoid/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 00:42:11,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 35 minutes, 6 seconds)
2025-05-06 00:46:18,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:46:19,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 199.06015 ± 86.226
2025-05-06 00:46:19,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [186.41771, 160.12338, 140.86705, 125.749916, 146.10475, 283.2832, 198.55896, 425.4155, 166.34038, 157.7406]
2025-05-06 00:46:19,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [36.0, 31.0, 27.0, 24.0, 28.0, 59.0, 39.0, 82.0, 32.0, 30.0]
2025-05-06 00:46:19,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (199.06) for latency SparseU15
2025-05-06 00:46:19,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 00:46:19,633 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-humanoid/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 00:46:19,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 34 minutes, 30 seconds)
2025-05-06 00:50:26,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:50:27,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 157.88882 ± 18.160
2025-05-06 00:50:27,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [155.99194, 166.13911, 146.00941, 129.75537, 161.22095, 171.59799, 140.32643, 199.64136, 156.76648, 151.43921]
2025-05-06 00:50:27,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 32.0, 28.0, 25.0, 31.0, 33.0, 27.0, 39.0, 30.0, 29.0]
2025-05-06 00:50:27,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 31 minutes, 54 seconds)
2025-05-06 00:54:33,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:54:34,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 274.57697 ± 184.796
2025-05-06 00:54:34,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [502.24222, 167.41193, 181.98389, 135.19124, 150.51964, 156.17813, 150.30133, 509.91107, 641.4214, 150.60869]
2025-05-06 00:54:34,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [93.0, 32.0, 35.0, 26.0, 29.0, 30.0, 29.0, 98.0, 130.0, 29.0]
2025-05-06 00:54:34,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (274.58) for latency SparseU15
2025-05-06 00:54:34,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 00:54:34,812 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-humanoid/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 00:54:34,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 28 minutes, 37 seconds)
2025-05-06 00:58:40,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 00:58:41,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 188.82071 ± 89.337
2025-05-06 00:58:41,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [185.42348, 145.28212, 453.3173, 141.54628, 151.83466, 156.22813, 150.51001, 154.26262, 164.15836, 185.6442]
2025-05-06 00:58:41,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [36.0, 28.0, 81.0, 27.0, 29.0, 30.0, 29.0, 30.0, 32.0, 36.0]
2025-05-06 00:58:41,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 28 minutes, 49 seconds)
2025-05-06 01:02:49,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:02:50,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 167.03528 ± 15.594
2025-05-06 01:02:50,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [180.48781, 187.9361, 180.60359, 162.16066, 184.13261, 135.82787, 150.70229, 166.49075, 161.00589, 161.00525]
2025-05-06 01:02:50,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [35.0, 36.0, 35.0, 31.0, 36.0, 26.0, 29.0, 32.0, 31.0, 31.0]
2025-05-06 01:02:50,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 24 minutes, 5 seconds)
2025-05-06 01:07:00,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:07:01,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 240.65930 ± 115.868
2025-05-06 01:07:01,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [161.01453, 402.1355, 165.19058, 165.73807, 140.58165, 176.40074, 454.6101, 184.55208, 389.18835, 167.18134]
2025-05-06 01:07:01,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 77.0, 32.0, 32.0, 27.0, 34.0, 87.0, 36.0, 74.0, 32.0]
2025-05-06 01:07:01,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 20 minutes, 55 seconds)
2025-05-06 01:11:14,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:11:16,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 198.99252 ± 114.579
2025-05-06 01:11:16,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [151.50055, 161.8414, 186.16364, 145.14201, 145.61234, 157.54874, 150.36534, 162.02396, 189.85184, 539.8755]
2025-05-06 01:11:16,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 31.0, 36.0, 28.0, 28.0, 30.0, 29.0, 31.0, 37.0, 103.0]
2025-05-06 01:11:16,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 18 minutes, 48 seconds)
2025-05-06 01:15:30,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:15:31,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 198.20927 ± 114.350
2025-05-06 01:15:31,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [149.65974, 145.95041, 537.1119, 165.27986, 161.24922, 154.86398, 198.48119, 129.84908, 159.79282, 179.85464]
2025-05-06 01:15:31,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 28.0, 102.0, 32.0, 31.0, 30.0, 38.0, 25.0, 31.0, 35.0]
2025-05-06 01:15:31,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 16 minutes, 57 seconds)
2025-05-06 01:19:45,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:19:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 218.75606 ± 109.002
2025-05-06 01:19:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [155.29094, 452.00085, 165.21017, 171.9005, 234.92229, 404.9498, 175.5542, 167.13298, 140.59172, 120.00696]
2025-05-06 01:19:46,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 86.0, 32.0, 33.0, 45.0, 89.0, 34.0, 32.0, 27.0, 23.0]
2025-05-06 01:19:46,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 15 minutes, 11 seconds)
2025-05-06 01:24:00,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:24:01,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 222.76208 ± 97.589
2025-05-06 01:24:01,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [387.1418, 156.65015, 179.7915, 191.41528, 140.05815, 161.66162, 438.73227, 170.17644, 207.55016, 194.44354]
2025-05-06 01:24:01,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [74.0, 30.0, 35.0, 37.0, 27.0, 31.0, 87.0, 33.0, 40.0, 38.0]
2025-05-06 01:24:01,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 12 minutes, 58 seconds)
2025-05-06 01:28:16,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:28:17,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 189.62521 ± 65.108
2025-05-06 01:28:17,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [170.49998, 372.87344, 140.17332, 166.38623, 160.83789, 150.71979, 194.94838, 223.0857, 155.61856, 161.10896]
2025-05-06 01:28:17,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 72.0, 27.0, 32.0, 31.0, 29.0, 38.0, 43.0, 30.0, 31.0]
2025-05-06 01:28:17,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 9 minutes, 56 seconds)
2025-05-06 01:32:30,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:32:31,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 201.22960 ± 100.456
2025-05-06 01:32:31,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [411.7072, 166.97636, 389.10773, 135.36583, 141.46873, 171.16533, 150.11775, 165.35098, 135.14323, 145.89279]
2025-05-06 01:32:31,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [80.0, 32.0, 72.0, 26.0, 27.0, 33.0, 29.0, 32.0, 26.0, 28.0]
2025-05-06 01:32:31,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 5 minutes, 32 seconds)
2025-05-06 01:36:43,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:36:45,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 276.62888 ± 131.702
2025-05-06 01:36:45,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [469.63757, 361.37573, 430.2788, 172.327, 466.80634, 155.77785, 192.60356, 166.13943, 220.9813, 130.36124]
2025-05-06 01:36:45,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [91.0, 72.0, 81.0, 33.0, 89.0, 30.0, 37.0, 32.0, 43.0, 25.0]
2025-05-06 01:36:45,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1124 [INFO]: New best (276.63) for latency SparseU15
2025-05-06 01:36:45,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1127 [INFO]: saving network
2025-05-06 01:36:45,363 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc3/noisy-humanoid/SparseU15-bpql-mem32/checkpoints/best_SparseU15.pkl
2025-05-06 01:36:45,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 58 seconds)
2025-05-06 01:40:58,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:40:59,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 177.18802 ± 22.318
2025-05-06 01:40:59,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [171.2943, 230.56471, 156.12646, 199.13127, 181.07993, 162.04225, 161.28339, 160.27176, 161.27333, 188.81256]
2025-05-06 01:40:59,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 46.0, 30.0, 38.0, 35.0, 31.0, 31.0, 31.0, 31.0, 36.0]
2025-05-06 01:40:59,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 56 minutes, 31 seconds)
2025-05-06 01:45:12,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:45:14,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 238.96301 ± 124.928
2025-05-06 01:45:14,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [481.14517, 159.54843, 419.44095, 170.52452, 166.62724, 375.17868, 183.61513, 156.80418, 141.26062, 135.48515]
2025-05-06 01:45:14,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [93.0, 31.0, 81.0, 33.0, 32.0, 70.0, 36.0, 30.0, 27.0, 26.0]
2025-05-06 01:45:14,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 52 minutes, 4 seconds)
2025-05-06 01:49:26,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:49:27,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 228.25687 ± 109.829
2025-05-06 01:49:27,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [146.2161, 486.1159, 391.05563, 151.34886, 182.82306, 149.38368, 210.67668, 157.46764, 213.11, 194.37077]
2025-05-06 01:49:27,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 94.0, 72.0, 29.0, 35.0, 29.0, 41.0, 30.0, 41.0, 38.0]
2025-05-06 01:49:27,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 47 minutes, 15 seconds)
2025-05-06 01:53:40,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:53:41,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 190.61055 ± 97.329
2025-05-06 01:53:41,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [154.94592, 197.14093, 165.55531, 161.543, 155.73778, 175.71788, 130.4142, 135.506, 151.96527, 477.5791]
2025-05-06 01:53:41,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 38.0, 32.0, 31.0, 30.0, 34.0, 25.0, 26.0, 29.0, 91.0]
2025-05-06 01:53:41,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 43 minutes)
2025-05-06 01:57:54,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 01:57:55,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 254.19360 ± 131.140
2025-05-06 01:57:55,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [507.2928, 135.37607, 134.91382, 323.69928, 361.85373, 155.17236, 425.93872, 181.7769, 151.53447, 164.37793]
2025-05-06 01:57:55,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [97.0, 26.0, 26.0, 63.0, 72.0, 30.0, 79.0, 35.0, 29.0, 32.0]
2025-05-06 01:57:55,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 38 minutes, 46 seconds)
2025-05-06 02:02:05,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:02:07,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 258.37604 ± 139.819
2025-05-06 02:02:07,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [214.3808, 156.81061, 155.65422, 160.63202, 493.03934, 170.59628, 145.60352, 497.20526, 176.72961, 413.1089]
2025-05-06 02:02:07,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [41.0, 30.0, 30.0, 31.0, 91.0, 33.0, 28.0, 102.0, 34.0, 77.0]
2025-05-06 02:02:07,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 33 minutes, 46 seconds)
2025-05-06 02:06:20,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:06:22,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 238.01128 ± 108.137
2025-05-06 02:06:22,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [399.54614, 195.83707, 160.58435, 129.97992, 185.45824, 447.91574, 165.19725, 344.79364, 171.40474, 179.39574]
2025-05-06 02:06:22,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [78.0, 38.0, 31.0, 25.0, 36.0, 87.0, 32.0, 71.0, 33.0, 35.0]
2025-05-06 02:06:22,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 29 minutes, 34 seconds)
2025-05-06 02:10:30,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:10:32,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 201.80940 ± 105.486
2025-05-06 02:10:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [145.77611, 512.2188, 144.75041, 150.89972, 193.44896, 166.542, 178.72687, 199.79887, 139.89706, 186.03497]
2025-05-06 02:10:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 96.0, 28.0, 29.0, 37.0, 32.0, 34.0, 39.0, 27.0, 36.0]
2025-05-06 02:10:32,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 24 minutes, 28 seconds)
2025-05-06 02:14:42,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:14:44,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 250.86003 ± 131.739
2025-05-06 02:14:44,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [167.27821, 145.9857, 515.24506, 429.0727, 175.90771, 180.84816, 155.9124, 186.29526, 155.73865, 396.31668]
2025-05-06 02:14:44,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [32.0, 28.0, 100.0, 82.0, 34.0, 35.0, 30.0, 36.0, 30.0, 73.0]
2025-05-06 02:14:44,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 19 minutes, 51 seconds)
2025-05-06 02:18:53,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:18:55,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 241.14050 ± 119.134
2025-05-06 02:18:55,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [141.20125, 446.9521, 151.64151, 172.87675, 372.67447, 180.29362, 441.02353, 165.9504, 171.46754, 167.32396]
2025-05-06 02:18:55,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [27.0, 90.0, 29.0, 33.0, 72.0, 35.0, 89.0, 32.0, 33.0, 32.0]
2025-05-06 02:18:55,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 14 minutes, 53 seconds)
2025-05-06 02:23:02,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:23:04,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 206.57944 ± 108.874
2025-05-06 02:23:04,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [135.38899, 156.13567, 151.3086, 130.24728, 397.23578, 157.28403, 166.80951, 154.85338, 169.83957, 446.69174]
2025-05-06 02:23:04,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [26.0, 30.0, 29.0, 25.0, 75.0, 30.0, 32.0, 30.0, 33.0, 89.0]
2025-05-06 02:23:04,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 10 minutes, 3 seconds)
2025-05-06 02:27:13,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:27:14,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 162.93153 ± 12.816
2025-05-06 02:27:14,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [135.79803, 156.0884, 170.77301, 172.855, 170.6331, 159.9321, 145.9456, 178.97095, 167.19778, 171.12143]
2025-05-06 02:27:14,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [26.0, 30.0, 33.0, 33.0, 33.0, 31.0, 28.0, 35.0, 32.0, 33.0]
2025-05-06 02:27:14,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 4 minutes, 45 seconds)
2025-05-06 02:31:22,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:31:23,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 209.86252 ± 114.870
2025-05-06 02:31:23,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [179.94458, 185.12675, 150.39502, 184.80167, 551.9518, 194.49731, 175.44379, 155.50595, 161.22787, 159.73055]
2025-05-06 02:31:23,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [35.0, 36.0, 29.0, 36.0, 106.0, 38.0, 34.0, 30.0, 31.0, 31.0]
2025-05-06 02:31:23,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 21 seconds)
2025-05-06 02:35:31,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:35:32,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 177.64584 ± 52.956
2025-05-06 02:35:32,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [146.40434, 172.0202, 150.9974, 150.36307, 155.55048, 149.69402, 181.00624, 331.9765, 156.16907, 182.2772]
2025-05-06 02:35:32,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 33.0, 29.0, 29.0, 30.0, 29.0, 35.0, 63.0, 30.0, 35.0]
2025-05-06 02:35:32,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 55 minutes, 18 seconds)
2025-05-06 02:39:40,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:39:41,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 169.35190 ± 18.730
2025-05-06 02:39:41,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [185.6372, 198.4335, 135.35371, 166.32031, 160.86565, 186.05275, 177.02695, 151.25374, 181.71538, 150.85976]
2025-05-06 02:39:41,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [36.0, 39.0, 26.0, 32.0, 31.0, 36.0, 34.0, 29.0, 35.0, 29.0]
2025-05-06 02:39:41,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 50 minutes, 42 seconds)
2025-05-06 02:43:48,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:43:49,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 176.13123 ± 22.631
2025-05-06 02:43:49,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [206.3477, 135.67642, 199.25282, 164.59264, 171.07936, 184.62529, 167.30424, 202.78223, 145.29822, 184.35344]
2025-05-06 02:43:49,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [40.0, 26.0, 39.0, 32.0, 33.0, 36.0, 32.0, 39.0, 28.0, 36.0]
2025-05-06 02:43:49,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 46 minutes, 19 seconds)
2025-05-06 02:47:55,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:47:56,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 199.37827 ± 101.658
2025-05-06 02:47:56,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [185.91968, 175.50555, 187.00868, 146.04764, 150.49045, 500.74844, 150.23425, 186.1326, 156.02994, 155.66544]
2025-05-06 02:47:56,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [36.0, 34.0, 36.0, 28.0, 29.0, 97.0, 29.0, 36.0, 30.0, 30.0]
2025-05-06 02:47:56,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 41 minutes, 38 seconds)
2025-05-06 02:52:02,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:52:04,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 207.12724 ± 89.691
2025-05-06 02:52:04,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [154.92873, 166.19437, 471.1457, 169.83803, 187.45773, 196.89699, 154.56609, 180.83089, 213.33669, 176.0773]
2025-05-06 02:52:04,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 32.0, 88.0, 33.0, 36.0, 38.0, 30.0, 35.0, 41.0, 34.0]
2025-05-06 02:52:04,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 37 minutes, 2 seconds)
2025-05-06 02:56:10,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 02:56:11,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 170.09993 ± 21.230
2025-05-06 02:56:11,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [176.00575, 140.97714, 155.0864, 211.8813, 151.17697, 181.42232, 181.93092, 146.60956, 164.69463, 191.21439]
2025-05-06 02:56:11,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [34.0, 27.0, 30.0, 41.0, 29.0, 35.0, 35.0, 28.0, 32.0, 37.0]
2025-05-06 02:56:11,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 32 minutes, 37 seconds)
2025-05-06 03:00:16,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:00:17,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 175.49362 ± 53.080
2025-05-06 03:00:17,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [141.45343, 198.63303, 191.60161, 146.51007, 166.79129, 322.52893, 134.91582, 140.56593, 150.62532, 161.3107]
2025-05-06 03:00:17,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [27.0, 38.0, 37.0, 28.0, 32.0, 66.0, 26.0, 27.0, 29.0, 31.0]
2025-05-06 03:00:17,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 27 minutes, 48 seconds)
2025-05-06 03:04:21,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:04:22,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 217.40250 ± 119.586
2025-05-06 03:04:22,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [125.43564, 172.87445, 140.79036, 129.95523, 167.1855, 449.63428, 190.84833, 454.6619, 145.31107, 197.32805]
2025-05-06 03:04:22,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [24.0, 33.0, 27.0, 25.0, 32.0, 84.0, 37.0, 85.0, 28.0, 38.0]
2025-05-06 03:04:22,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 23 minutes, 6 seconds)
2025-05-06 03:08:27,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:08:28,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 158.20268 ± 14.315
2025-05-06 03:08:28,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [140.53844, 165.7171, 151.662, 171.98982, 169.01964, 139.69238, 176.33284, 135.89653, 160.44945, 170.72864]
2025-05-06 03:08:28,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [27.0, 32.0, 29.0, 33.0, 33.0, 27.0, 34.0, 26.0, 31.0, 33.0]
2025-05-06 03:08:28,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 18 minutes, 31 seconds)
2025-05-06 03:12:31,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:12:32,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 164.58359 ± 25.057
2025-05-06 03:12:32,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [151.14986, 185.45543, 197.16286, 204.614, 130.21428, 160.90144, 160.07751, 146.11064, 180.18915, 129.96056]
2025-05-06 03:12:32,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 36.0, 38.0, 40.0, 25.0, 31.0, 31.0, 28.0, 35.0, 25.0]
2025-05-06 03:12:32,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 13 minutes, 56 seconds)
2025-05-06 03:16:38,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:16:39,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 201.98145 ± 94.317
2025-05-06 03:16:39,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [215.45036, 135.78714, 471.849, 175.29234, 222.71997, 155.79478, 189.1576, 146.04353, 167.359, 140.36072]
2025-05-06 03:16:39,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [42.0, 26.0, 102.0, 34.0, 44.0, 30.0, 36.0, 28.0, 32.0, 27.0]
2025-05-06 03:16:39,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 9 minutes, 41 seconds)
2025-05-06 03:20:42,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:20:43,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 163.84639 ± 15.030
2025-05-06 03:20:43,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [135.81277, 162.11485, 162.1847, 192.08463, 155.54716, 180.55446, 155.1408, 177.34369, 156.19577, 161.48512]
2025-05-06 03:20:43,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [26.0, 31.0, 31.0, 37.0, 30.0, 35.0, 30.0, 34.0, 30.0, 31.0]
2025-05-06 03:20:43,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 5 minutes, 14 seconds)
2025-05-06 03:24:47,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:24:48,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 159.97729 ± 26.325
2025-05-06 03:24:48,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [196.97514, 151.15463, 141.31776, 140.7494, 135.74432, 156.1267, 182.22545, 209.18185, 124.6967, 161.6011]
2025-05-06 03:24:48,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [38.0, 29.0, 27.0, 27.0, 26.0, 30.0, 35.0, 41.0, 24.0, 31.0]
2025-05-06 03:24:48,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 1 minute, 6 seconds)
2025-05-06 03:28:52,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:28:53,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 163.04929 ± 22.324
2025-05-06 03:28:53,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [171.14124, 145.98322, 164.42064, 130.34364, 190.63576, 160.85602, 195.20088, 180.96083, 125.262985, 165.6876]
2025-05-06 03:28:53,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 28.0, 32.0, 25.0, 37.0, 31.0, 38.0, 35.0, 24.0, 32.0]
2025-05-06 03:28:53,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 56 minutes, 54 seconds)
2025-05-06 03:32:56,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:32:58,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 195.91037 ± 92.703
2025-05-06 03:32:58,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [160.68098, 151.1536, 195.95088, 129.53935, 165.64868, 179.53943, 175.728, 169.80078, 161.5868, 469.47525]
2025-05-06 03:32:58,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 29.0, 38.0, 25.0, 32.0, 35.0, 34.0, 33.0, 31.0, 94.0]
2025-05-06 03:32:58,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 52 minutes, 48 seconds)
2025-05-06 03:37:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:37:03,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 194.22232 ± 68.273
2025-05-06 03:37:03,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [190.14252, 390.987, 129.88686, 176.01535, 168.89644, 164.71002, 157.92014, 171.33412, 200.97887, 191.35182]
2025-05-06 03:37:03,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [37.0, 76.0, 25.0, 34.0, 33.0, 32.0, 30.0, 33.0, 39.0, 37.0]
2025-05-06 03:37:03,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 48 minutes, 34 seconds)
2025-05-06 03:41:08,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:41:09,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 191.90324 ± 50.515
2025-05-06 03:41:09,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [140.05925, 155.39761, 182.2715, 166.1812, 199.59709, 195.15173, 331.26056, 166.34676, 174.0778, 208.68912]
2025-05-06 03:41:09,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [27.0, 30.0, 35.0, 32.0, 39.0, 38.0, 67.0, 32.0, 34.0, 41.0]
2025-05-06 03:41:09,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 44 minutes, 47 seconds)
2025-05-06 03:45:14,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:45:15,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 165.73854 ± 22.885
2025-05-06 03:45:15,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [185.906, 205.74098, 140.71324, 171.2447, 179.61078, 145.5191, 180.81638, 172.3412, 129.55168, 145.94136]
2025-05-06 03:45:15,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [36.0, 40.0, 27.0, 33.0, 35.0, 28.0, 35.0, 33.0, 25.0, 28.0]
2025-05-06 03:45:15,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 40 minutes, 48 seconds)
2025-05-06 03:49:18,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:49:19,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 190.30997 ± 61.874
2025-05-06 03:49:19,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [205.43521, 160.01715, 150.1175, 172.41238, 130.36143, 179.66457, 358.90945, 181.0071, 144.74857, 220.42628]
2025-05-06 03:49:19,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [40.0, 31.0, 29.0, 33.0, 25.0, 35.0, 72.0, 35.0, 28.0, 43.0]
2025-05-06 03:49:19,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 36 minutes, 39 seconds)
2025-05-06 03:53:24,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:53:25,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 165.25049 ± 16.714
2025-05-06 03:53:25,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [159.67719, 156.76625, 146.51622, 175.4049, 145.82666, 181.8557, 175.35423, 198.76056, 146.33377, 166.00938]
2025-05-06 03:53:25,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 30.0, 28.0, 34.0, 28.0, 35.0, 34.0, 38.0, 28.0, 32.0]
2025-05-06 03:53:25,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 32 minutes, 47 seconds)
2025-05-06 03:57:30,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 03:57:31,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 183.96663 ± 53.349
2025-05-06 03:57:31,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [146.16946, 184.97174, 150.92609, 155.84499, 176.78358, 166.66429, 170.97972, 176.70131, 340.15463, 170.4704]
2025-05-06 03:57:31,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 36.0, 29.0, 30.0, 34.0, 32.0, 33.0, 34.0, 70.0, 33.0]
2025-05-06 03:57:31,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 28 minutes, 40 seconds)
2025-05-06 04:01:35,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:01:36,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 194.63145 ± 81.034
2025-05-06 04:01:36,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [184.1881, 161.63965, 427.51227, 189.75821, 125.332085, 196.53065, 194.79826, 135.4447, 155.30081, 175.80983]
2025-05-06 04:01:36,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [36.0, 31.0, 84.0, 37.0, 24.0, 38.0, 38.0, 26.0, 30.0, 34.0]
2025-05-06 04:01:36,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 24 minutes, 33 seconds)
2025-05-06 04:05:41,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:05:42,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 174.47856 ± 26.529
2025-05-06 04:05:42,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [192.54895, 155.04858, 235.7634, 169.3507, 184.78336, 175.05142, 160.0433, 129.60231, 161.22849, 181.36507]
2025-05-06 04:05:42,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [37.0, 30.0, 46.0, 33.0, 36.0, 34.0, 31.0, 25.0, 31.0, 35.0]
2025-05-06 04:05:42,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 20 minutes, 28 seconds)
2025-05-06 04:09:47,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:09:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 177.95314 ± 19.090
2025-05-06 04:09:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [169.58028, 155.65735, 197.64354, 201.20499, 209.33371, 183.93474, 161.57866, 154.91867, 161.06529, 184.61427]
2025-05-06 04:09:48,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 30.0, 38.0, 39.0, 41.0, 36.0, 31.0, 30.0, 31.0, 36.0]
2025-05-06 04:09:48,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 16 minutes, 33 seconds)
2025-05-06 04:13:53,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:13:54,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 164.14731 ± 24.145
2025-05-06 04:13:54,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [161.3083, 171.17645, 180.70633, 180.00098, 161.20096, 156.0916, 124.931404, 125.02249, 209.40637, 171.62833]
2025-05-06 04:13:54,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 33.0, 35.0, 35.0, 31.0, 30.0, 24.0, 24.0, 41.0, 33.0]
2025-05-06 04:13:54,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 12 minutes, 25 seconds)
2025-05-06 04:17:58,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:17:59,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 164.20766 ± 17.813
2025-05-06 04:17:59,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [180.83171, 145.42699, 165.67137, 163.99313, 164.8209, 189.40677, 151.52528, 193.69063, 145.15222, 141.55751]
2025-05-06 04:17:59,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [35.0, 28.0, 32.0, 32.0, 32.0, 37.0, 29.0, 38.0, 28.0, 27.0]
2025-05-06 04:17:59,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 8 minutes, 18 seconds)
2025-05-06 04:22:03,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:22:04,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 161.89241 ± 17.956
2025-05-06 04:22:04,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [130.06734, 159.60895, 140.79958, 187.02965, 176.5908, 180.46387, 177.76949, 156.54245, 164.8577, 145.1943]
2025-05-06 04:22:04,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [25.0, 31.0, 27.0, 36.0, 34.0, 35.0, 34.0, 30.0, 32.0, 28.0]
2025-05-06 04:22:04,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 4 minutes, 12 seconds)
2025-05-06 04:26:09,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:26:10,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 185.56943 ± 82.728
2025-05-06 04:26:10,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [129.21413, 124.63397, 125.22831, 170.62367, 366.07224, 326.77365, 180.42816, 145.81589, 141.06058, 145.84366]
2025-05-06 04:26:10,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [25.0, 24.0, 24.0, 33.0, 74.0, 64.0, 35.0, 28.0, 27.0, 28.0]
2025-05-06 04:26:10,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 5 seconds)
2025-05-06 04:30:15,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:30:16,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 176.65904 ± 16.458
2025-05-06 04:30:16,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [160.7257, 150.1035, 165.3448, 190.1009, 185.17564, 197.75102, 172.5059, 204.78603, 166.31667, 173.78027]
2025-05-06 04:30:16,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 29.0, 32.0, 37.0, 36.0, 39.0, 33.0, 40.0, 32.0, 34.0]
2025-05-06 04:30:16,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 55 minutes, 59 seconds)
2025-05-06 04:34:21,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:34:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 193.73978 ± 68.024
2025-05-06 04:34:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [225.38792, 155.07361, 176.6522, 135.12944, 383.79727, 151.58945, 196.66426, 190.95187, 161.29915, 160.85246]
2025-05-06 04:34:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [44.0, 30.0, 34.0, 26.0, 77.0, 29.0, 38.0, 37.0, 31.0, 31.0]
2025-05-06 04:34:22,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 51 minutes, 58 seconds)
2025-05-06 04:38:26,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:38:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 161.37334 ± 16.693
2025-05-06 04:38:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [150.76064, 150.50763, 182.16641, 144.89813, 160.58945, 190.82903, 140.13295, 176.29852, 146.59969, 170.95111]
2025-05-06 04:38:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 29.0, 35.0, 28.0, 31.0, 37.0, 27.0, 34.0, 28.0, 33.0]
2025-05-06 04:38:27,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 47 minutes, 53 seconds)
2025-05-06 04:42:32,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:42:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 203.25809 ± 104.052
2025-05-06 04:42:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [134.23444, 134.71709, 144.54498, 139.92546, 361.57416, 150.88564, 448.4913, 165.05963, 160.48753, 192.66055]
2025-05-06 04:42:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [26.0, 26.0, 28.0, 27.0, 76.0, 29.0, 90.0, 32.0, 31.0, 37.0]
2025-05-06 04:42:33,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 43 minutes, 48 seconds)
2025-05-06 04:46:38,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:46:39,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 174.10472 ± 15.939
2025-05-06 04:46:39,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [181.51505, 203.4041, 187.68358, 146.5196, 156.07927, 172.79936, 186.11833, 170.31604, 160.26578, 176.34622]
2025-05-06 04:46:39,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [35.0, 40.0, 36.0, 28.0, 30.0, 33.0, 36.0, 33.0, 31.0, 34.0]
2025-05-06 04:46:39,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 39 minutes, 50 seconds)
2025-05-06 04:50:44,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:50:45,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 200.07040 ± 72.937
2025-05-06 04:50:45,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [169.38145, 171.12296, 408.36105, 211.16035, 161.45267, 139.79807, 215.20865, 176.90244, 155.98671, 191.32968]
2025-05-06 04:50:45,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 33.0, 85.0, 41.0, 31.0, 27.0, 42.0, 34.0, 30.0, 37.0]
2025-05-06 04:50:45,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 35 minutes, 42 seconds)
2025-05-06 04:54:49,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:54:51,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 200.04863 ± 65.620
2025-05-06 04:54:51,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [170.77989, 185.7274, 178.32968, 150.16791, 392.03583, 207.97444, 172.93742, 180.63892, 192.67715, 169.2176]
2025-05-06 04:54:51,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 36.0, 35.0, 29.0, 78.0, 40.0, 33.0, 35.0, 37.0, 33.0]
2025-05-06 04:54:51,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 31 minutes, 31 seconds)
2025-05-06 04:58:54,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 04:58:56,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 195.10904 ± 58.129
2025-05-06 04:58:56,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [175.04942, 361.46402, 202.33113, 165.83546, 179.91621, 140.81833, 187.47629, 156.21628, 194.29015, 187.6931]
2025-05-06 04:58:56,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [34.0, 78.0, 39.0, 32.0, 35.0, 27.0, 36.0, 30.0, 38.0, 36.0]
2025-05-06 04:58:56,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 27 minutes, 23 seconds)
2025-05-06 05:03:01,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:03:02,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 187.99092 ± 71.430
2025-05-06 05:03:02,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [155.80832, 197.74617, 198.0407, 134.56926, 173.47992, 170.67548, 151.95332, 392.96283, 169.25273, 135.42055]
2025-05-06 05:03:02,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 38.0, 39.0, 26.0, 34.0, 33.0, 29.0, 80.0, 33.0, 26.0]
2025-05-06 05:03:02,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 23 minutes, 23 seconds)
2025-05-06 05:07:06,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:07:07,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 197.00543 ± 64.098
2025-05-06 05:07:07,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [177.05272, 195.67313, 196.04134, 150.8562, 200.63734, 165.53711, 175.9156, 383.18698, 168.59317, 156.56082]
2025-05-06 05:07:07,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [34.0, 38.0, 38.0, 29.0, 39.0, 32.0, 34.0, 74.0, 33.0, 30.0]
2025-05-06 05:07:07,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 19 minutes, 10 seconds)
2025-05-06 05:11:12,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:11:13,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 183.19237 ± 48.038
2025-05-06 05:11:13,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [151.24136, 220.59814, 174.76645, 188.87897, 151.72253, 307.92618, 184.60252, 171.05461, 130.05237, 151.08058]
2025-05-06 05:11:13,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 43.0, 34.0, 37.0, 29.0, 60.0, 36.0, 33.0, 25.0, 29.0]
2025-05-06 05:11:13,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 15 minutes, 5 seconds)
2025-05-06 05:15:18,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:15:19,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 173.99655 ± 22.699
2025-05-06 05:15:19,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [201.39926, 157.0749, 196.05768, 172.41461, 150.3363, 214.05269, 140.77191, 181.03758, 169.89331, 156.92723]
2025-05-06 05:15:19,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [39.0, 30.0, 38.0, 33.0, 29.0, 42.0, 27.0, 35.0, 33.0, 30.0]
2025-05-06 05:15:19,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 11 minutes, 5 seconds)
2025-05-06 05:19:23,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:19:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 161.79468 ± 19.175
2025-05-06 05:19:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [202.36052, 187.3216, 154.91205, 155.54599, 165.7048, 155.82323, 140.83542, 152.1445, 135.7766, 167.52213]
2025-05-06 05:19:24,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [39.0, 36.0, 30.0, 30.0, 32.0, 30.0, 27.0, 29.0, 26.0, 32.0]
2025-05-06 05:19:24,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 6 minutes, 58 seconds)
2025-05-06 05:23:29,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:23:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 157.03134 ± 8.956
2025-05-06 05:23:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [154.79419, 174.70312, 165.14871, 155.88554, 145.42345, 167.04356, 156.34727, 155.0913, 145.39232, 150.48401]
2025-05-06 05:23:29,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 34.0, 32.0, 30.0, 28.0, 32.0, 30.0, 30.0, 28.0, 29.0]
2025-05-06 05:23:29,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 2 minutes, 44 seconds)
2025-05-06 05:27:34,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:27:35,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 165.47099 ± 23.853
2025-05-06 05:27:35,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [172.11128, 139.70326, 165.62064, 176.89468, 155.0684, 193.72887, 124.61154, 145.98778, 172.5622, 208.42137]
2025-05-06 05:27:35,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 27.0, 32.0, 34.0, 30.0, 37.0, 24.0, 28.0, 33.0, 41.0]
2025-05-06 05:27:35,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 58 minutes, 41 seconds)
2025-05-06 05:31:39,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:31:40,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 187.80705 ± 65.721
2025-05-06 05:31:40,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [150.1118, 191.44931, 202.65065, 175.70139, 376.30872, 140.98212, 166.6854, 151.43279, 145.56268, 177.18582]
2025-05-06 05:31:40,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 37.0, 39.0, 34.0, 74.0, 27.0, 32.0, 29.0, 28.0, 34.0]
2025-05-06 05:31:40,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 54 minutes, 33 seconds)
2025-05-06 05:35:45,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:35:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 211.97990 ± 80.297
2025-05-06 05:35:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [350.25714, 151.35359, 186.37943, 205.07272, 383.12207, 178.27171, 135.3245, 185.23999, 152.0754, 192.70244]
2025-05-06 05:35:46,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [69.0, 29.0, 36.0, 40.0, 79.0, 34.0, 26.0, 36.0, 29.0, 38.0]
2025-05-06 05:35:46,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 50 minutes, 25 seconds)
2025-05-06 05:39:51,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:39:52,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 182.74396 ± 21.177
2025-05-06 05:39:52,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [211.56213, 169.75827, 199.38545, 181.22302, 175.60654, 160.9399, 170.61742, 146.735, 196.51434, 215.09758]
2025-05-06 05:39:52,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [41.0, 33.0, 38.0, 35.0, 34.0, 31.0, 33.0, 28.0, 38.0, 43.0]
2025-05-06 05:39:52,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 46 minutes, 22 seconds)
2025-05-06 05:43:56,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:43:57,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 165.26524 ± 19.011
2025-05-06 05:43:57,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [155.20973, 171.50145, 160.99248, 160.78885, 206.7075, 179.44261, 166.34282, 171.51799, 150.27151, 129.87755]
2025-05-06 05:43:57,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [30.0, 33.0, 31.0, 31.0, 40.0, 35.0, 32.0, 33.0, 29.0, 25.0]
2025-05-06 05:43:57,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 42 minutes, 18 seconds)
2025-05-06 05:48:03,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:48:04,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 190.75810 ± 69.716
2025-05-06 05:48:04,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [165.40733, 171.87701, 150.79813, 150.86307, 155.90245, 221.72914, 146.3614, 390.49667, 177.30603, 176.8397]
2025-05-06 05:48:04,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [32.0, 33.0, 29.0, 29.0, 30.0, 43.0, 28.0, 79.0, 34.0, 34.0]
2025-05-06 05:48:04,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 38 minutes, 19 seconds)
2025-05-06 05:52:08,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:52:09,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 182.83568 ± 66.663
2025-05-06 05:52:09,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [151.65158, 149.91011, 207.95845, 167.03102, 149.98622, 161.36165, 166.80629, 376.15552, 146.68062, 150.81532]
2025-05-06 05:52:09,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 29.0, 40.0, 32.0, 29.0, 31.0, 32.0, 78.0, 28.0, 29.0]
2025-05-06 05:52:09,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 34 minutes, 10 seconds)
2025-05-06 05:56:14,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 05:56:15,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 195.14450 ± 59.602
2025-05-06 05:56:15,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [185.54497, 161.3243, 161.52501, 185.97032, 166.1268, 366.87637, 156.09697, 214.15529, 186.63966, 167.18513]
2025-05-06 05:56:15,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [36.0, 31.0, 31.0, 36.0, 32.0, 76.0, 30.0, 42.0, 36.0, 32.0]
2025-05-06 05:56:15,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 30 minutes, 5 seconds)
2025-05-06 06:00:19,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:00:20,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 163.78964 ± 22.294
2025-05-06 06:00:20,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [193.46713, 190.86176, 186.39119, 160.37724, 149.12619, 175.91484, 130.46216, 129.70232, 170.31973, 151.27388]
2025-05-06 06:00:20,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [37.0, 37.0, 36.0, 31.0, 29.0, 34.0, 25.0, 25.0, 33.0, 29.0]
2025-05-06 06:00:20,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 25 minutes, 57 seconds)
2025-05-06 06:04:25,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:04:26,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 167.83673 ± 34.239
2025-05-06 06:04:26,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [165.0191, 114.09248, 216.60387, 130.31633, 150.19608, 196.29172, 129.59877, 192.51736, 173.23872, 210.49283]
2025-05-06 06:04:26,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [32.0, 22.0, 43.0, 25.0, 29.0, 38.0, 25.0, 37.0, 33.0, 41.0]
2025-05-06 06:04:26,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 21 minutes, 56 seconds)
2025-05-06 06:08:30,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:08:31,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 190.37520 ± 36.168
2025-05-06 06:08:31,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [206.51375, 171.92958, 168.67328, 145.22792, 185.6605, 191.02013, 166.04695, 194.12265, 187.56471, 286.99255]
2025-05-06 06:08:31,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [40.0, 33.0, 33.0, 28.0, 36.0, 37.0, 32.0, 38.0, 36.0, 57.0]
2025-05-06 06:08:31,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 17 minutes, 43 seconds)
2025-05-06 06:12:37,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:12:38,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 178.61478 ± 19.367
2025-05-06 06:12:38,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [170.69125, 176.07909, 199.58806, 155.94516, 178.09467, 186.36899, 175.79132, 179.26901, 145.8703, 218.45007]
2025-05-06 06:12:38,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 34.0, 39.0, 30.0, 34.0, 36.0, 34.0, 35.0, 28.0, 43.0]
2025-05-06 06:12:38,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 13 minutes, 44 seconds)
2025-05-06 06:16:41,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:16:42,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 171.15857 ± 14.858
2025-05-06 06:16:42,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [166.4255, 169.42052, 195.15808, 161.16324, 174.23474, 159.89809, 161.23422, 171.32297, 201.03368, 151.6946]
2025-05-06 06:16:42,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [32.0, 33.0, 38.0, 31.0, 34.0, 31.0, 31.0, 33.0, 39.0, 29.0]
2025-05-06 06:16:42,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 9 minutes, 33 seconds)
2025-05-06 06:20:59,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:21:00,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 166.58806 ± 11.105
2025-05-06 06:21:00,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [145.87093, 173.21198, 186.84448, 172.0948, 170.32938, 150.63165, 172.02411, 167.12628, 162.08452, 165.66257]
2025-05-06 06:21:00,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 33.0, 36.0, 33.0, 33.0, 29.0, 33.0, 32.0, 31.0, 32.0]
2025-05-06 06:21:00,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 6 minutes, 9 seconds)
2025-05-06 06:25:21,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:25:22,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 195.74516 ± 71.945
2025-05-06 06:25:22,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [170.25969, 130.46007, 175.04866, 391.01865, 165.77652, 181.02057, 135.34688, 177.19064, 250.2219, 181.10803]
2025-05-06 06:25:22,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [33.0, 25.0, 34.0, 78.0, 32.0, 35.0, 26.0, 34.0, 49.0, 35.0]
2025-05-06 06:25:22,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 2 minutes, 46 seconds)
2025-05-06 06:29:26,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:29:27,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 191.43538 ± 70.143
2025-05-06 06:29:27,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [275.93256, 130.16989, 183.25807, 140.98065, 177.32985, 176.59428, 362.89603, 191.63876, 145.47351, 130.08026]
2025-05-06 06:29:27,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [54.0, 25.0, 35.0, 27.0, 34.0, 34.0, 75.0, 37.0, 28.0, 25.0]
2025-05-06 06:29:27,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 87/100 (estimated time remaining: 58 minutes, 35 seconds)
2025-05-06 06:33:30,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:33:31,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 158.63263 ± 24.125
2025-05-06 06:33:31,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [129.72829, 176.34077, 135.94029, 201.91005, 124.73183, 182.13853, 140.59618, 172.94374, 156.51624, 165.48033]
2025-05-06 06:33:31,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [25.0, 34.0, 26.0, 39.0, 24.0, 35.0, 27.0, 33.0, 30.0, 32.0]
2025-05-06 06:33:31,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 88/100 (estimated time remaining: 54 minutes, 18 seconds)
2025-05-06 06:37:35,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:37:36,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 178.61276 ± 35.030
2025-05-06 06:37:36,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [152.02594, 150.03029, 178.79584, 170.87549, 268.33792, 187.25551, 181.10791, 202.05235, 140.406, 155.24037]
2025-05-06 06:37:36,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 29.0, 35.0, 33.0, 53.0, 36.0, 35.0, 39.0, 27.0, 30.0]
2025-05-06 06:37:36,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 89/100 (estimated time remaining: 50 minutes, 8 seconds)
2025-05-06 06:41:39,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:41:40,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 187.34048 ± 68.011
2025-05-06 06:41:40,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [151.75424, 181.81075, 171.15553, 194.98232, 175.2407, 140.83102, 140.00023, 385.2974, 164.99667, 167.33577]
2025-05-06 06:41:40,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 35.0, 33.0, 38.0, 34.0, 27.0, 27.0, 78.0, 32.0, 32.0]
2025-05-06 06:41:40,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 90/100 (estimated time remaining: 45 minutes, 28 seconds)
2025-05-06 06:45:44,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:45:45,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 170.65166 ± 24.885
2025-05-06 06:45:45,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [146.84558, 166.32506, 157.65462, 187.29082, 193.94475, 213.00626, 171.7554, 194.01877, 145.71384, 129.96141]
2025-05-06 06:45:45,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 32.0, 30.0, 36.0, 38.0, 41.0, 33.0, 38.0, 28.0, 25.0]
2025-05-06 06:45:45,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 91/100 (estimated time remaining: 40 minutes, 45 seconds)
2025-05-06 06:49:48,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:49:49,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 171.60970 ± 43.320
2025-05-06 06:49:49,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [160.76573, 136.2243, 146.1194, 156.4516, 171.25824, 171.0167, 130.08809, 293.0068, 172.16866, 178.99734]
2025-05-06 06:49:49,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 26.0, 28.0, 30.0, 33.0, 33.0, 25.0, 58.0, 33.0, 35.0]
2025-05-06 06:49:49,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 92/100 (estimated time remaining: 36 minutes, 40 seconds)
2025-05-06 06:53:52,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:53:53,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 155.13126 ± 22.933
2025-05-06 06:53:53,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [145.9663, 193.92677, 192.62444, 130.29657, 164.54817, 134.8551, 146.12242, 124.85149, 151.34743, 166.77397]
2025-05-06 06:53:53,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 37.0, 37.0, 25.0, 32.0, 26.0, 28.0, 24.0, 29.0, 32.0]
2025-05-06 06:53:53,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 93/100 (estimated time remaining: 32 minutes, 34 seconds)
2025-05-06 06:57:55,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 06:57:56,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 166.58881 ± 36.907
2025-05-06 06:57:56,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [145.29436, 140.11807, 144.08664, 179.17538, 190.8976, 140.81271, 145.22998, 140.76999, 176.3239, 263.1794]
2025-05-06 06:57:56,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [28.0, 27.0, 28.0, 35.0, 37.0, 27.0, 28.0, 27.0, 34.0, 52.0]
2025-05-06 06:57:56,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 94/100 (estimated time remaining: 28 minutes, 28 seconds)
2025-05-06 07:01:59,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:02:00,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 159.17433 ± 20.040
2025-05-06 07:02:00,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [134.83047, 176.7648, 160.90958, 140.65976, 129.71106, 185.82213, 182.33232, 155.5325, 145.0229, 180.15767]
2025-05-06 07:02:00,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [26.0, 34.0, 31.0, 27.0, 25.0, 36.0, 35.0, 30.0, 28.0, 35.0]
2025-05-06 07:02:00,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 95/100 (estimated time remaining: 24 minutes, 24 seconds)
2025-05-06 07:06:04,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:06:05,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 173.62537 ± 13.963
2025-05-06 07:06:05,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [151.84732, 169.8142, 178.42828, 181.16812, 151.79903, 195.35199, 164.97243, 175.7554, 192.37991, 174.73708]
2025-05-06 07:06:05,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [29.0, 33.0, 35.0, 35.0, 29.0, 38.0, 32.0, 34.0, 37.0, 34.0]
2025-05-06 07:06:05,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes, 19 seconds)
2025-05-06 07:10:08,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:10:09,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 199.26958 ± 95.068
2025-05-06 07:10:09,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [175.33766, 405.75323, 124.89984, 201.14299, 139.99034, 125.20285, 130.20595, 194.60501, 355.34, 140.2179]
2025-05-06 07:10:09,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [34.0, 85.0, 24.0, 39.0, 27.0, 24.0, 25.0, 38.0, 72.0, 27.0]
2025-05-06 07:10:09,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 15 seconds)
2025-05-06 07:14:13,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:14:14,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 165.55527 ± 27.302
2025-05-06 07:14:14,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [180.8771, 188.86603, 161.3576, 150.89207, 176.71562, 130.39812, 135.75995, 165.97592, 139.74968, 224.96068]
2025-05-06 07:14:14,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [35.0, 36.0, 31.0, 29.0, 34.0, 25.0, 26.0, 32.0, 27.0, 44.0]
2025-05-06 07:14:14,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 12 seconds)
2025-05-06 07:18:17,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:18:18,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 193.01819 ± 51.473
2025-05-06 07:18:18,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [264.51627, 204.1516, 166.53984, 190.93008, 160.77477, 145.58382, 180.33435, 145.73451, 310.65976, 160.95695]
2025-05-06 07:18:18,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [52.0, 40.0, 32.0, 37.0, 31.0, 28.0, 35.0, 28.0, 62.0, 31.0]
2025-05-06 07:18:18,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 8 seconds)
2025-05-06 07:22:21,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:22:22,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 172.80534 ± 29.569
2025-05-06 07:22:22,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [221.4382, 171.29727, 217.109, 169.93512, 146.34775, 187.73962, 162.00255, 130.5724, 186.23929, 135.37228]
2025-05-06 07:22:22,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [43.0, 33.0, 42.0, 33.0, 28.0, 36.0, 31.0, 25.0, 36.0, 26.0]
2025-05-06 07:22:22,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1097 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 4 seconds)
2025-05-06 07:26:25,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1112 [DEBUG]: Evaluating for latency SparseU15...
2025-05-06 07:26:26,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1119 [DEBUG]: Total Reward: 158.00162 ± 17.112
2025-05-06 07:26:26,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1120 [DEBUG]: All rewards: [159.6848, 154.63773, 169.4843, 130.1607, 130.89442, 189.47786, 162.11832, 150.18307, 161.56335, 171.81148]
2025-05-06 07:26:26,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1121 [DEBUG]: All trajectory lengths: [31.0, 30.0, 33.0, 25.0, 25.0, 37.0, 31.0, 29.0, 31.0, 33.0]
2025-05-06 07:26:26,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-humanoid):1149 [DEBUG]: Training session finished
