2025-09-16 14:59:23,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.075-delay_24
2025-09-16 14:59:23,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.075-delay_24
2025-09-16 14:59:23,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x147b195a8850>}
2025-09-16 14:59:23,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:59:23,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:59:23,523 baseline-bpql-noisepromille75-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:59:23,523 baseline-bpql-noisepromille75-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:59:25,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:59:25,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 15:01:19,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:01:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 172.00462 ± 38.502
2025-09-16 15:01:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [149.47809, 123.88541, 154.3456, 181.24266, 193.87256, 114.59763, 165.11781, 255.2485, 183.31587, 198.94205]
2025-09-16 15:01:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 24.0, 30.0, 38.0, 40.0, 22.0, 32.0, 49.0, 38.0, 40.0]
2025-09-16 15:01:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (172.00) for latency 24
2025-09-16 15:01:19,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 9 minutes, 5 seconds)
2025-09-16 15:03:22,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:03:23,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 346.29022 ± 109.184
2025-09-16 15:03:23,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [164.95792, 323.01474, 428.80362, 294.9861, 460.41556, 134.66312, 413.34402, 423.67734, 404.21805, 414.8218]
2025-09-16 15:03:23,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 62.0, 81.0, 56.0, 88.0, 26.0, 86.0, 85.0, 75.0, 78.0]
2025-09-16 15:03:23,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (346.29) for latency 24
2025-09-16 15:03:23,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 14 minutes, 35 seconds)
2025-09-16 15:05:26,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:05:26,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 257.32660 ± 144.655
2025-09-16 15:05:26,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [336.28098, 159.66286, 145.80319, 125.33069, 451.44696, 304.2045, 113.89163, 140.98682, 239.4826, 556.1759]
2025-09-16 15:05:26,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 31.0, 28.0, 24.0, 89.0, 59.0, 22.0, 27.0, 46.0, 107.0]
2025-09-16 15:05:26,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 14 minutes, 53 seconds)
2025-09-16 15:07:29,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:07:30,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 201.43106 ± 90.570
2025-09-16 15:07:30,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [215.57402, 416.93195, 130.12447, 299.31845, 243.69969, 140.24629, 129.80429, 145.93954, 167.68233, 124.989334]
2025-09-16 15:07:30,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [41.0, 82.0, 25.0, 61.0, 47.0, 27.0, 25.0, 28.0, 32.0, 24.0]
2025-09-16 15:07:30,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 13 minutes, 59 seconds)
2025-09-16 15:09:32,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:09:32,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 197.82965 ± 92.809
2025-09-16 15:09:32,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [339.6163, 145.09398, 109.09859, 305.42236, 130.49782, 139.20001, 150.55461, 129.19266, 165.03188, 364.58835]
2025-09-16 15:09:32,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 28.0, 21.0, 60.0, 25.0, 27.0, 29.0, 25.0, 32.0, 73.0]
2025-09-16 15:09:32,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 12 minutes, 23 seconds)
2025-09-16 15:11:34,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:11:35,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 258.62802 ± 135.916
2025-09-16 15:11:35,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [114.101814, 146.53824, 130.22252, 499.98517, 150.85619, 135.51463, 360.61343, 268.89777, 340.46988, 439.08063]
2025-09-16 15:11:35,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 28.0, 25.0, 98.0, 29.0, 26.0, 74.0, 57.0, 77.0, 87.0]
2025-09-16 15:11:35,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 12 minutes, 55 seconds)
2025-09-16 15:13:37,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:13:37,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 174.34329 ± 61.250
2025-09-16 15:13:37,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [165.19962, 146.52353, 212.25452, 154.69156, 130.67065, 135.22934, 241.22134, 124.74093, 114.1808, 318.72052]
2025-09-16 15:13:37,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 28.0, 41.0, 30.0, 25.0, 26.0, 47.0, 24.0, 22.0, 62.0]
2025-09-16 15:13:37,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 10 minutes, 26 seconds)
2025-09-16 15:15:38,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:15:39,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 282.32022 ± 131.908
2025-09-16 15:15:39,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [139.33362, 183.9828, 450.59683, 347.91647, 493.46802, 139.05919, 190.56271, 385.21497, 135.65717, 357.41013]
2025-09-16 15:15:39,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 36.0, 87.0, 68.0, 100.0, 27.0, 37.0, 77.0, 26.0, 69.0]
2025-09-16 15:15:39,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 7 minutes, 44 seconds)
2025-09-16 15:17:40,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:17:41,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 242.58432 ± 149.798
2025-09-16 15:17:41,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [129.81339, 152.02997, 594.7823, 135.70479, 129.84566, 339.92603, 344.9684, 139.85954, 118.89733, 340.0158]
2025-09-16 15:17:41,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 29.0, 120.0, 26.0, 25.0, 68.0, 71.0, 27.0, 23.0, 68.0]
2025-09-16 15:17:41,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 5 minutes, 17 seconds)
2025-09-16 15:19:42,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:19:42,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 182.37331 ± 58.403
2025-09-16 15:19:42,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [178.96103, 321.76068, 185.70255, 161.84999, 259.85977, 155.59666, 124.20497, 144.81004, 144.87256, 146.1147]
2025-09-16 15:19:42,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 63.0, 36.0, 31.0, 52.0, 30.0, 24.0, 28.0, 28.0, 28.0]
2025-09-16 15:19:42,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 3 minutes)
2025-09-16 15:21:44,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:21:44,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 166.73055 ± 50.526
2025-09-16 15:21:44,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [130.09651, 205.421, 155.40756, 303.2494, 156.09024, 144.3722, 151.27992, 166.08183, 130.63261, 124.67413]
2025-09-16 15:21:44,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 40.0, 30.0, 60.0, 30.0, 28.0, 29.0, 32.0, 25.0, 24.0]
2025-09-16 15:21:44,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 40 seconds)
2025-09-16 15:23:48,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:23:49,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 182.39023 ± 72.804
2025-09-16 15:23:49,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [119.892426, 323.58258, 139.92853, 139.23053, 140.90457, 180.13278, 129.5006, 154.97209, 324.183, 171.57516]
2025-09-16 15:23:49,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 65.0, 27.0, 27.0, 27.0, 35.0, 25.0, 30.0, 67.0, 33.0]
2025-09-16 15:23:49,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 59 minutes, 18 seconds)
2025-09-16 15:25:52,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:25:53,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 164.47331 ± 43.700
2025-09-16 15:25:53,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [163.54187, 134.60088, 141.41223, 280.82672, 151.02829, 144.30139, 129.60516, 203.2858, 160.02034, 136.11053]
2025-09-16 15:25:53,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 26.0, 27.0, 55.0, 29.0, 28.0, 25.0, 40.0, 31.0, 26.0]
2025-09-16 15:25:53,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 58 minutes, 6 seconds)
2025-09-16 15:27:57,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:27:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 164.03172 ± 57.343
2025-09-16 15:27:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [134.98457, 191.38339, 150.65964, 323.2422, 125.129456, 129.54555, 145.219, 140.50198, 120.25024, 179.40126]
2025-09-16 15:27:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 38.0, 29.0, 67.0, 24.0, 25.0, 28.0, 27.0, 23.0, 35.0]
2025-09-16 15:27:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 56 minutes, 43 seconds)
2025-09-16 15:30:00,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:30:01,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 169.74167 ± 49.937
2025-09-16 15:30:01,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [162.00644, 183.4145, 168.16264, 161.5286, 141.26562, 119.664215, 311.46472, 151.15517, 144.85094, 153.90387]
2025-09-16 15:30:01,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 36.0, 33.0, 31.0, 27.0, 23.0, 65.0, 29.0, 28.0, 30.0]
2025-09-16 15:30:01,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 55 minutes, 15 seconds)
2025-09-16 15:32:04,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:32:04,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 202.87218 ± 78.567
2025-09-16 15:32:04,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [174.89362, 182.48535, 210.5927, 135.44237, 194.1455, 114.43721, 322.75378, 374.4713, 144.12294, 175.37695]
2025-09-16 15:32:04,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 36.0, 43.0, 26.0, 38.0, 22.0, 64.0, 77.0, 28.0, 35.0]
2025-09-16 15:32:04,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 53 minutes, 37 seconds)
2025-09-16 15:34:03,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:34:04,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 162.62955 ± 13.005
2025-09-16 15:34:04,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [157.0299, 188.29495, 143.92699, 154.71648, 168.23956, 164.16939, 179.63326, 163.75139, 159.94894, 146.58449]
2025-09-16 15:34:04,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 37.0, 28.0, 30.0, 33.0, 32.0, 35.0, 32.0, 31.0, 28.0]
2025-09-16 15:34:04,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 50 minutes, 15 seconds)
2025-09-16 15:36:04,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:36:04,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 154.74217 ± 16.940
2025-09-16 15:36:04,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [174.96265, 137.15416, 150.14949, 149.8976, 150.37857, 140.54387, 167.20027, 160.80873, 187.0675, 129.25888]
2025-09-16 15:36:04,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 27.0, 29.0, 29.0, 29.0, 27.0, 32.0, 32.0, 37.0, 25.0]
2025-09-16 15:36:04,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 47 minutes, 10 seconds)
2025-09-16 15:38:04,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:38:05,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 153.89427 ± 17.923
2025-09-16 15:38:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [144.45618, 194.75528, 130.26529, 160.73845, 139.19867, 149.54243, 150.28549, 155.57152, 140.0724, 174.05717]
2025-09-16 15:38:05,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 38.0, 25.0, 31.0, 27.0, 29.0, 29.0, 30.0, 27.0, 34.0]
2025-09-16 15:38:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 44 minutes, 5 seconds)
2025-09-16 15:40:05,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:40:05,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 176.78732 ± 56.569
2025-09-16 15:40:05,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [136.10805, 165.52415, 141.86372, 333.1604, 171.91963, 190.32779, 159.7511, 140.2192, 129.78369, 199.21547]
2025-09-16 15:40:05,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 32.0, 27.0, 68.0, 33.0, 37.0, 31.0, 27.0, 25.0, 39.0]
2025-09-16 15:40:05,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 41 minutes, 9 seconds)
2025-09-16 15:42:05,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:42:06,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 140.40244 ± 18.768
2025-09-16 15:42:06,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [109.08523, 139.21825, 135.58003, 162.4967, 108.56369, 144.39972, 162.63155, 162.77534, 140.88823, 138.38577]
2025-09-16 15:42:06,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 27.0, 26.0, 32.0, 21.0, 28.0, 32.0, 32.0, 27.0, 27.0]
2025-09-16 15:42:06,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 38 minutes, 22 seconds)
2025-09-16 15:44:05,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:44:06,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 191.63466 ± 75.631
2025-09-16 15:44:06,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [129.7301, 143.73964, 251.92433, 148.8245, 119.842995, 160.52902, 173.77231, 136.20448, 321.14722, 330.632]
2025-09-16 15:44:06,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 28.0, 50.0, 29.0, 23.0, 31.0, 33.0, 26.0, 65.0, 67.0]
2025-09-16 15:44:06,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 36 minutes, 26 seconds)
2025-09-16 15:46:06,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:46:06,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 181.97452 ± 49.607
2025-09-16 15:46:06,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [315.69406, 196.05165, 209.34283, 179.31308, 161.88803, 144.54836, 145.3143, 146.06255, 146.00146, 175.52896]
2025-09-16 15:46:06,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 39.0, 41.0, 35.0, 31.0, 28.0, 28.0, 28.0, 28.0, 35.0]
2025-09-16 15:46:06,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 34 minutes, 32 seconds)
2025-09-16 15:48:06,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:48:07,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 182.29259 ± 63.929
2025-09-16 15:48:07,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.23488, 149.08998, 297.5811, 310.83792, 174.17456, 150.02563, 186.85522, 154.77126, 129.48657, 114.86872]
2025-09-16 15:48:07,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 29.0, 62.0, 62.0, 34.0, 29.0, 38.0, 30.0, 25.0, 22.0]
2025-09-16 15:48:07,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 32 minutes, 29 seconds)
2025-09-16 15:50:06,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:50:07,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 192.70476 ± 80.142
2025-09-16 15:50:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [108.70197, 160.47466, 168.72227, 114.1482, 344.91418, 161.27287, 159.32439, 167.15587, 345.8326, 196.50052]
2025-09-16 15:50:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 31.0, 33.0, 22.0, 70.0, 31.0, 31.0, 32.0, 69.0, 39.0]
2025-09-16 15:50:07,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 30 minutes, 25 seconds)
2025-09-16 15:52:07,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:52:08,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 170.05818 ± 53.187
2025-09-16 15:52:08,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.81784, 141.14905, 140.35004, 324.4639, 154.90834, 139.05722, 140.60765, 177.93845, 175.60118, 150.68805]
2025-09-16 15:52:08,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 27.0, 27.0, 64.0, 30.0, 27.0, 27.0, 35.0, 34.0, 29.0]
2025-09-16 15:52:08,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 31 seconds)
2025-09-16 15:54:07,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:54:08,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 184.79947 ± 66.503
2025-09-16 15:54:08,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [154.51196, 148.84776, 344.562, 139.57819, 169.98068, 170.08752, 166.25879, 275.433, 113.62854, 165.10622]
2025-09-16 15:54:08,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 29.0, 70.0, 27.0, 33.0, 33.0, 32.0, 56.0, 22.0, 32.0]
2025-09-16 15:54:08,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 33 seconds)
2025-09-16 15:56:09,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:56:09,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 168.78835 ± 54.028
2025-09-16 15:56:09,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [130.63306, 139.81328, 141.536, 165.57224, 168.2626, 145.31097, 326.58044, 166.49336, 159.31651, 144.36494]
2025-09-16 15:56:09,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 27.0, 27.0, 32.0, 33.0, 28.0, 67.0, 33.0, 31.0, 28.0]
2025-09-16 15:56:09,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 39 seconds)
2025-09-16 15:58:09,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 15:58:09,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 192.36678 ± 71.446
2025-09-16 15:58:09,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.71248, 174.40785, 168.27538, 170.31923, 145.24385, 165.49622, 129.98439, 270.19775, 378.63055, 165.39992]
2025-09-16 15:58:09,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 34.0, 33.0, 33.0, 28.0, 32.0, 25.0, 54.0, 81.0, 32.0]
2025-09-16 15:58:09,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 34 seconds)
2025-09-16 16:00:09,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:00:10,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 159.74356 ± 20.381
2025-09-16 16:00:10,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [143.59149, 184.3495, 179.62433, 139.86484, 177.84901, 118.55525, 150.85114, 165.06737, 158.84647, 178.83621]
2025-09-16 16:00:10,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 37.0, 35.0, 27.0, 35.0, 23.0, 29.0, 32.0, 31.0, 35.0]
2025-09-16 16:00:10,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 38 seconds)
2025-09-16 16:02:11,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:02:11,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 153.23701 ± 16.880
2025-09-16 16:02:11,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.38611, 164.88817, 159.23976, 163.10025, 125.33558, 163.17421, 172.92706, 119.68672, 163.8793, 144.75308]
2025-09-16 16:02:11,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 32.0, 31.0, 32.0, 24.0, 32.0, 34.0, 23.0, 32.0, 28.0]
2025-09-16 16:02:11,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 46 seconds)
2025-09-16 16:04:09,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:04:10,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 174.71779 ± 45.835
2025-09-16 16:04:10,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [168.84958, 182.51985, 160.19562, 306.74918, 165.34872, 150.708, 140.08621, 175.89182, 151.67058, 145.15846]
2025-09-16 16:04:10,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 36.0, 31.0, 62.0, 32.0, 29.0, 27.0, 34.0, 29.0, 28.0]
2025-09-16 16:04:10,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 26 seconds)
2025-09-16 16:06:06,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:06:07,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 180.13655 ± 59.004
2025-09-16 16:06:07,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [140.00896, 162.23254, 155.51152, 354.6585, 180.37775, 154.32071, 169.3487, 164.19308, 160.06859, 160.64507]
2025-09-16 16:06:07,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 31.0, 30.0, 75.0, 35.0, 30.0, 33.0, 32.0, 31.0, 31.0]
2025-09-16 16:06:07,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 28 seconds)
2025-09-16 16:08:04,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:08:05,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 207.47067 ± 93.926
2025-09-16 16:08:05,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [149.98302, 130.80826, 143.55745, 128.79611, 135.58965, 173.37877, 343.25937, 354.2254, 350.94247, 164.16643]
2025-09-16 16:08:05,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 25.0, 28.0, 25.0, 26.0, 34.0, 69.0, 69.0, 73.0, 32.0]
2025-09-16 16:08:05,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 10 minutes, 59 seconds)
2025-09-16 16:10:02,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:10:02,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 219.17009 ± 107.117
2025-09-16 16:10:02,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [123.82735, 192.49231, 151.40225, 351.8859, 357.57764, 173.1878, 143.6446, 154.50589, 423.3776, 119.7995]
2025-09-16 16:10:02,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 38.0, 29.0, 70.0, 79.0, 34.0, 28.0, 30.0, 90.0, 23.0]
2025-09-16 16:10:02,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 8 minutes, 25 seconds)
2025-09-16 16:12:00,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:12:00,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 182.80580 ± 55.496
2025-09-16 16:12:00,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [270.06012, 134.87961, 308.4329, 135.03775, 164.27693, 163.59984, 155.08109, 149.20656, 168.75868, 178.72453]
2025-09-16 16:12:00,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 26.0, 66.0, 26.0, 32.0, 32.0, 30.0, 29.0, 34.0, 35.0]
2025-09-16 16:12:00,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 40 seconds)
2025-09-16 16:13:58,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:13:59,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 199.07651 ± 78.740
2025-09-16 16:13:59,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [140.762, 382.95343, 154.23451, 145.28278, 155.3889, 268.64792, 158.94542, 280.06384, 135.22069, 169.26553]
2025-09-16 16:13:59,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 80.0, 30.0, 28.0, 30.0, 55.0, 31.0, 59.0, 26.0, 33.0]
2025-09-16 16:13:59,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 36 seconds)
2025-09-16 16:15:59,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:15:59,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 204.94096 ± 119.395
2025-09-16 16:15:59,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [338.0099, 139.92468, 135.77812, 479.56168, 311.2321, 140.18588, 124.59152, 119.92157, 125.59797, 134.60616]
2025-09-16 16:15:59,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 27.0, 26.0, 97.0, 65.0, 27.0, 24.0, 23.0, 24.0, 26.0]
2025-09-16 16:15:59,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 29 seconds)
2025-09-16 16:17:56,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:17:57,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 161.34831 ± 19.848
2025-09-16 16:17:57,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [180.49814, 179.74698, 125.92456, 161.11597, 144.18776, 146.12865, 154.86143, 154.77626, 168.63272, 197.6107]
2025-09-16 16:17:57,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 35.0, 24.0, 31.0, 28.0, 28.0, 30.0, 30.0, 33.0, 38.0]
2025-09-16 16:17:57,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 27 seconds)
2025-09-16 16:19:54,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:19:54,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 177.28159 ± 93.758
2025-09-16 16:19:54,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [197.55473, 134.32825, 129.90762, 140.306, 163.15314, 124.86378, 124.4872, 450.78903, 167.61534, 139.8107]
2025-09-16 16:19:54,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [38.0, 26.0, 25.0, 27.0, 32.0, 24.0, 24.0, 94.0, 33.0, 27.0]
2025-09-16 16:19:54,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 58 minutes, 23 seconds)
2025-09-16 16:21:51,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:21:52,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 148.75327 ± 18.912
2025-09-16 16:21:52,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [173.72893, 170.94144, 120.086174, 144.77586, 144.19038, 167.53583, 114.09535, 146.6409, 150.37949, 155.15831]
2025-09-16 16:21:52,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 34.0, 23.0, 28.0, 28.0, 33.0, 22.0, 28.0, 29.0, 30.0]
2025-09-16 16:21:52,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 56 minutes, 22 seconds)
2025-09-16 16:23:48,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:23:49,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 173.54982 ± 58.536
2025-09-16 16:23:49,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [145.76645, 169.01099, 178.21121, 150.95609, 119.30796, 156.94879, 342.37677, 167.84744, 164.8615, 140.21107]
2025-09-16 16:23:49,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 33.0, 35.0, 29.0, 23.0, 31.0, 69.0, 33.0, 32.0, 27.0]
2025-09-16 16:23:49,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 7 seconds)
2025-09-16 16:25:46,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:25:46,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 171.66130 ± 53.519
2025-09-16 16:25:46,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [171.30142, 165.546, 120.0172, 146.30373, 169.83807, 124.604256, 149.79762, 167.07552, 180.46442, 321.66486]
2025-09-16 16:25:46,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 32.0, 23.0, 28.0, 33.0, 24.0, 29.0, 33.0, 36.0, 63.0]
2025-09-16 16:25:46,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 30 seconds)
2025-09-16 16:27:44,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:27:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 147.36375 ± 21.023
2025-09-16 16:27:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.2892, 150.28412, 181.87404, 168.85893, 139.4693, 159.08131, 118.831116, 124.50224, 170.91473, 124.53264]
2025-09-16 16:27:44,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 29.0, 36.0, 33.0, 27.0, 31.0, 23.0, 24.0, 33.0, 24.0]
2025-09-16 16:27:44,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 34 seconds)
2025-09-16 16:29:44,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:29:44,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 185.47711 ± 81.955
2025-09-16 16:29:44,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [152.8606, 149.53828, 133.47267, 149.25027, 133.3482, 347.05133, 350.56824, 153.87228, 144.14711, 140.66208]
2025-09-16 16:29:44,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 29.0, 26.0, 29.0, 26.0, 71.0, 73.0, 30.0, 28.0, 27.0]
2025-09-16 16:29:44,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 48 minutes, 8 seconds)
2025-09-16 16:31:44,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:31:44,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 149.50594 ± 14.394
2025-09-16 16:31:44,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.49629, 129.61122, 162.08548, 124.97403, 150.28247, 155.62004, 158.78757, 173.45663, 150.68933, 154.05632]
2025-09-16 16:31:44,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 32.0, 24.0, 29.0, 30.0, 31.0, 35.0, 29.0, 30.0]
2025-09-16 16:31:44,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 36 seconds)
2025-09-16 16:33:43,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:33:44,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 176.06808 ± 33.511
2025-09-16 16:33:44,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [166.0985, 156.44029, 168.7848, 273.9883, 162.80573, 182.97794, 169.65991, 153.56741, 162.6269, 163.73106]
2025-09-16 16:33:44,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 30.0, 33.0, 57.0, 32.0, 36.0, 33.0, 30.0, 32.0, 32.0]
2025-09-16 16:33:44,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 4 seconds)
2025-09-16 16:35:39,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:35:40,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 158.99429 ± 13.436
2025-09-16 16:35:40,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [162.8509, 154.50261, 160.10799, 163.822, 134.72588, 173.84926, 176.90137, 139.96962, 172.55327, 150.66006]
2025-09-16 16:35:40,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 30.0, 31.0, 32.0, 26.0, 34.0, 34.0, 27.0, 34.0, 29.0]
2025-09-16 16:35:40,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 42 minutes, 53 seconds)
2025-09-16 16:37:36,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:37:37,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 148.34966 ± 23.142
2025-09-16 16:37:37,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [178.30331, 124.26295, 169.498, 129.35944, 124.547714, 124.67978, 163.81291, 184.73486, 154.4701, 129.82755]
2025-09-16 16:37:37,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 24.0, 33.0, 25.0, 24.0, 24.0, 32.0, 36.0, 30.0, 25.0]
2025-09-16 16:37:37,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 40 minutes, 46 seconds)
2025-09-16 16:39:33,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:39:34,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 150.74593 ± 9.471
2025-09-16 16:39:34,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [160.0038, 171.15195, 148.98888, 148.14551, 142.75012, 154.3804, 140.48753, 139.48526, 144.8795, 157.18633]
2025-09-16 16:39:34,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 34.0, 29.0, 29.0, 28.0, 30.0, 27.0, 27.0, 28.0, 31.0]
2025-09-16 16:39:34,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 13 seconds)
2025-09-16 16:41:30,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:41:31,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 175.93930 ± 83.209
2025-09-16 16:41:31,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [162.8459, 161.68144, 164.56198, 153.81905, 155.4562, 417.54953, 113.95, 134.70119, 181.1647, 113.66304]
2025-09-16 16:41:31,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 32.0, 30.0, 30.0, 89.0, 22.0, 26.0, 37.0, 22.0]
2025-09-16 16:41:31,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 46 seconds)
2025-09-16 16:43:27,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:43:28,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 197.23056 ± 62.924
2025-09-16 16:43:28,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [165.40222, 166.96352, 321.0474, 154.50095, 210.76271, 311.84454, 171.26535, 129.74472, 186.50226, 154.27184]
2025-09-16 16:43:28,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 33.0, 63.0, 30.0, 42.0, 66.0, 34.0, 25.0, 38.0, 30.0]
2025-09-16 16:43:28,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 33 minutes, 25 seconds)
2025-09-16 16:45:24,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:45:25,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 160.60245 ± 61.444
2025-09-16 16:45:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.35257, 145.62927, 125.60186, 145.84807, 130.30023, 144.11429, 341.1573, 120.1449, 162.36523, 135.51076]
2025-09-16 16:45:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 28.0, 24.0, 28.0, 25.0, 28.0, 70.0, 23.0, 32.0, 26.0]
2025-09-16 16:45:25,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 31 minutes, 36 seconds)
2025-09-16 16:47:21,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:47:22,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 143.88809 ± 18.274
2025-09-16 16:47:22,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [139.34653, 143.67989, 160.9682, 119.86058, 167.24028, 114.146576, 134.95709, 172.33406, 151.22592, 135.12183]
2025-09-16 16:47:22,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 28.0, 31.0, 23.0, 33.0, 22.0, 26.0, 34.0, 30.0, 26.0]
2025-09-16 16:47:22,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 29 minutes, 40 seconds)
2025-09-16 16:49:18,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:49:18,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 169.26543 ± 46.555
2025-09-16 16:49:18,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [145.30063, 158.58586, 307.26083, 149.65755, 159.51302, 145.57231, 153.87433, 167.5421, 160.12997, 145.2177]
2025-09-16 16:49:18,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 31.0, 61.0, 29.0, 31.0, 28.0, 30.0, 33.0, 31.0, 28.0]
2025-09-16 16:49:18,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 27 minutes, 42 seconds)
2025-09-16 16:51:14,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:51:15,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 178.03226 ± 69.813
2025-09-16 16:51:15,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [114.43543, 152.04509, 140.3278, 143.91205, 307.52066, 321.9747, 144.57487, 139.61958, 141.44362, 174.46881]
2025-09-16 16:51:15,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 30.0, 27.0, 28.0, 64.0, 69.0, 28.0, 27.0, 27.0, 34.0]
2025-09-16 16:51:15,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 42 seconds)
2025-09-16 16:53:11,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:53:12,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 146.94716 ± 15.293
2025-09-16 16:53:12,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.4228, 181.5149, 136.17603, 144.32153, 160.036, 124.713684, 135.65845, 156.2065, 149.71994, 145.70183]
2025-09-16 16:53:12,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 36.0, 26.0, 28.0, 31.0, 24.0, 26.0, 30.0, 29.0, 28.0]
2025-09-16 16:53:12,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 23 minutes, 42 seconds)
2025-09-16 16:55:08,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:55:08,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 187.62125 ± 68.956
2025-09-16 16:55:08,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [162.30794, 124.698845, 158.88437, 158.2031, 161.69057, 125.01138, 167.78613, 313.2802, 330.11148, 174.23851]
2025-09-16 16:55:08,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 24.0, 31.0, 31.0, 32.0, 24.0, 33.0, 63.0, 68.0, 35.0]
2025-09-16 16:55:08,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 21 minutes, 43 seconds)
2025-09-16 16:57:05,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:57:06,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 149.85008 ± 21.732
2025-09-16 16:57:06,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [108.96576, 178.15237, 153.8285, 119.62595, 165.25093, 145.3021, 149.69641, 139.56743, 158.57834, 179.53294]
2025-09-16 16:57:06,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 35.0, 30.0, 23.0, 32.0, 28.0, 29.0, 27.0, 31.0, 35.0]
2025-09-16 16:57:06,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 19 minutes, 47 seconds)
2025-09-16 16:59:02,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 16:59:02,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 151.80367 ± 14.807
2025-09-16 16:59:02,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [130.9202, 145.5448, 163.6842, 151.26324, 125.36445, 166.60167, 174.50206, 161.53996, 145.11589, 153.50012]
2025-09-16 16:59:02,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 28.0, 32.0, 29.0, 24.0, 33.0, 34.0, 31.0, 28.0, 30.0]
2025-09-16 16:59:02,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 17 minutes, 50 seconds)
2025-09-16 17:00:58,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:00:59,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 155.00829 ± 16.066
2025-09-16 17:00:59,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [160.27547, 181.30518, 144.51022, 125.67894, 150.63187, 178.85905, 163.48866, 149.5053, 155.00447, 140.82378]
2025-09-16 17:00:59,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 36.0, 28.0, 24.0, 29.0, 35.0, 32.0, 29.0, 30.0, 27.0]
2025-09-16 17:00:59,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 15 minutes, 54 seconds)
2025-09-16 17:02:55,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:02:56,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 169.51375 ± 43.183
2025-09-16 17:02:56,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [178.45958, 163.70938, 172.34456, 165.7677, 165.31252, 124.65771, 124.560486, 162.7839, 287.92252, 149.61925]
2025-09-16 17:02:56,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 32.0, 34.0, 32.0, 32.0, 24.0, 24.0, 32.0, 58.0, 29.0]
2025-09-16 17:02:56,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 13 minutes, 57 seconds)
2025-09-16 17:04:52,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:04:52,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 169.91466 ± 48.799
2025-09-16 17:04:52,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [124.38438, 145.5155, 158.14879, 140.82855, 159.29903, 159.99095, 308.05655, 190.3317, 150.02484, 162.5663]
2025-09-16 17:04:52,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 28.0, 31.0, 27.0, 31.0, 31.0, 64.0, 38.0, 29.0, 32.0]
2025-09-16 17:04:52,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 minutes, 59 seconds)
2025-09-16 17:06:48,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:06:49,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 174.80446 ± 90.676
2025-09-16 17:06:49,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [444.75153, 145.53091, 130.63068, 148.34758, 145.22932, 163.5841, 130.21313, 160.59236, 148.79497, 130.36995]
2025-09-16 17:06:49,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 28.0, 25.0, 29.0, 28.0, 32.0, 25.0, 31.0, 29.0, 25.0]
2025-09-16 17:06:49,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 1 second)
2025-09-16 17:08:45,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:08:46,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 140.32924 ± 8.822
2025-09-16 17:08:46,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [150.55345, 129.73494, 135.29366, 135.83165, 150.47597, 145.88441, 125.444176, 134.54602, 150.63266, 144.89539]
2025-09-16 17:08:46,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 25.0, 26.0, 26.0, 29.0, 28.0, 24.0, 26.0, 29.0, 28.0]
2025-09-16 17:08:46,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 6 seconds)
2025-09-16 17:10:42,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:10:42,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 147.11787 ± 21.080
2025-09-16 17:10:42,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [139.36986, 161.20065, 155.11679, 103.33604, 163.84299, 154.2849, 114.314575, 158.88959, 172.76723, 148.0561]
2025-09-16 17:10:42,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 32.0, 30.0, 20.0, 32.0, 30.0, 22.0, 31.0, 34.0, 29.0]
2025-09-16 17:10:42,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 9 seconds)
2025-09-16 17:12:39,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:12:39,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 190.76407 ± 69.168
2025-09-16 17:12:39,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [161.11746, 320.04178, 162.79384, 332.98557, 163.912, 171.24648, 158.54619, 119.99189, 156.21193, 160.79353]
2025-09-16 17:12:39,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 64.0, 32.0, 70.0, 33.0, 34.0, 31.0, 23.0, 30.0, 31.0]
2025-09-16 17:12:39,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 13 seconds)
2025-09-16 17:14:36,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:14:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 155.93164 ± 19.389
2025-09-16 17:14:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [174.24994, 129.35919, 180.2042, 167.5082, 166.51642, 140.57367, 168.03624, 146.31227, 166.59448, 119.96184]
2025-09-16 17:14:36,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 25.0, 35.0, 33.0, 33.0, 27.0, 33.0, 28.0, 33.0, 23.0]
2025-09-16 17:14:36,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 17 seconds)
2025-09-16 17:16:32,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:16:33,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 163.51906 ± 71.623
2025-09-16 17:16:33,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [129.94226, 168.34253, 109.17208, 370.8911, 159.75427, 158.94768, 145.88176, 113.68815, 148.9677, 129.60304]
2025-09-16 17:16:33,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 33.0, 21.0, 76.0, 31.0, 31.0, 28.0, 22.0, 29.0, 25.0]
2025-09-16 17:16:33,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 19 seconds)
2025-09-16 17:18:29,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:18:29,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 153.39456 ± 17.831
2025-09-16 17:18:29,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [152.2547, 160.9348, 135.47745, 149.08888, 130.74521, 129.8299, 155.70659, 162.76686, 191.84879, 165.2925]
2025-09-16 17:18:29,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 31.0, 26.0, 29.0, 25.0, 25.0, 30.0, 33.0, 38.0, 33.0]
2025-09-16 17:18:29,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 19 seconds)
2025-09-16 17:20:25,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:20:26,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 165.59195 ± 23.477
2025-09-16 17:20:26,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [140.31479, 179.71588, 154.6569, 155.21866, 134.55838, 185.22447, 211.72668, 154.56677, 149.85367, 190.08328]
2025-09-16 17:20:26,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 36.0, 30.0, 30.0, 26.0, 37.0, 42.0, 30.0, 29.0, 37.0]
2025-09-16 17:20:26,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 22 seconds)
2025-09-16 17:22:22,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:22:23,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 202.62459 ± 70.195
2025-09-16 17:22:23,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [300.84253, 169.93227, 162.60309, 131.2268, 160.04297, 296.1311, 150.08965, 168.05305, 160.05948, 327.26505]
2025-09-16 17:22:23,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 33.0, 32.0, 25.0, 31.0, 59.0, 29.0, 34.0, 31.0, 66.0]
2025-09-16 17:22:23,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 26 seconds)
2025-09-16 17:24:19,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:24:19,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 154.32454 ± 17.863
2025-09-16 17:24:19,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.0239, 153.34561, 164.60263, 149.43878, 119.589294, 168.35243, 175.29384, 170.25598, 124.485634, 162.85744]
2025-09-16 17:24:19,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 32.0, 29.0, 23.0, 33.0, 35.0, 34.0, 24.0, 32.0]
2025-09-16 17:24:19,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 30 seconds)
2025-09-16 17:26:16,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:26:16,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 155.94772 ± 17.257
2025-09-16 17:26:16,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [144.82208, 147.62439, 168.29735, 176.5695, 168.34416, 161.84659, 128.88531, 168.96452, 125.16537, 168.9579]
2025-09-16 17:26:16,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 33.0, 35.0, 34.0, 32.0, 25.0, 33.0, 24.0, 33.0]
2025-09-16 17:26:16,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 32 seconds)
2025-09-16 17:28:12,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:28:13,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 168.58676 ± 57.025
2025-09-16 17:28:13,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [140.27316, 156.30064, 174.04474, 144.60838, 134.815, 163.7157, 332.29977, 114.754715, 159.06502, 165.99051]
2025-09-16 17:28:13,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 31.0, 34.0, 28.0, 26.0, 32.0, 67.0, 22.0, 31.0, 33.0]
2025-09-16 17:28:13,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 48 minutes, 38 seconds)
2025-09-16 17:30:09,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:30:10,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 153.56032 ± 18.561
2025-09-16 17:30:10,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [162.81183, 124.53691, 135.47206, 133.87743, 170.80882, 144.31915, 165.60353, 184.96114, 168.22867, 144.98369]
2025-09-16 17:30:10,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 24.0, 26.0, 26.0, 34.0, 28.0, 33.0, 36.0, 33.0, 28.0]
2025-09-16 17:30:10,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 43 seconds)
2025-09-16 17:32:06,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:32:07,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 149.17409 ± 15.077
2025-09-16 17:32:07,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [117.77191, 145.02728, 159.82367, 160.47379, 165.74867, 169.67294, 150.56387, 138.29054, 149.78766, 134.58049]
2025-09-16 17:32:07,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 28.0, 31.0, 31.0, 32.0, 33.0, 29.0, 27.0, 29.0, 26.0]
2025-09-16 17:32:07,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes, 45 seconds)
2025-09-16 17:34:03,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:34:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 153.14203 ± 23.048
2025-09-16 17:34:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.12224, 155.8693, 159.56766, 156.20638, 168.11766, 145.14209, 108.43963, 202.26381, 140.0043, 160.6873]
2025-09-16 17:34:04,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 30.0, 31.0, 30.0, 33.0, 28.0, 21.0, 40.0, 27.0, 32.0]
2025-09-16 17:34:04,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes, 51 seconds)
2025-09-16 17:36:00,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:36:00,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 150.09293 ± 19.150
2025-09-16 17:36:00,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [156.99092, 173.45653, 166.9668, 103.246376, 140.1056, 159.65579, 134.41714, 153.60754, 162.1316, 150.35098]
2025-09-16 17:36:00,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 35.0, 33.0, 20.0, 27.0, 31.0, 26.0, 30.0, 32.0, 29.0]
2025-09-16 17:36:00,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 54 seconds)
2025-09-16 17:37:57,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:37:57,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 155.30228 ± 12.539
2025-09-16 17:37:57,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [172.45505, 172.6385, 164.3898, 130.22156, 158.95645, 153.24475, 152.1865, 154.14449, 154.87811, 139.9075]
2025-09-16 17:37:57,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 34.0, 33.0, 25.0, 31.0, 30.0, 30.0, 30.0, 30.0, 27.0]
2025-09-16 17:37:57,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 59 seconds)
2025-09-16 17:39:54,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:39:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 188.65138 ± 58.083
2025-09-16 17:39:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [162.99391, 154.06099, 150.3062, 277.0903, 196.44264, 160.48717, 134.48933, 178.6629, 151.66661, 320.31375]
2025-09-16 17:39:55,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 30.0, 29.0, 58.0, 38.0, 32.0, 26.0, 36.0, 29.0, 66.0]
2025-09-16 17:39:55,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 2 seconds)
2025-09-16 17:41:51,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:41:52,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 192.97617 ± 101.117
2025-09-16 17:41:52,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [146.27478, 139.24533, 154.14517, 164.61848, 330.20193, 150.87361, 155.87039, 124.6127, 120.13078, 443.78842]
2025-09-16 17:41:52,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 27.0, 30.0, 32.0, 69.0, 29.0, 30.0, 24.0, 23.0, 88.0]
2025-09-16 17:41:52,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 6 seconds)
2025-09-16 17:43:48,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:43:48,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 155.79427 ± 21.418
2025-09-16 17:43:48,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [162.68715, 189.70972, 151.08215, 176.16803, 114.1979, 155.88179, 165.06136, 171.84769, 141.44403, 129.86278]
2025-09-16 17:43:48,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 38.0, 29.0, 36.0, 22.0, 30.0, 33.0, 34.0, 27.0, 25.0]
2025-09-16 17:43:48,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 7 seconds)
2025-09-16 17:45:45,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:45:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 183.05319 ± 70.058
2025-09-16 17:45:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [385.10144, 156.05562, 164.48956, 162.2862, 149.5534, 168.0048, 185.08464, 113.829025, 187.35054, 158.7766]
2025-09-16 17:45:46,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 30.0, 33.0, 32.0, 29.0, 33.0, 37.0, 22.0, 37.0, 31.0]
2025-09-16 17:45:46,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 12 seconds)
2025-09-16 17:47:42,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:47:42,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 174.75943 ± 51.851
2025-09-16 17:47:42,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [170.3716, 148.60783, 149.34108, 178.25801, 325.97458, 167.40175, 157.20786, 134.64899, 150.21635, 165.56635]
2025-09-16 17:47:42,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 29.0, 29.0, 36.0, 69.0, 33.0, 31.0, 26.0, 29.0, 32.0]
2025-09-16 17:47:42,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 14 seconds)
2025-09-16 17:49:38,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:49:39,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 151.76939 ± 15.051
2025-09-16 17:49:39,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [162.11627, 149.81854, 148.81927, 120.05156, 156.18956, 149.44939, 140.52203, 172.85233, 144.08916, 173.78587]
2025-09-16 17:49:39,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 29.0, 29.0, 23.0, 30.0, 29.0, 27.0, 34.0, 28.0, 34.0]
2025-09-16 17:49:39,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 15 seconds)
2025-09-16 17:51:35,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:51:36,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 148.63443 ± 17.215
2025-09-16 17:51:36,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.4631, 163.2781, 182.13452, 155.8168, 133.5835, 140.40143, 140.23358, 149.79778, 151.01213, 114.623405]
2025-09-16 17:51:36,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 32.0, 37.0, 30.0, 26.0, 27.0, 27.0, 29.0, 30.0, 22.0]
2025-09-16 17:51:36,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 18 seconds)
2025-09-16 17:53:32,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:53:33,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 153.78966 ± 13.328
2025-09-16 17:53:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [167.29637, 150.60165, 164.07051, 156.17804, 171.43764, 130.1879, 156.99872, 159.44159, 130.20493, 151.47925]
2025-09-16 17:53:33,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 29.0, 32.0, 31.0, 34.0, 25.0, 31.0, 31.0, 25.0, 29.0]
2025-09-16 17:53:33,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 21 seconds)
2025-09-16 17:55:29,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:55:29,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 150.28091 ± 20.769
2025-09-16 17:55:29,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [139.8757, 134.55052, 172.95297, 157.8915, 192.88629, 162.2535, 134.84695, 134.08437, 120.00288, 153.46434]
2025-09-16 17:55:29,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 26.0, 34.0, 31.0, 38.0, 32.0, 26.0, 26.0, 23.0, 30.0]
2025-09-16 17:55:29,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 23 seconds)
2025-09-16 17:57:26,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:57:26,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 155.80621 ± 39.150
2025-09-16 17:57:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [143.31546, 149.75002, 135.51236, 150.11754, 138.32848, 270.30695, 124.1771, 158.13364, 144.15015, 144.27039]
2025-09-16 17:57:26,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 29.0, 26.0, 29.0, 27.0, 53.0, 24.0, 31.0, 28.0, 28.0]
2025-09-16 17:57:26,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 27 seconds)
2025-09-16 17:59:22,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 17:59:23,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 148.00626 ± 9.955
2025-09-16 17:59:23,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [145.20139, 140.0228, 148.16843, 163.01712, 155.4305, 139.0085, 130.00026, 144.12892, 161.83377, 153.25099]
2025-09-16 17:59:23,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 27.0, 29.0, 32.0, 30.0, 27.0, 25.0, 28.0, 32.0, 30.0]
2025-09-16 17:59:23,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 31 seconds)
2025-09-16 18:01:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:01:20,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 169.56979 ± 64.516
2025-09-16 18:01:20,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [146.31638, 119.48818, 131.25365, 155.86705, 144.40546, 359.01382, 161.59627, 162.26006, 157.79684, 157.70003]
2025-09-16 18:01:20,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 23.0, 25.0, 30.0, 28.0, 71.0, 32.0, 32.0, 31.0, 31.0]
2025-09-16 18:01:20,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 34 seconds)
2025-09-16 18:03:16,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:03:17,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 138.52098 ± 8.361
2025-09-16 18:03:17,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [139.14923, 139.34753, 150.5259, 145.45914, 129.4253, 144.27118, 119.8005, 134.97174, 138.54779, 143.7115]
2025-09-16 18:03:17,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 27.0, 29.0, 28.0, 25.0, 28.0, 23.0, 26.0, 27.0, 28.0]
2025-09-16 18:03:17,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 37 seconds)
2025-09-16 18:05:13,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:05:13,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 145.72818 ± 14.290
2025-09-16 18:05:13,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [154.6082, 153.95009, 124.62912, 130.1446, 148.8654, 158.33455, 139.41805, 154.5981, 124.92571, 167.80801]
2025-09-16 18:05:13,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 30.0, 24.0, 25.0, 29.0, 31.0, 27.0, 30.0, 24.0, 33.0]
2025-09-16 18:05:13,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 40 seconds)
2025-09-16 18:07:07,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:07:08,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 151.53036 ± 22.866
2025-09-16 18:07:08,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [154.73813, 185.14848, 149.34799, 163.9706, 176.55513, 176.05432, 140.47952, 124.705444, 113.990685, 130.31323]
2025-09-16 18:07:08,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 37.0, 29.0, 32.0, 35.0, 35.0, 27.0, 24.0, 22.0, 25.0]
2025-09-16 18:07:08,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 41 seconds)
2025-09-16 18:09:02,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:09:03,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 160.77530 ± 18.155
2025-09-16 18:09:03,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [167.54768, 196.35213, 167.08629, 129.57169, 153.00447, 167.42784, 168.6567, 149.0173, 171.69615, 137.39279]
2025-09-16 18:09:03,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 39.0, 33.0, 25.0, 30.0, 34.0, 34.0, 29.0, 34.0, 27.0]
2025-09-16 18:09:03,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 44 seconds)
2025-09-16 18:11:00,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:11:01,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 151.57974 ± 15.923
2025-09-16 18:11:01,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [129.42145, 134.81844, 169.56085, 155.23163, 159.13216, 135.37468, 135.02177, 156.91248, 162.611, 177.71317]
2025-09-16 18:11:01,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 26.0, 33.0, 30.0, 31.0, 26.0, 26.0, 31.0, 32.0, 36.0]
2025-09-16 18:11:01,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 48 seconds)
2025-09-16 18:12:59,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:13:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 143.90036 ± 15.863
2025-09-16 18:13:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [143.84525, 160.41774, 108.65432, 145.3548, 162.95424, 155.09282, 149.43176, 153.34471, 130.96909, 128.93867]
2025-09-16 18:13:00,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 31.0, 21.0, 28.0, 32.0, 30.0, 29.0, 30.0, 25.0, 25.0]
2025-09-16 18:13:00,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 53 seconds)
2025-09-16 18:14:56,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:14:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 152.83633 ± 12.905
2025-09-16 18:14:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [139.39024, 158.60251, 145.8302, 181.7579, 145.45384, 145.07506, 158.71165, 135.3187, 155.17351, 163.04974]
2025-09-16 18:14:57,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 31.0, 28.0, 36.0, 28.0, 28.0, 31.0, 26.0, 30.0, 32.0]
2025-09-16 18:14:57,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 56 seconds)
2025-09-16 18:16:53,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 24...
2025-09-16 18:16:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 154.90709 ± 11.206
2025-09-16 18:16:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [151.5036, 143.15689, 160.15958, 129.84492, 166.93636, 164.41801, 148.81169, 159.3632, 157.70312, 167.17343]
2025-09-16 18:16:53,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 28.0, 31.0, 25.0, 33.0, 32.0, 29.0, 31.0, 31.0, 33.0]
2025-09-16 18:16:53,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1251 [DEBUG]: Training session finished
