2025-09-16 14:49:41,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.200-delay_21
2025-09-16 14:49:41,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.200-delay_21
2025-09-16 14:49:41,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x14ef138f0850>}
2025-09-16 14:49:41,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:49:41,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:49:41,061 baseline-bpql-noisepromille200-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=733, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:49:41,061 baseline-bpql-noisepromille200-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:49:42,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:49:42,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:51:34,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:51:35,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 239.57796 ± 214.934
2025-09-16 14:51:35,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [123.9653, 789.80225, 443.24057, 340.71976, 146.33069, 146.42203, 96.79193, 112.640785, 106.48397, 89.38233]
2025-09-16 14:51:35,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 157.0, 86.0, 64.0, 28.0, 28.0, 19.0, 22.0, 21.0, 18.0]
2025-09-16 14:51:35,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (239.58) for latency 21
2025-09-16 14:51:35,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 5 minutes, 11 seconds)
2025-09-16 14:53:35,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:53:35,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 189.17801 ± 102.369
2025-09-16 14:53:35,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [120.801094, 178.8757, 313.81802, 148.7885, 367.48154, 96.16021, 89.55724, 129.21397, 108.39105, 338.69266]
2025-09-16 14:53:35,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 34.0, 61.0, 30.0, 71.0, 19.0, 18.0, 25.0, 21.0, 63.0]
2025-09-16 14:53:35,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 10 minutes, 17 seconds)
2025-09-16 14:55:34,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:55:35,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 233.65189 ± 107.037
2025-09-16 14:55:35,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [140.65115, 348.31818, 124.67038, 351.94803, 113.814285, 197.61247, 390.09305, 355.88263, 157.86407, 155.66449]
2025-09-16 14:55:35,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 65.0, 24.0, 74.0, 22.0, 38.0, 82.0, 66.0, 30.0, 30.0]
2025-09-16 14:55:35,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 9 minutes, 59 seconds)
2025-09-16 14:57:35,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:57:35,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 175.97806 ± 102.245
2025-09-16 14:57:35,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [431.83035, 107.86979, 164.0874, 160.0779, 84.32552, 134.92009, 128.10602, 298.2467, 97.278496, 153.03836]
2025-09-16 14:57:35,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 21.0, 32.0, 31.0, 17.0, 26.0, 26.0, 58.0, 19.0, 29.0]
2025-09-16 14:57:35,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 9 minutes, 10 seconds)
2025-09-16 14:59:35,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:59:36,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 178.28568 ± 99.954
2025-09-16 14:59:36,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [119.221085, 177.27785, 125.65005, 330.22296, 89.17762, 262.57285, 107.02457, 372.83966, 90.64139, 108.22892]
2025-09-16 14:59:36,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 34.0, 24.0, 60.0, 18.0, 50.0, 21.0, 72.0, 18.0, 21.0]
2025-09-16 14:59:36,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 7 minutes, 57 seconds)
2025-09-16 15:01:36,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:01:37,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 197.20242 ± 135.214
2025-09-16 15:01:37,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [172.72758, 256.21884, 124.63747, 113.26644, 479.83185, 97.21568, 89.23852, 111.558014, 107.56917, 419.76053]
2025-09-16 15:01:37,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 49.0, 24.0, 22.0, 101.0, 19.0, 18.0, 22.0, 21.0, 93.0]
2025-09-16 15:01:37,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 8 minutes, 43 seconds)
2025-09-16 15:03:35,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:03:35,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 149.33400 ± 53.684
2025-09-16 15:03:35,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [203.52011, 122.15482, 277.34772, 84.13403, 153.80939, 106.517075, 117.083855, 135.87378, 119.84627, 173.05283]
2025-09-16 15:03:35,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 24.0, 57.0, 17.0, 30.0, 21.0, 23.0, 26.0, 23.0, 34.0]
2025-09-16 15:03:35,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 6 minutes, 3 seconds)
2025-09-16 15:05:34,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:05:35,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 161.89812 ± 85.542
2025-09-16 15:05:35,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [236.12837, 94.98521, 156.78758, 105.820274, 158.18349, 134.71329, 389.44495, 113.464066, 96.16588, 133.28802]
2025-09-16 15:05:35,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [46.0, 19.0, 30.0, 21.0, 31.0, 26.0, 78.0, 22.0, 19.0, 26.0]
2025-09-16 15:05:35,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 4 minutes, 2 seconds)
2025-09-16 15:07:36,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:07:37,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 156.72534 ± 38.836
2025-09-16 15:07:37,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [191.27954, 134.7802, 121.7648, 146.11882, 113.05084, 151.56787, 165.35262, 191.76729, 239.46095, 112.11044]
2025-09-16 15:07:37,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 26.0, 24.0, 28.0, 22.0, 29.0, 33.0, 38.0, 50.0, 22.0]
2025-09-16 15:07:37,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 2 minutes, 25 seconds)
2025-09-16 15:09:38,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:09:38,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 131.81828 ± 55.144
2025-09-16 15:09:38,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [106.575226, 102.6836, 84.29456, 248.8023, 96.189545, 101.49572, 145.56702, 225.9541, 110.85695, 95.76381]
2025-09-16 15:09:38,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 20.0, 17.0, 51.0, 19.0, 20.0, 28.0, 45.0, 22.0, 19.0]
2025-09-16 15:09:38,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 44 seconds)
2025-09-16 15:11:40,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:11:40,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 141.67053 ± 41.628
2025-09-16 15:11:40,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [157.58467, 190.9474, 106.78701, 113.54945, 95.201065, 113.50633, 106.96797, 161.86061, 231.48868, 138.8122]
2025-09-16 15:11:40,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 37.0, 21.0, 22.0, 19.0, 22.0, 21.0, 32.0, 45.0, 28.0]
2025-09-16 15:11:40,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 58 minutes, 59 seconds)
2025-09-16 15:13:41,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:13:42,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 139.31299 ± 55.763
2025-09-16 15:13:42,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [165.5105, 95.30998, 89.97504, 101.38788, 183.01624, 274.15445, 165.01816, 91.40171, 102.785904, 124.570015]
2025-09-16 15:13:42,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 19.0, 18.0, 20.0, 36.0, 56.0, 32.0, 18.0, 20.0, 24.0]
2025-09-16 15:13:42,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 57 minutes, 51 seconds)
2025-09-16 15:15:43,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:15:43,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 123.86422 ± 32.596
2025-09-16 15:15:43,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.28398, 160.02333, 185.02388, 113.893394, 89.66475, 129.62898, 102.25326, 90.73232, 164.31099, 107.82726]
2025-09-16 15:15:43,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 32.0, 36.0, 22.0, 18.0, 25.0, 20.0, 18.0, 32.0, 21.0]
2025-09-16 15:15:43,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 56 minutes, 24 seconds)
2025-09-16 15:17:41,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:17:42,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 150.55873 ± 47.700
2025-09-16 15:17:42,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [168.11542, 238.18776, 217.48933, 168.6041, 112.5336, 118.51181, 172.37624, 89.626465, 106.9097, 113.23286]
2025-09-16 15:17:42,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 47.0, 44.0, 33.0, 22.0, 23.0, 34.0, 18.0, 21.0, 22.0]
2025-09-16 15:17:42,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 53 minutes, 31 seconds)
2025-09-16 15:19:40,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:19:41,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 191.21225 ± 83.298
2025-09-16 15:19:41,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [117.32983, 356.70792, 153.17464, 185.89098, 135.42372, 138.07082, 143.91048, 289.64706, 103.37025, 288.59683]
2025-09-16 15:19:41,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 72.0, 30.0, 39.0, 26.0, 27.0, 28.0, 62.0, 20.0, 59.0]
2025-09-16 15:19:41,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 50 minutes, 40 seconds)
2025-09-16 15:21:39,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:21:39,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 231.54422 ± 155.486
2025-09-16 15:21:39,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [103.66043, 124.15189, 111.95535, 561.93646, 442.93042, 289.08566, 101.154945, 311.29437, 146.32391, 122.948494]
2025-09-16 15:21:39,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 24.0, 22.0, 116.0, 87.0, 57.0, 20.0, 61.0, 28.0, 24.0]
2025-09-16 15:21:39,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 47 minutes, 46 seconds)
2025-09-16 15:23:37,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:23:38,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 190.27119 ± 91.441
2025-09-16 15:23:38,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [154.77715, 101.674805, 134.42813, 317.83182, 158.52339, 119.36399, 348.79547, 313.39233, 107.28746, 146.63734]
2025-09-16 15:23:38,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 20.0, 26.0, 66.0, 31.0, 23.0, 74.0, 66.0, 21.0, 28.0]
2025-09-16 15:23:38,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 56 seconds)
2025-09-16 15:25:36,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:25:37,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 206.84859 ± 84.571
2025-09-16 15:25:37,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [284.33505, 240.6972, 165.51631, 116.658134, 130.8964, 329.4637, 164.01201, 113.42584, 354.30814, 169.17307]
2025-09-16 15:25:37,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [56.0, 47.0, 33.0, 23.0, 26.0, 65.0, 33.0, 22.0, 69.0, 35.0]
2025-09-16 15:25:37,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 42 minutes, 15 seconds)
2025-09-16 15:27:35,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:27:35,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 162.11125 ± 73.584
2025-09-16 15:27:35,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [122.39869, 227.01378, 123.741486, 113.765114, 294.1622, 89.99724, 90.59035, 128.29858, 286.05173, 145.09338]
2025-09-16 15:27:35,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 47.0, 24.0, 22.0, 57.0, 18.0, 18.0, 25.0, 56.0, 28.0]
2025-09-16 15:27:35,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 14 seconds)
2025-09-16 15:29:34,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:29:34,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 131.69630 ± 27.525
2025-09-16 15:29:34,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [119.557, 165.19423, 140.93214, 101.64014, 180.53233, 144.83548, 95.73517, 149.93738, 117.40746, 101.191734]
2025-09-16 15:29:34,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 33.0, 27.0, 20.0, 36.0, 28.0, 19.0, 30.0, 23.0, 20.0]
2025-09-16 15:29:34,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes, 10 seconds)
2025-09-16 15:31:31,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:31:31,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 120.14439 ± 31.506
2025-09-16 15:31:31,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.10854, 101.50784, 129.04599, 96.254425, 114.193794, 127.488075, 102.682526, 209.24615, 106.36475, 113.551765]
2025-09-16 15:31:31,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 25.0, 19.0, 23.0, 25.0, 20.0, 42.0, 21.0, 22.0]
2025-09-16 15:31:31,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 35 minutes, 54 seconds)
2025-09-16 15:33:30,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:33:30,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 154.82370 ± 79.423
2025-09-16 15:33:30,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [99.57447, 95.37818, 368.57492, 107.02572, 101.17918, 148.5408, 169.10178, 105.990456, 144.18407, 208.68738]
2025-09-16 15:33:30,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 74.0, 21.0, 20.0, 29.0, 33.0, 21.0, 28.0, 41.0]
2025-09-16 15:33:30,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 33 minutes, 55 seconds)
2025-09-16 15:35:28,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:35:28,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 134.15781 ± 30.499
2025-09-16 15:35:28,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.01399, 170.89932, 172.23898, 149.4656, 97.11854, 123.78368, 169.48697, 89.328476, 111.43921, 150.80322]
2025-09-16 15:35:28,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 33.0, 33.0, 29.0, 19.0, 24.0, 35.0, 18.0, 22.0, 29.0]
2025-09-16 15:35:28,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 31 minutes, 42 seconds)
2025-09-16 15:37:26,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:37:27,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 169.80782 ± 70.418
2025-09-16 15:37:27,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.966576, 105.165276, 226.85329, 143.41655, 345.89374, 168.98961, 106.02696, 117.37808, 189.7865, 180.60158]
2025-09-16 15:37:27,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 21.0, 46.0, 28.0, 67.0, 33.0, 21.0, 23.0, 38.0, 35.0]
2025-09-16 15:37:27,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 29 minutes, 48 seconds)
2025-09-16 15:39:24,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:39:25,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 177.21257 ± 77.044
2025-09-16 15:39:25,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [156.10362, 222.429, 137.10277, 340.25977, 116.66093, 142.95831, 154.42323, 97.25721, 290.9163, 114.0146]
2025-09-16 15:39:25,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 44.0, 27.0, 72.0, 23.0, 28.0, 30.0, 19.0, 60.0, 22.0]
2025-09-16 15:39:25,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 27 minutes, 42 seconds)
2025-09-16 15:41:22,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:41:23,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 140.88852 ± 36.638
2025-09-16 15:41:23,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [108.9517, 107.5333, 123.99886, 185.59393, 102.153694, 215.97061, 163.31984, 156.0546, 137.9997, 107.308815]
2025-09-16 15:41:23,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 21.0, 24.0, 37.0, 20.0, 43.0, 32.0, 30.0, 27.0, 21.0]
2025-09-16 15:41:23,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 minutes, 51 seconds)
2025-09-16 15:43:21,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:43:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 144.48424 ± 53.103
2025-09-16 15:43:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [249.67479, 90.396614, 96.472626, 224.51205, 155.13345, 107.237755, 178.11401, 124.343994, 111.88693, 107.07015]
2025-09-16 15:43:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 18.0, 19.0, 47.0, 30.0, 21.0, 35.0, 24.0, 22.0, 21.0]
2025-09-16 15:43:21,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 23 minutes, 51 seconds)
2025-09-16 15:45:18,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:45:19,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 141.53549 ± 33.046
2025-09-16 15:45:19,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [123.67959, 118.77372, 217.48375, 151.07317, 137.67586, 180.57414, 102.78942, 133.33952, 141.84824, 108.11752]
2025-09-16 15:45:19,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 23.0, 43.0, 32.0, 27.0, 35.0, 20.0, 26.0, 28.0, 21.0]
2025-09-16 15:45:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 21 minutes, 44 seconds)
2025-09-16 15:47:15,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:47:15,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 141.56436 ± 87.239
2025-09-16 15:47:15,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.97284, 402.13434, 128.11746, 111.82687, 118.76209, 114.3213, 105.96255, 118.74574, 106.4263, 96.37403]
2025-09-16 15:47:15,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 79.0, 25.0, 22.0, 23.0, 22.0, 21.0, 23.0, 21.0, 19.0]
2025-09-16 15:47:15,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 16 seconds)
2025-09-16 15:49:13,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:49:13,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 152.42636 ± 44.755
2025-09-16 15:49:13,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [146.87686, 104.97365, 117.361275, 127.64924, 168.37958, 162.23402, 95.72312, 171.11604, 260.6062, 169.3437]
2025-09-16 15:49:13,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 21.0, 23.0, 25.0, 34.0, 33.0, 19.0, 36.0, 52.0, 33.0]
2025-09-16 15:49:13,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 18 seconds)
2025-09-16 15:51:13,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:51:13,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 134.66553 ± 44.285
2025-09-16 15:51:13,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [94.766426, 90.651215, 110.12014, 96.61787, 205.11575, 215.0114, 147.75739, 118.15011, 167.16406, 101.30084]
2025-09-16 15:51:13,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 18.0, 22.0, 19.0, 41.0, 42.0, 30.0, 23.0, 34.0, 20.0]
2025-09-16 15:51:13,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 15 minutes, 50 seconds)
2025-09-16 15:53:12,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:53:12,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 129.02815 ± 35.412
2025-09-16 15:53:12,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [152.94743, 183.91461, 99.843895, 96.59496, 107.91403, 102.19374, 190.22823, 153.79454, 100.13979, 102.71036]
2025-09-16 15:53:12,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 36.0, 20.0, 19.0, 21.0, 20.0, 37.0, 30.0, 20.0, 20.0]
2025-09-16 15:53:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 14 minutes)
2025-09-16 15:55:12,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:55:12,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 124.76467 ± 27.816
2025-09-16 15:55:12,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.07144, 165.58975, 140.51337, 106.47098, 101.35273, 178.04114, 107.632385, 103.49882, 141.43729, 102.03892]
2025-09-16 15:55:12,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 32.0, 27.0, 21.0, 20.0, 35.0, 21.0, 20.0, 27.0, 20.0]
2025-09-16 15:55:12,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 12 minutes, 32 seconds)
2025-09-16 15:57:11,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:57:11,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 144.16287 ± 62.998
2025-09-16 15:57:11,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [127.70543, 134.99962, 95.856636, 142.43114, 102.73281, 108.334724, 140.8001, 106.57821, 324.10403, 158.08589]
2025-09-16 15:57:11,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 26.0, 19.0, 28.0, 20.0, 21.0, 27.0, 21.0, 64.0, 31.0]
2025-09-16 15:57:12,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 9 seconds)
2025-09-16 15:59:08,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:59:09,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 197.80710 ± 125.336
2025-09-16 15:59:09,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [133.89746, 190.28459, 142.90564, 153.18098, 102.96885, 327.7137, 528.2673, 128.30363, 124.7879, 145.76091]
2025-09-16 15:59:09,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 38.0, 28.0, 30.0, 20.0, 71.0, 105.0, 25.0, 24.0, 29.0]
2025-09-16 15:59:09,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes)
2025-09-16 16:01:04,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:01:05,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 136.31850 ± 36.129
2025-09-16 16:01:05,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [135.0739, 89.34869, 190.80067, 145.62384, 155.57986, 186.23738, 162.20816, 100.65911, 89.48329, 108.17007]
2025-09-16 16:01:05,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 18.0, 38.0, 28.0, 30.0, 36.0, 33.0, 20.0, 18.0, 21.0]
2025-09-16 16:01:05,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 6 minutes, 6 seconds)
2025-09-16 16:03:00,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:03:01,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 134.94620 ± 30.585
2025-09-16 16:03:01,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [204.2679, 109.18572, 162.8993, 123.50457, 107.92589, 128.27858, 142.87213, 95.97833, 119.86583, 154.68382]
2025-09-16 16:03:01,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 21.0, 33.0, 25.0, 21.0, 25.0, 28.0, 19.0, 23.0, 31.0]
2025-09-16 16:03:01,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 33 seconds)
2025-09-16 16:04:56,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:04:56,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 158.79591 ± 64.000
2025-09-16 16:04:56,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [176.9557, 134.88957, 126.50021, 132.12228, 336.2549, 84.27649, 137.08992, 145.54057, 174.59286, 139.73656]
2025-09-16 16:04:56,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 26.0, 25.0, 26.0, 66.0, 17.0, 27.0, 29.0, 36.0, 28.0]
2025-09-16 16:04:56,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 44 seconds)
2025-09-16 16:06:51,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:06:51,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 126.81197 ± 25.726
2025-09-16 16:06:51,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [150.72737, 130.47882, 107.44447, 102.43869, 142.35384, 166.02347, 89.760345, 139.97609, 148.00966, 90.90684]
2025-09-16 16:06:51,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 25.0, 21.0, 20.0, 28.0, 34.0, 18.0, 28.0, 30.0, 18.0]
2025-09-16 16:06:51,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 57 minutes, 54 seconds)
2025-09-16 16:08:47,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:08:47,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 128.89896 ± 31.057
2025-09-16 16:08:47,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.32771, 90.08281, 163.40607, 94.42584, 154.1697, 182.78522, 140.55617, 120.033226, 141.81413, 89.38881]
2025-09-16 16:08:47,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 32.0, 19.0, 30.0, 37.0, 28.0, 23.0, 28.0, 18.0]
2025-09-16 16:08:47,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 55 minutes, 42 seconds)
2025-09-16 16:10:42,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:10:43,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 145.25885 ± 21.721
2025-09-16 16:10:43,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [172.70692, 144.3191, 163.71527, 131.20891, 102.05502, 160.67284, 127.85842, 125.79921, 156.171, 168.08183]
2025-09-16 16:10:43,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 28.0, 33.0, 25.0, 20.0, 32.0, 25.0, 24.0, 31.0, 34.0]
2025-09-16 16:10:43,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 53 minutes, 43 seconds)
2025-09-16 16:12:39,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:12:39,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 156.66690 ± 55.633
2025-09-16 16:12:39,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [118.29494, 314.1354, 151.83215, 145.50752, 151.72267, 112.36662, 164.834, 164.10913, 124.32033, 119.54628]
2025-09-16 16:12:39,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 62.0, 29.0, 29.0, 29.0, 22.0, 32.0, 34.0, 24.0, 23.0]
2025-09-16 16:12:39,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 51 minutes, 48 seconds)
2025-09-16 16:14:34,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:14:34,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 135.33109 ± 64.740
2025-09-16 16:14:34,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [319.98273, 131.10507, 89.82693, 106.0184, 110.19982, 162.74117, 95.10451, 102.41738, 127.45758, 108.45743]
2025-09-16 16:14:34,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 26.0, 18.0, 21.0, 22.0, 31.0, 19.0, 20.0, 25.0, 21.0]
2025-09-16 16:14:34,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 49 minutes, 51 seconds)
2025-09-16 16:16:30,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:16:30,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 150.26816 ± 82.394
2025-09-16 16:16:30,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [123.81141, 112.641495, 113.49612, 108.20352, 392.08978, 95.482216, 148.1803, 153.50006, 119.50569, 135.77094]
2025-09-16 16:16:30,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 22.0, 21.0, 83.0, 19.0, 30.0, 31.0, 23.0, 26.0]
2025-09-16 16:16:30,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 3 seconds)
2025-09-16 16:18:25,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:18:26,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 108.39260 ± 15.273
2025-09-16 16:18:26,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [117.34534, 90.21084, 101.414665, 89.98685, 95.8355, 122.16422, 116.9249, 108.05942, 140.95006, 101.03414]
2025-09-16 16:18:26,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 18.0, 20.0, 18.0, 19.0, 24.0, 23.0, 21.0, 27.0, 20.0]
2025-09-16 16:18:26,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 3 seconds)
2025-09-16 16:20:21,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:20:21,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 117.39888 ± 17.211
2025-09-16 16:20:21,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [108.68542, 146.6626, 102.04307, 102.45814, 138.66902, 101.98709, 143.61394, 105.680984, 112.27727, 111.911255]
2025-09-16 16:20:21,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 30.0, 20.0, 20.0, 27.0, 20.0, 28.0, 21.0, 22.0, 22.0]
2025-09-16 16:20:21,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 44 minutes, 7 seconds)
2025-09-16 16:22:16,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:22:17,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 129.79169 ± 20.380
2025-09-16 16:22:17,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [151.03998, 113.12643, 123.85524, 162.8104, 113.81846, 96.13052, 113.12849, 144.70619, 129.26114, 150.04001]
2025-09-16 16:22:17,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 22.0, 24.0, 32.0, 22.0, 19.0, 22.0, 28.0, 25.0, 29.0]
2025-09-16 16:22:17,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 2 seconds)
2025-09-16 16:24:12,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:24:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 137.50928 ± 19.050
2025-09-16 16:24:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [134.27902, 151.98889, 141.31177, 89.56276, 134.32068, 137.16318, 156.69981, 156.12117, 150.41977, 123.225685]
2025-09-16 16:24:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 30.0, 27.0, 18.0, 26.0, 27.0, 30.0, 31.0, 30.0, 24.0]
2025-09-16 16:24:12,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes, 7 seconds)
2025-09-16 16:26:07,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:26:07,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 169.26018 ± 79.788
2025-09-16 16:26:07,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.76541, 89.61588, 157.25777, 102.09462, 113.61892, 158.43259, 173.14607, 301.0258, 337.72803, 146.91672]
2025-09-16 16:26:07,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 18.0, 32.0, 20.0, 22.0, 32.0, 34.0, 58.0, 70.0, 29.0]
2025-09-16 16:26:07,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 5 seconds)
2025-09-16 16:28:03,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:28:04,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 115.89923 ± 22.885
2025-09-16 16:28:04,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [150.51334, 114.055084, 146.31255, 89.86987, 96.79951, 96.02831, 96.15889, 142.16452, 96.47215, 130.61809]
2025-09-16 16:28:04,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 22.0, 29.0, 18.0, 19.0, 19.0, 19.0, 28.0, 19.0, 25.0]
2025-09-16 16:28:04,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 36 minutes, 21 seconds)
2025-09-16 16:29:59,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:30:00,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 127.95199 ± 22.374
2025-09-16 16:30:00,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [125.438446, 133.75706, 95.54022, 164.97658, 125.304, 96.541794, 150.03175, 112.53809, 119.861374, 155.53069]
2025-09-16 16:30:00,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 26.0, 19.0, 33.0, 24.0, 19.0, 29.0, 22.0, 23.0, 30.0]
2025-09-16 16:30:00,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 27 seconds)
2025-09-16 16:31:54,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:31:54,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 123.44405 ± 18.819
2025-09-16 16:31:54,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [114.33946, 121.941635, 136.52362, 131.37007, 133.9761, 102.40625, 100.50192, 138.26959, 158.28749, 96.824356]
2025-09-16 16:31:54,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 24.0, 27.0, 25.0, 26.0, 20.0, 20.0, 27.0, 32.0, 19.0]
2025-09-16 16:31:54,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 25 seconds)
2025-09-16 16:33:49,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:33:49,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 125.32957 ± 19.648
2025-09-16 16:33:49,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [157.46736, 122.28208, 125.2219, 149.61325, 95.224335, 131.00024, 137.58943, 101.990974, 101.66116, 131.24504]
2025-09-16 16:33:49,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 24.0, 24.0, 30.0, 19.0, 25.0, 27.0, 20.0, 20.0, 25.0]
2025-09-16 16:33:49,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 23 seconds)
2025-09-16 16:35:43,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:35:43,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 107.45995 ± 14.883
2025-09-16 16:35:43,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.122536, 108.262375, 116.767456, 123.60138, 85.06349, 89.43928, 125.78073, 128.15219, 106.026665, 96.3834]
2025-09-16 16:35:43,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 23.0, 24.0, 17.0, 18.0, 24.0, 25.0, 21.0, 19.0]
2025-09-16 16:35:43,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 20 seconds)
2025-09-16 16:37:37,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:37:38,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 131.87106 ± 47.115
2025-09-16 16:37:38,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [138.26218, 97.11917, 96.42081, 111.70333, 159.33615, 260.48053, 117.08826, 112.046234, 94.728134, 131.5259]
2025-09-16 16:37:38,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 19.0, 19.0, 22.0, 32.0, 51.0, 23.0, 22.0, 19.0, 26.0]
2025-09-16 16:37:38,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 26 minutes, 5 seconds)
2025-09-16 16:39:32,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:39:32,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 117.50421 ± 24.347
2025-09-16 16:39:32,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [103.34308, 97.19297, 84.38307, 156.78558, 121.338066, 89.35531, 134.67455, 138.16328, 147.66774, 102.13848]
2025-09-16 16:39:32,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 17.0, 30.0, 25.0, 18.0, 26.0, 27.0, 29.0, 20.0]
2025-09-16 16:39:32,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 23 minutes, 56 seconds)
2025-09-16 16:41:26,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:41:27,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 154.21245 ± 56.972
2025-09-16 16:41:27,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [126.20543, 125.89467, 152.05745, 135.69952, 112.82516, 140.62195, 145.5504, 319.1915, 164.81177, 119.26669]
2025-09-16 16:41:27,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 25.0, 30.0, 27.0, 22.0, 27.0, 28.0, 64.0, 33.0, 23.0]
2025-09-16 16:41:27,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 22 minutes)
2025-09-16 16:43:20,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:43:21,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 126.71444 ± 23.609
2025-09-16 16:43:21,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.95865, 108.25416, 128.74953, 155.78027, 95.45625, 112.58012, 151.96747, 134.21819, 160.69423, 129.48564]
2025-09-16 16:43:21,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 21.0, 25.0, 30.0, 19.0, 22.0, 30.0, 26.0, 32.0, 25.0]
2025-09-16 16:43:21,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 20 minutes, 1 second)
2025-09-16 16:45:14,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:45:15,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 125.34070 ± 21.143
2025-09-16 16:45:15,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [114.079895, 138.33661, 125.329666, 96.68548, 153.07367, 129.85506, 89.975105, 147.14413, 109.416756, 149.51065]
2025-09-16 16:45:15,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 27.0, 25.0, 19.0, 30.0, 25.0, 18.0, 29.0, 22.0, 29.0]
2025-09-16 16:45:15,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 18 minutes, 5 seconds)
2025-09-16 16:47:09,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:47:09,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 134.15268 ± 41.182
2025-09-16 16:47:09,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [237.52266, 153.2512, 157.48064, 106.310684, 154.26651, 113.140366, 96.498825, 102.71546, 119.84862, 100.49198]
2025-09-16 16:47:09,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [50.0, 30.0, 31.0, 21.0, 30.0, 22.0, 19.0, 20.0, 23.0, 20.0]
2025-09-16 16:47:09,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 16 minutes, 11 seconds)
2025-09-16 16:49:03,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:49:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 129.71463 ± 26.747
2025-09-16 16:49:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [134.76157, 145.2269, 142.30792, 176.49744, 107.164116, 156.8271, 96.06068, 108.03495, 89.941154, 140.32457]
2025-09-16 16:49:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 28.0, 29.0, 34.0, 21.0, 30.0, 19.0, 21.0, 18.0, 28.0]
2025-09-16 16:49:03,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes, 16 seconds)
2025-09-16 16:50:57,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:50:57,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 134.79753 ± 21.297
2025-09-16 16:50:57,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [108.90302, 146.33069, 163.6274, 113.73109, 148.80489, 102.05871, 143.78244, 163.8194, 118.82027, 138.09738]
2025-09-16 16:50:57,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 29.0, 33.0, 22.0, 29.0, 20.0, 28.0, 33.0, 23.0, 27.0]
2025-09-16 16:50:57,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 12 minutes, 18 seconds)
2025-09-16 16:52:51,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:52:51,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 120.75090 ± 19.083
2025-09-16 16:52:51,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [139.66377, 115.00417, 142.97595, 117.74845, 89.40596, 150.71512, 125.472786, 106.40341, 124.39048, 95.72895]
2025-09-16 16:52:51,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 23.0, 28.0, 23.0, 18.0, 31.0, 25.0, 21.0, 24.0, 19.0]
2025-09-16 16:52:51,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 10 minutes, 20 seconds)
2025-09-16 16:54:45,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:54:45,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 125.16260 ± 21.930
2025-09-16 16:54:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [121.67964, 168.18411, 111.75045, 119.0904, 147.8799, 106.78348, 128.68141, 101.6671, 148.46219, 97.44732]
2025-09-16 16:54:45,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 34.0, 22.0, 23.0, 29.0, 21.0, 25.0, 20.0, 29.0, 19.0]
2025-09-16 16:54:45,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 8 minutes, 25 seconds)
2025-09-16 16:56:39,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:56:39,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 171.03870 ± 48.065
2025-09-16 16:56:39,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [232.28528, 166.34413, 149.91144, 138.4552, 288.63544, 143.51208, 126.276375, 145.56822, 146.94391, 172.45485]
2025-09-16 16:56:39,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 34.0, 29.0, 27.0, 60.0, 28.0, 25.0, 28.0, 29.0, 34.0]
2025-09-16 16:56:39,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 6 minutes, 30 seconds)
2025-09-16 16:58:33,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:58:34,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 147.39282 ± 36.524
2025-09-16 16:58:34,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [162.82918, 160.82854, 133.22916, 159.24265, 111.51765, 100.97042, 127.8249, 144.41402, 133.23381, 239.838]
2025-09-16 16:58:34,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 31.0, 26.0, 31.0, 22.0, 20.0, 25.0, 28.0, 26.0, 48.0]
2025-09-16 16:58:34,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 4 minutes, 38 seconds)
2025-09-16 17:00:27,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:00:27,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 125.41405 ± 22.273
2025-09-16 17:00:27,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [118.61318, 157.7927, 125.03562, 119.692116, 95.621796, 168.08516, 106.47939, 99.55637, 128.41916, 134.84494]
2025-09-16 17:00:27,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 32.0, 24.0, 23.0, 19.0, 33.0, 21.0, 20.0, 25.0, 26.0]
2025-09-16 17:00:27,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 2 minutes, 41 seconds)
2025-09-16 17:02:21,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:02:21,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 161.30826 ± 113.953
2025-09-16 17:02:21,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [135.97328, 119.41388, 127.97219, 101.5543, 139.13696, 107.770065, 108.33501, 118.35756, 500.0128, 154.55647]
2025-09-16 17:02:21,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 24.0, 25.0, 20.0, 29.0, 21.0, 21.0, 23.0, 98.0, 31.0]
2025-09-16 17:02:21,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 51 seconds)
2025-09-16 17:04:15,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:04:16,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 130.31558 ± 17.802
2025-09-16 17:04:16,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [146.20798, 114.42478, 127.745926, 95.81002, 143.27411, 146.19337, 143.41132, 108.06265, 149.71199, 128.31374]
2025-09-16 17:04:16,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 22.0, 25.0, 19.0, 29.0, 28.0, 28.0, 21.0, 29.0, 25.0]
2025-09-16 17:04:16,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 58 minutes, 57 seconds)
2025-09-16 17:06:09,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:06:10,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 137.63705 ± 26.358
2025-09-16 17:06:10,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [171.18748, 105.746315, 119.21941, 123.16795, 112.956154, 115.9897, 162.19438, 188.24149, 141.25914, 136.4086]
2025-09-16 17:06:10,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 21.0, 23.0, 24.0, 22.0, 23.0, 33.0, 38.0, 27.0, 27.0]
2025-09-16 17:06:10,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 57 minutes, 3 seconds)
2025-09-16 17:08:03,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:08:04,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 127.13462 ± 15.933
2025-09-16 17:08:04,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [132.12944, 123.42298, 116.01667, 128.15305, 119.73721, 140.30783, 129.11668, 143.32016, 149.29282, 89.84941]
2025-09-16 17:08:04,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 24.0, 23.0, 25.0, 23.0, 27.0, 26.0, 28.0, 30.0, 18.0]
2025-09-16 17:08:04,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 55 minutes, 7 seconds)
2025-09-16 17:09:57,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:09:57,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 125.73975 ± 16.389
2025-09-16 17:09:57,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [133.87369, 143.42273, 134.74683, 125.05394, 129.90504, 112.08556, 102.36366, 156.5729, 111.852264, 107.520805]
2025-09-16 17:09:57,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 28.0, 26.0, 24.0, 25.0, 22.0, 20.0, 32.0, 22.0, 21.0]
2025-09-16 17:09:57,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 53 minutes, 13 seconds)
2025-09-16 17:11:51,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:11:51,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 125.33717 ± 25.469
2025-09-16 17:11:51,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.049614, 170.44698, 90.09792, 100.49959, 139.245, 125.92669, 151.78154, 152.06532, 108.10678, 102.15219]
2025-09-16 17:11:51,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 34.0, 18.0, 20.0, 27.0, 25.0, 30.0, 30.0, 21.0, 20.0]
2025-09-16 17:11:51,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 51 minutes, 16 seconds)
2025-09-16 17:13:45,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:13:45,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 137.04462 ± 27.702
2025-09-16 17:13:45,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [130.12177, 178.49927, 102.13076, 164.2121, 153.09439, 172.59163, 100.432304, 107.403946, 139.25027, 122.70983]
2025-09-16 17:13:45,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 35.0, 20.0, 33.0, 31.0, 33.0, 20.0, 21.0, 27.0, 24.0]
2025-09-16 17:13:45,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 49 minutes, 21 seconds)
2025-09-16 17:15:39,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:15:39,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 134.90292 ± 42.225
2025-09-16 17:15:39,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.45017, 99.4856, 141.7051, 89.92433, 118.10374, 243.26346, 142.82863, 123.34074, 122.2931, 166.63446]
2025-09-16 17:15:39,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 28.0, 18.0, 23.0, 48.0, 28.0, 25.0, 24.0, 32.0]
2025-09-16 17:15:39,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 47 minutes, 26 seconds)
2025-09-16 17:17:33,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:17:33,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 143.97977 ± 27.827
2025-09-16 17:17:33,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [117.64176, 120.41282, 146.56795, 217.5516, 151.89183, 113.72676, 146.61765, 146.95341, 143.45612, 134.97787]
2025-09-16 17:17:33,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 24.0, 29.0, 46.0, 31.0, 22.0, 30.0, 28.0, 28.0, 26.0]
2025-09-16 17:17:33,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 45 minutes, 32 seconds)
2025-09-16 17:19:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:19:27,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 130.16776 ± 28.307
2025-09-16 17:19:27,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [136.57149, 100.86736, 156.02747, 154.42448, 89.98366, 138.60114, 102.09194, 168.42613, 96.498245, 158.18564]
2025-09-16 17:19:27,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 20.0, 30.0, 30.0, 18.0, 27.0, 20.0, 33.0, 19.0, 31.0]
2025-09-16 17:19:27,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 43 minutes, 39 seconds)
2025-09-16 17:21:21,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:21:21,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 116.97308 ± 25.186
2025-09-16 17:21:21,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [128.9457, 100.50377, 146.56471, 131.02963, 95.62529, 90.2767, 96.75947, 84.01179, 159.53932, 136.47447]
2025-09-16 17:21:21,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 30.0, 25.0, 19.0, 18.0, 19.0, 17.0, 31.0, 27.0]
2025-09-16 17:21:21,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 41 minutes, 47 seconds)
2025-09-16 17:23:14,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:23:15,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 138.87285 ± 27.944
2025-09-16 17:23:15,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [179.12602, 124.6442, 127.86661, 161.43916, 156.80933, 100.95428, 185.69331, 119.524254, 118.94403, 113.727356]
2025-09-16 17:23:15,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 25.0, 25.0, 32.0, 31.0, 20.0, 36.0, 23.0, 23.0, 22.0]
2025-09-16 17:23:15,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 39 minutes, 53 seconds)
2025-09-16 17:25:08,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:25:09,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 134.93895 ± 31.198
2025-09-16 17:25:09,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.93065, 113.17461, 166.36961, 179.27519, 107.45405, 163.97954, 91.4901, 134.82233, 140.30312, 162.59038]
2025-09-16 17:25:09,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 22.0, 34.0, 35.0, 21.0, 33.0, 18.0, 26.0, 27.0, 32.0]
2025-09-16 17:25:09,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 37 minutes, 59 seconds)
2025-09-16 17:27:03,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:27:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 141.50639 ± 29.400
2025-09-16 17:27:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [143.3126, 135.4983, 163.51653, 136.50717, 213.2709, 113.821465, 120.09176, 101.3276, 136.98572, 150.7319]
2025-09-16 17:27:03,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 26.0, 33.0, 27.0, 44.0, 22.0, 24.0, 20.0, 27.0, 30.0]
2025-09-16 17:27:03,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 5 seconds)
2025-09-16 17:28:56,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:28:57,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 123.19505 ± 29.261
2025-09-16 17:28:57,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.65774, 96.05268, 179.02708, 156.18634, 89.778496, 100.131744, 118.35273, 159.802, 118.98567, 111.97603]
2025-09-16 17:28:57,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 37.0, 32.0, 18.0, 20.0, 23.0, 31.0, 23.0, 22.0]
2025-09-16 17:28:57,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 11 seconds)
2025-09-16 17:30:50,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:30:51,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 132.31024 ± 16.244
2025-09-16 17:30:51,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [127.56077, 117.426, 134.45036, 148.47997, 113.696175, 123.27375, 130.03947, 170.47417, 139.21208, 118.48969]
2025-09-16 17:30:51,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 23.0, 26.0, 29.0, 22.0, 24.0, 26.0, 33.0, 27.0, 23.0]
2025-09-16 17:30:51,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 16 seconds)
2025-09-16 17:32:44,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:32:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 125.05042 ± 22.936
2025-09-16 17:32:44,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.45329, 151.0643, 136.83989, 89.165016, 134.56854, 112.70345, 166.96616, 129.0121, 127.65876, 100.07264]
2025-09-16 17:32:44,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 29.0, 27.0, 18.0, 26.0, 22.0, 33.0, 25.0, 25.0, 20.0]
2025-09-16 17:32:44,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 22 seconds)
2025-09-16 17:34:38,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:34:38,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 150.56543 ± 30.649
2025-09-16 17:34:38,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [118.19755, 143.97447, 149.86986, 146.41902, 112.653496, 146.48395, 123.89596, 166.37819, 174.40878, 223.37292]
2025-09-16 17:34:38,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 28.0, 29.0, 29.0, 22.0, 30.0, 24.0, 35.0, 34.0, 45.0]
2025-09-16 17:34:38,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 28 minutes, 28 seconds)
2025-09-16 17:36:32,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:36:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 154.72609 ± 46.471
2025-09-16 17:36:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [157.01025, 153.5253, 124.87725, 163.30733, 133.90962, 286.671, 144.31819, 111.58844, 128.5009, 143.55275]
2025-09-16 17:36:33,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 32.0, 24.0, 32.0, 26.0, 61.0, 30.0, 22.0, 25.0, 29.0]
2025-09-16 17:36:33,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 26 minutes, 34 seconds)
2025-09-16 17:38:26,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:38:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 137.67358 ± 30.218
2025-09-16 17:38:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [167.48254, 89.98429, 157.70932, 146.23755, 123.37555, 149.49968, 107.34905, 184.04326, 95.65635, 155.39825]
2025-09-16 17:38:27,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 18.0, 31.0, 29.0, 24.0, 29.0, 21.0, 39.0, 19.0, 30.0]
2025-09-16 17:38:27,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 41 seconds)
2025-09-16 17:40:20,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:40:21,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 137.42795 ± 10.364
2025-09-16 17:40:21,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [139.68283, 133.66074, 157.47806, 132.93431, 145.88943, 116.742615, 135.54752, 137.02528, 129.61679, 145.70206]
2025-09-16 17:40:21,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 26.0, 32.0, 26.0, 29.0, 23.0, 27.0, 27.0, 26.0, 30.0]
2025-09-16 17:40:21,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 47 seconds)
2025-09-16 17:42:14,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:42:14,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 121.97298 ± 23.452
2025-09-16 17:42:14,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.947784, 101.30917, 107.72602, 129.72897, 145.7638, 121.0465, 155.287, 105.57518, 89.94736, 160.39793]
2025-09-16 17:42:14,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 21.0, 25.0, 28.0, 24.0, 31.0, 21.0, 18.0, 32.0]
2025-09-16 17:42:14,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 53 seconds)
2025-09-16 17:44:08,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:44:09,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 128.29282 ± 24.620
2025-09-16 17:44:09,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.149185, 139.66438, 120.6705, 95.15155, 150.92119, 109.10189, 145.20827, 152.52557, 90.901474, 165.63419]
2025-09-16 17:44:09,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 27.0, 24.0, 19.0, 30.0, 21.0, 28.0, 31.0, 18.0, 33.0]
2025-09-16 17:44:09,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes)
2025-09-16 17:46:02,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:46:03,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 183.60992 ± 70.828
2025-09-16 17:46:03,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [153.82762, 296.17795, 175.90694, 90.00384, 181.9107, 235.17482, 129.71973, 107.937515, 309.1072, 156.33301]
2025-09-16 17:46:03,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 62.0, 34.0, 18.0, 37.0, 46.0, 26.0, 21.0, 63.0, 31.0]
2025-09-16 17:46:03,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 6 seconds)
2025-09-16 17:47:56,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:47:56,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 134.86980 ± 27.087
2025-09-16 17:47:56,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [167.4174, 127.18947, 95.8203, 153.61345, 96.55285, 96.108246, 158.2368, 149.79683, 154.92377, 149.03879]
2025-09-16 17:47:56,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 25.0, 19.0, 30.0, 19.0, 19.0, 31.0, 31.0, 31.0, 30.0]
2025-09-16 17:47:57,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 11 seconds)
2025-09-16 17:49:50,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:49:50,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 125.38039 ± 22.972
2025-09-16 17:49:50,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [153.06775, 129.72476, 129.06601, 90.04349, 105.47825, 133.64136, 110.60824, 94.725876, 154.2249, 153.22343]
2025-09-16 17:49:50,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 26.0, 25.0, 18.0, 21.0, 26.0, 22.0, 19.0, 30.0, 30.0]
2025-09-16 17:49:50,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 17 seconds)
2025-09-16 17:51:43,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:51:44,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 121.64104 ± 20.589
2025-09-16 17:51:44,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [106.28984, 141.32791, 155.1476, 113.08011, 101.846725, 95.51484, 137.5877, 142.21184, 127.03729, 96.36642]
2025-09-16 17:51:44,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 27.0, 31.0, 22.0, 20.0, 19.0, 27.0, 28.0, 25.0, 19.0]
2025-09-16 17:51:44,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 23 seconds)
2025-09-16 17:53:37,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:53:38,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 143.50394 ± 33.440
2025-09-16 17:53:38,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [138.34802, 168.83879, 216.07147, 119.72185, 146.32619, 179.22314, 130.05264, 103.029755, 125.20062, 108.226746]
2025-09-16 17:53:38,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 34.0, 43.0, 24.0, 28.0, 35.0, 25.0, 20.0, 24.0, 21.0]
2025-09-16 17:53:38,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 29 seconds)
2025-09-16 17:55:31,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:55:32,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 127.76660 ± 27.119
2025-09-16 17:55:32,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [84.5001, 143.88959, 95.99369, 118.99946, 176.76271, 105.46412, 146.00336, 114.71227, 152.52563, 138.81508]
2025-09-16 17:55:32,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 28.0, 19.0, 23.0, 35.0, 21.0, 28.0, 23.0, 30.0, 27.0]
2025-09-16 17:55:32,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 35 seconds)
2025-09-16 17:57:25,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:57:25,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 122.27116 ± 24.931
2025-09-16 17:57:25,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [124.2324, 109.85638, 96.73811, 118.417595, 172.93538, 95.54016, 150.92517, 96.10494, 112.956505, 145.00485]
2025-09-16 17:57:25,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 22.0, 19.0, 23.0, 34.0, 19.0, 30.0, 19.0, 22.0, 29.0]
2025-09-16 17:57:25,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 41 seconds)
2025-09-16 17:59:17,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:59:18,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 127.81592 ± 22.443
2025-09-16 17:59:18,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [117.97853, 120.3908, 180.48239, 135.12479, 147.46103, 133.52362, 117.41446, 101.44704, 125.32042, 99.0161]
2025-09-16 17:59:18,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 23.0, 37.0, 27.0, 29.0, 26.0, 23.0, 20.0, 24.0, 20.0]
2025-09-16 17:59:18,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 47 seconds)
2025-09-16 18:01:10,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:01:10,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 124.05078 ± 20.424
2025-09-16 18:01:10,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [135.62163, 127.99003, 118.01845, 116.76509, 89.416306, 89.187035, 154.91858, 129.76508, 144.89029, 133.93529]
2025-09-16 18:01:10,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 25.0, 23.0, 23.0, 18.0, 18.0, 30.0, 25.0, 29.0, 26.0]
2025-09-16 18:01:10,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 53 seconds)
2025-09-16 18:03:02,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:03:03,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 131.71997 ± 25.068
2025-09-16 18:03:03,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [168.77678, 102.12439, 161.76707, 123.03718, 124.990456, 117.165764, 102.55739, 111.6089, 170.94092, 134.23085]
2025-09-16 18:03:03,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 20.0, 32.0, 24.0, 24.0, 23.0, 20.0, 22.0, 35.0, 26.0]
2025-09-16 18:03:03,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1251 [DEBUG]: Training session finished
