2025-09-16 14:38:30,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.200-delay_18
2025-09-16 14:38:30,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.200-delay_18
2025-09-16 14:38:30,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x14ce742b4890>}
2025-09-16 14:38:30,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:38:30,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:38:30,369 baseline-bpql-noisepromille200-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=682, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:38:30,369 baseline-bpql-noisepromille200-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:38:32,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:38:32,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:40:17,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:40:18,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 222.19888 ± 104.137
2025-09-16 14:40:18,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [349.55865, 313.84772, 102.750496, 323.8024, 318.6981, 95.61624, 107.31284, 267.12448, 254.00554, 89.27217]
2025-09-16 14:40:18,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 61.0, 20.0, 60.0, 63.0, 19.0, 21.0, 49.0, 50.0, 18.0]
2025-09-16 14:40:18,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (222.20) for latency 18
2025-09-16 14:40:18,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 25 seconds)
2025-09-16 14:42:12,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:42:12,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 166.36487 ± 92.691
2025-09-16 14:42:12,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.258484, 101.50861, 125.94198, 335.99576, 102.81178, 112.934456, 312.98666, 113.28056, 266.67017, 90.26028]
2025-09-16 14:42:12,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 25.0, 65.0, 20.0, 22.0, 60.0, 22.0, 54.0, 18.0]
2025-09-16 14:42:12,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 14 seconds)
2025-09-16 14:44:07,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:44:08,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 274.45868 ± 119.492
2025-09-16 14:44:08,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.5364, 94.72069, 284.52435, 302.04446, 342.7726, 373.7607, 289.80664, 429.5238, 405.55975, 119.337074]
2025-09-16 14:44:08,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 19.0, 54.0, 60.0, 68.0, 72.0, 59.0, 83.0, 77.0, 23.0]
2025-09-16 14:44:08,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (274.46) for latency 18
2025-09-16 14:44:08,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 1 minute, 13 seconds)
2025-09-16 14:46:03,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:46:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 222.17740 ± 127.611
2025-09-16 14:46:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [135.08733, 503.57132, 245.7735, 102.026146, 107.385376, 101.654884, 320.85968, 305.3319, 288.78204, 111.30176]
2025-09-16 14:46:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 98.0, 52.0, 20.0, 21.0, 20.0, 61.0, 57.0, 54.0, 22.0]
2025-09-16 14:46:03,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 41 seconds)
2025-09-16 14:47:58,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:47:59,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 233.68852 ± 95.557
2025-09-16 14:47:59,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [222.89355, 89.63797, 285.63416, 272.61877, 425.64502, 275.99182, 118.522896, 226.43787, 290.78143, 128.72179]
2025-09-16 14:47:59,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 18.0, 52.0, 55.0, 86.0, 52.0, 23.0, 47.0, 53.0, 25.0]
2025-09-16 14:47:59,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 59 minutes, 40 seconds)
2025-09-16 14:49:54,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:49:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 216.97653 ± 141.555
2025-09-16 14:49:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.454155, 89.8235, 89.03498, 443.2734, 90.65207, 146.46912, 319.43436, 441.3184, 328.02527, 114.28006]
2025-09-16 14:49:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 18.0, 18.0, 97.0, 18.0, 28.0, 59.0, 97.0, 66.0, 22.0]
2025-09-16 14:49:55,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 49 seconds)
2025-09-16 14:51:50,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:51:51,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 238.07195 ± 149.584
2025-09-16 14:51:51,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.60789, 428.238, 261.1409, 128.91672, 517.69385, 95.45909, 95.67935, 280.32465, 369.70572, 107.953026]
2025-09-16 14:51:51,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 94.0, 52.0, 25.0, 96.0, 19.0, 19.0, 52.0, 68.0, 21.0]
2025-09-16 14:51:51,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 59 minutes, 17 seconds)
2025-09-16 14:53:46,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:53:47,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 217.91190 ± 130.551
2025-09-16 14:53:47,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [169.1151, 101.38139, 391.7, 379.95993, 423.59015, 89.95626, 108.036865, 289.24387, 103.29562, 122.83975]
2025-09-16 14:53:47,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 20.0, 72.0, 72.0, 81.0, 18.0, 21.0, 55.0, 20.0, 24.0]
2025-09-16 14:53:47,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 57 minutes, 25 seconds)
2025-09-16 14:55:41,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:55:42,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 237.07344 ± 104.057
2025-09-16 14:55:42,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [341.45132, 95.76206, 320.65057, 311.7472, 112.63266, 141.93419, 345.73373, 301.59827, 95.83564, 303.3888]
2025-09-16 14:55:42,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 19.0, 60.0, 60.0, 22.0, 27.0, 64.0, 56.0, 19.0, 58.0]
2025-09-16 14:55:42,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 55 minutes, 31 seconds)
2025-09-16 14:57:38,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:57:38,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 239.56424 ± 152.545
2025-09-16 14:57:38,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [470.16888, 113.53226, 378.72385, 171.63963, 424.56082, 418.9864, 118.50466, 101.40878, 101.78723, 96.32989]
2025-09-16 14:57:38,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 22.0, 71.0, 33.0, 80.0, 77.0, 23.0, 20.0, 20.0, 19.0]
2025-09-16 14:57:38,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 53 minutes, 44 seconds)
2025-09-16 14:59:34,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 14:59:34,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 255.87578 ± 146.596
2025-09-16 14:59:34,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [304.64398, 139.78743, 373.53305, 414.07462, 96.055016, 502.83148, 95.30476, 377.69623, 130.79543, 124.03586]
2025-09-16 14:59:34,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 27.0, 70.0, 81.0, 19.0, 98.0, 19.0, 68.0, 26.0, 24.0]
2025-09-16 14:59:34,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 51 minutes, 50 seconds)
2025-09-16 15:01:30,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:01:31,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 326.10666 ± 198.898
2025-09-16 15:01:31,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [624.5924, 104.13668, 347.12756, 113.24963, 602.42883, 141.09811, 301.9705, 397.38626, 96.61891, 532.45776]
2025-09-16 15:01:31,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 21.0, 63.0, 22.0, 120.0, 27.0, 54.0, 74.0, 19.0, 101.0]
2025-09-16 15:01:31,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (326.11) for latency 18
2025-09-16 15:01:31,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 50 minutes, 13 seconds)
2025-09-16 15:03:26,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:03:26,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 213.44901 ± 157.212
2025-09-16 15:03:26,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.812225, 120.1549, 120.98112, 421.9181, 90.01552, 130.14362, 96.361595, 528.5015, 131.794, 392.8073]
2025-09-16 15:03:26,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 23.0, 23.0, 78.0, 18.0, 25.0, 19.0, 98.0, 26.0, 76.0]
2025-09-16 15:03:26,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 48 minutes, 6 seconds)
2025-09-16 15:05:22,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:05:23,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 267.14703 ± 145.163
2025-09-16 15:05:23,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [109.72205, 350.745, 89.26073, 103.0109, 487.3359, 101.030304, 392.7158, 413.1354, 351.87558, 272.63867]
2025-09-16 15:05:23,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 70.0, 18.0, 20.0, 90.0, 20.0, 74.0, 77.0, 64.0, 51.0]
2025-09-16 15:05:23,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 46 minutes, 28 seconds)
2025-09-16 15:07:19,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:07:19,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 207.42915 ± 131.891
2025-09-16 15:07:19,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [334.20392, 95.56958, 102.910934, 107.62999, 107.65056, 95.56037, 358.47614, 95.39331, 364.8798, 412.017]
2025-09-16 15:07:19,764 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 19.0, 20.0, 21.0, 21.0, 19.0, 68.0, 19.0, 69.0, 76.0]
2025-09-16 15:07:19,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 44 minutes, 37 seconds)
2025-09-16 15:09:14,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:09:15,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 227.11551 ± 134.411
2025-09-16 15:09:15,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [250.17859, 371.33182, 433.84415, 96.259, 352.41736, 366.17694, 102.031395, 102.01444, 94.74333, 102.15792]
2025-09-16 15:09:15,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [49.0, 70.0, 82.0, 19.0, 64.0, 81.0, 20.0, 20.0, 19.0, 20.0]
2025-09-16 15:09:15,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 34 seconds)
2025-09-16 15:11:10,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:11:11,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 211.45976 ± 165.399
2025-09-16 15:11:11,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [137.00577, 475.8656, 122.80173, 404.91998, 89.42267, 113.664696, 89.22106, 501.2374, 84.52841, 95.9305]
2025-09-16 15:11:11,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 102.0, 24.0, 74.0, 18.0, 22.0, 18.0, 97.0, 17.0, 19.0]
2025-09-16 15:11:11,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 40 minutes, 25 seconds)
2025-09-16 15:13:07,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:13:07,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 259.10730 ± 131.606
2025-09-16 15:13:07,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.82449, 96.76365, 409.0307, 350.1973, 117.91422, 338.34482, 398.32883, 333.78, 91.33386, 359.55487]
2025-09-16 15:13:07,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 77.0, 66.0, 23.0, 64.0, 79.0, 71.0, 18.0, 63.0]
2025-09-16 15:13:07,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 38 minutes, 49 seconds)
2025-09-16 15:15:02,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:15:03,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 319.99142 ± 152.500
2025-09-16 15:15:03,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [157.99469, 96.69708, 341.85336, 247.19919, 439.5108, 89.26958, 469.80835, 451.29044, 389.82233, 516.46844]
2025-09-16 15:15:03,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 19.0, 68.0, 48.0, 86.0, 18.0, 101.0, 87.0, 71.0, 100.0]
2025-09-16 15:15:03,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 40 seconds)
2025-09-16 15:17:01,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:17:01,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 220.92281 ± 155.592
2025-09-16 15:17:01,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [330.62997, 96.298805, 419.39752, 96.71702, 102.55001, 89.58359, 102.70137, 438.65924, 443.72836, 88.96215]
2025-09-16 15:17:01,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 19.0, 82.0, 19.0, 20.0, 18.0, 20.0, 83.0, 95.0, 18.0]
2025-09-16 15:17:01,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 14 seconds)
2025-09-16 15:18:55,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:18:56,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 246.29118 ± 165.804
2025-09-16 15:18:56,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [351.33, 119.39659, 553.9691, 246.73936, 89.70306, 101.1689, 89.79697, 343.04706, 472.09598, 95.66482]
2025-09-16 15:18:56,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 23.0, 105.0, 48.0, 18.0, 20.0, 18.0, 66.0, 90.0, 19.0]
2025-09-16 15:18:56,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes)
2025-09-16 15:20:51,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:20:51,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 243.62195 ± 193.121
2025-09-16 15:20:51,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.84921, 258.44858, 84.132195, 89.257385, 624.70325, 503.9256, 101.12264, 127.78097, 438.1291, 112.87068]
2025-09-16 15:20:51,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 56.0, 17.0, 18.0, 119.0, 94.0, 20.0, 25.0, 80.0, 22.0]
2025-09-16 15:20:51,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 53 seconds)
2025-09-16 15:22:47,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:22:47,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 283.02823 ± 157.421
2025-09-16 15:22:47,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.81371, 118.001656, 299.38025, 124.49773, 106.15933, 335.74677, 537.4306, 315.11993, 516.46436, 374.6681]
2025-09-16 15:22:47,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 23.0, 57.0, 24.0, 21.0, 64.0, 104.0, 58.0, 96.0, 70.0]
2025-09-16 15:22:48,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 56 seconds)
2025-09-16 15:24:42,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:24:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 237.20157 ± 185.144
2025-09-16 15:24:43,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [96.50795, 95.63367, 489.27872, 96.640366, 621.3297, 294.4432, 118.13557, 112.76979, 362.90292, 84.37379]
2025-09-16 15:24:43,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 95.0, 19.0, 120.0, 63.0, 23.0, 22.0, 73.0, 17.0]
2025-09-16 15:24:43,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2025-09-16 15:26:40,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:26:40,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 221.67220 ± 202.323
2025-09-16 15:26:40,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.269775, 101.70271, 96.86444, 119.19623, 101.6822, 96.56341, 363.9363, 375.48236, 117.842995, 741.18134]
2025-09-16 15:26:40,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 20.0, 19.0, 23.0, 20.0, 19.0, 67.0, 68.0, 23.0, 141.0]
2025-09-16 15:26:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 41 seconds)
2025-09-16 15:28:36,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:28:36,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 245.44214 ± 165.562
2025-09-16 15:28:36,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [262.99896, 611.96313, 313.41672, 120.27448, 439.63824, 102.27714, 101.07001, 100.858925, 288.32062, 113.603355]
2025-09-16 15:28:36,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [52.0, 113.0, 67.0, 24.0, 82.0, 20.0, 20.0, 20.0, 54.0, 22.0]
2025-09-16 15:28:36,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 23 minutes, 8 seconds)
2025-09-16 15:30:32,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:30:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 263.65521 ± 129.974
2025-09-16 15:30:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [320.19354, 360.3047, 354.70245, 107.94119, 412.33215, 330.60358, 124.565796, 422.8706, 101.21068, 101.827446]
2025-09-16 15:30:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 68.0, 68.0, 21.0, 84.0, 71.0, 24.0, 81.0, 20.0, 20.0]
2025-09-16 15:30:33,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 21 minutes, 30 seconds)
2025-09-16 15:32:28,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:32:29,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 271.83688 ± 235.074
2025-09-16 15:32:29,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.91932, 444.40012, 108.86631, 108.08215, 113.733444, 94.77867, 105.97798, 291.60162, 565.4398, 789.56946]
2025-09-16 15:32:29,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 81.0, 21.0, 21.0, 22.0, 19.0, 21.0, 53.0, 101.0, 159.0]
2025-09-16 15:32:29,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 35 seconds)
2025-09-16 15:34:25,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:34:25,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 255.23206 ± 131.475
2025-09-16 15:34:25,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [122.77202, 360.43124, 123.3578, 84.26148, 357.75742, 405.06097, 392.4052, 405.4485, 124.8627, 175.96312]
2025-09-16 15:34:25,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 65.0, 24.0, 17.0, 73.0, 75.0, 75.0, 74.0, 24.0, 34.0]
2025-09-16 15:34:25,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 46 seconds)
2025-09-16 15:36:21,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:36:22,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 263.03754 ± 152.458
2025-09-16 15:36:22,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [149.07278, 286.609, 415.70233, 135.95644, 124.00171, 443.4909, 347.39124, 107.63349, 95.99718, 524.52057]
2025-09-16 15:36:22,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 54.0, 78.0, 26.0, 24.0, 85.0, 63.0, 21.0, 19.0, 98.0]
2025-09-16 15:36:22,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 15 minutes, 41 seconds)
2025-09-16 15:38:17,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:38:18,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 295.72992 ± 149.778
2025-09-16 15:38:18,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [417.9721, 451.58942, 107.229385, 383.77283, 144.94388, 130.46017, 310.8193, 111.97906, 376.37576, 522.1573]
2025-09-16 15:38:18,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 86.0, 21.0, 83.0, 28.0, 25.0, 58.0, 22.0, 82.0, 96.0]
2025-09-16 15:38:18,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 51 seconds)
2025-09-16 15:40:13,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:40:14,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 314.44571 ± 126.168
2025-09-16 15:40:14,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.57022, 387.5186, 482.9309, 321.3292, 123.29178, 196.83453, 325.04788, 448.40683, 383.77515, 379.75195]
2025-09-16 15:40:14,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 74.0, 89.0, 59.0, 24.0, 38.0, 60.0, 84.0, 71.0, 70.0]
2025-09-16 15:40:14,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 48 seconds)
2025-09-16 15:42:10,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:42:11,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 213.35506 ± 130.194
2025-09-16 15:42:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [412.93533, 122.2697, 101.28356, 102.7614, 105.6587, 108.23655, 378.98993, 113.81753, 379.59235, 308.00558]
2025-09-16 15:42:11,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 24.0, 20.0, 20.0, 21.0, 21.0, 76.0, 22.0, 69.0, 60.0]
2025-09-16 15:42:11,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 53 seconds)
2025-09-16 15:44:06,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:44:07,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 302.89441 ± 192.149
2025-09-16 15:44:07,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [304.54514, 374.54657, 94.992676, 96.066154, 108.45116, 372.8818, 95.48029, 491.69705, 685.3818, 404.9017]
2025-09-16 15:44:07,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 69.0, 19.0, 19.0, 21.0, 69.0, 19.0, 90.0, 139.0, 75.0]
2025-09-16 15:44:07,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 58 seconds)
2025-09-16 15:46:03,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:46:04,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 305.13544 ± 214.273
2025-09-16 15:46:04,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [551.50433, 374.19952, 102.59006, 318.22662, 107.791756, 96.14234, 554.1459, 687.3054, 140.78674, 118.66195]
2025-09-16 15:46:04,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 70.0, 20.0, 58.0, 21.0, 19.0, 106.0, 130.0, 27.0, 23.0]
2025-09-16 15:46:04,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 8 seconds)
2025-09-16 15:47:59,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:48:00,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 218.44247 ± 133.562
2025-09-16 15:48:00,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.8568, 253.28989, 260.54294, 393.26587, 101.53352, 84.430466, 96.40268, 90.24157, 375.73956, 427.12164]
2025-09-16 15:48:00,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 50.0, 48.0, 71.0, 20.0, 17.0, 19.0, 18.0, 69.0, 79.0]
2025-09-16 15:48:00,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 3 minutes, 59 seconds)
2025-09-16 15:49:54,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:49:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 307.56476 ± 164.559
2025-09-16 15:49:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [380.99158, 102.20966, 172.19295, 280.6586, 393.67776, 667.22565, 384.2705, 399.78665, 118.45137, 176.18275]
2025-09-16 15:49:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 20.0, 34.0, 52.0, 77.0, 129.0, 72.0, 74.0, 23.0, 34.0]
2025-09-16 15:49:55,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 1 minute, 58 seconds)
2025-09-16 15:51:48,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:51:49,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 264.71307 ± 191.219
2025-09-16 15:51:49,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.70761, 107.809204, 427.80264, 168.58284, 435.72913, 564.6981, 100.47933, 542.7749, 102.24561, 101.301285]
2025-09-16 15:51:49,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 21.0, 80.0, 32.0, 79.0, 111.0, 20.0, 99.0, 20.0, 20.0]
2025-09-16 15:51:49,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 59 minutes, 28 seconds)
2025-09-16 15:53:43,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:53:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 155.75804 ± 80.360
2025-09-16 15:53:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.16444, 273.3114, 345.1916, 146.72256, 130.12405, 101.31603, 145.63367, 95.44637, 112.156944, 95.51336]
2025-09-16 15:53:43,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 62.0, 68.0, 28.0, 25.0, 20.0, 28.0, 19.0, 22.0, 19.0]
2025-09-16 15:53:43,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 57 minutes, 7 seconds)
2025-09-16 15:55:37,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:55:38,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 214.01579 ± 165.959
2025-09-16 15:55:38,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [101.43344, 577.374, 102.04231, 102.57502, 142.36075, 96.62691, 461.1538, 120.154785, 126.22731, 310.2095]
2025-09-16 15:55:38,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 107.0, 20.0, 20.0, 28.0, 19.0, 85.0, 24.0, 25.0, 57.0]
2025-09-16 15:55:38,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 54 minutes, 48 seconds)
2025-09-16 15:57:31,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:57:32,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 290.15668 ± 195.770
2025-09-16 15:57:32,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [583.2841, 100.445755, 404.7253, 531.9829, 117.873665, 108.838394, 102.16417, 507.6626, 89.40287, 355.1871]
2025-09-16 15:57:32,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 20.0, 74.0, 101.0, 23.0, 21.0, 20.0, 97.0, 18.0, 78.0]
2025-09-16 15:57:32,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 52 minutes, 33 seconds)
2025-09-16 15:59:25,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 15:59:26,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 328.79221 ± 193.341
2025-09-16 15:59:26,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [392.7707, 113.03711, 597.6708, 404.8823, 145.10107, 488.54364, 315.3814, 620.8621, 107.81696, 101.85622]
2025-09-16 15:59:26,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 22.0, 116.0, 85.0, 28.0, 91.0, 61.0, 117.0, 21.0, 20.0]
2025-09-16 15:59:26,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (328.79) for latency 18
2025-09-16 15:59:26,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 50 minutes, 18 seconds)
2025-09-16 16:01:20,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:01:21,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 236.53340 ± 115.670
2025-09-16 16:01:21,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [135.04834, 336.1688, 168.16673, 310.07285, 342.50623, 105.98652, 108.29132, 316.35754, 113.30365, 429.4321]
2025-09-16 16:01:21,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 64.0, 32.0, 58.0, 64.0, 21.0, 21.0, 61.0, 22.0, 77.0]
2025-09-16 16:01:21,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 48 minutes, 39 seconds)
2025-09-16 16:03:16,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:03:17,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 281.41492 ± 157.093
2025-09-16 16:03:17,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [301.7787, 106.44453, 89.45389, 404.90128, 519.64264, 107.13695, 96.48947, 439.6975, 370.98755, 377.6167]
2025-09-16 16:03:17,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 21.0, 18.0, 76.0, 97.0, 21.0, 19.0, 88.0, 71.0, 68.0]
2025-09-16 16:03:17,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 47 minutes, 5 seconds)
2025-09-16 16:05:21,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:05:22,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 248.02466 ± 162.492
2025-09-16 16:05:22,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.61657, 469.31424, 342.00507, 95.24808, 96.23246, 89.14633, 332.52054, 119.45181, 280.3916, 548.31995]
2025-09-16 16:05:22,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 87.0, 63.0, 19.0, 19.0, 18.0, 61.0, 23.0, 55.0, 115.0]
2025-09-16 16:05:22,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes)
2025-09-16 16:07:28,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:07:29,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 336.90277 ± 234.032
2025-09-16 16:07:29,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [488.51282, 96.72449, 102.014626, 411.251, 95.3912, 454.13937, 107.45142, 338.3728, 862.079, 413.0914]
2025-09-16 16:07:29,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 19.0, 20.0, 73.0, 19.0, 86.0, 21.0, 63.0, 182.0, 78.0]
2025-09-16 16:07:29,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (336.90) for latency 18
2025-09-16 16:07:29,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 25 seconds)
2025-09-16 16:09:38,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:09:39,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 357.57239 ± 110.171
2025-09-16 16:09:39,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [347.4114, 391.32248, 376.9061, 407.3866, 401.55923, 455.21518, 455.17844, 150.2649, 144.35005, 446.12936]
2025-09-16 16:09:39,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 74.0, 69.0, 74.0, 73.0, 86.0, 85.0, 30.0, 28.0, 84.0]
2025-09-16 16:09:39,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (357.57) for latency 18
2025-09-16 16:09:39,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 48 minutes, 21 seconds)
2025-09-16 16:11:48,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:11:49,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 284.71915 ± 130.324
2025-09-16 16:11:49,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [290.60037, 153.96713, 341.21432, 305.46982, 96.054245, 89.634575, 358.29297, 378.0625, 531.71643, 302.17886]
2025-09-16 16:11:49,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 30.0, 66.0, 65.0, 19.0, 18.0, 69.0, 68.0, 108.0, 62.0]
2025-09-16 16:11:49,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 48 minutes, 57 seconds)
2025-09-16 16:13:54,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:13:55,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 273.19470 ± 130.393
2025-09-16 16:13:55,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [357.51865, 106.23724, 370.21857, 412.24872, 353.5724, 144.55765, 112.91731, 277.7153, 466.24786, 130.71346]
2025-09-16 16:13:55,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 21.0, 67.0, 76.0, 66.0, 28.0, 22.0, 52.0, 90.0, 25.0]
2025-09-16 16:13:55,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 48 minutes, 29 seconds)
2025-09-16 16:16:06,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:16:07,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 295.77551 ± 159.685
2025-09-16 16:16:07,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [392.61273, 391.7744, 114.51772, 271.91107, 90.36444, 433.5576, 96.48766, 461.17084, 537.4085, 167.95024]
2025-09-16 16:16:07,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 73.0, 22.0, 51.0, 18.0, 95.0, 19.0, 89.0, 98.0, 32.0]
2025-09-16 16:16:07,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 47 minutes, 34 seconds)
2025-09-16 16:18:19,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:18:20,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 293.97610 ± 136.286
2025-09-16 16:18:20,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [123.48473, 427.19397, 289.3307, 100.319176, 361.7076, 484.12503, 354.46274, 277.90567, 95.52173, 425.70944]
2025-09-16 16:18:20,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 97.0, 58.0, 20.0, 76.0, 88.0, 69.0, 55.0, 19.0, 80.0]
2025-09-16 16:18:20,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 46 minutes, 22 seconds)
2025-09-16 16:20:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:20:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 283.35162 ± 152.493
2025-09-16 16:20:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [84.30117, 378.4559, 94.52737, 418.88266, 113.08325, 385.9005, 450.01312, 403.339, 404.60638, 100.40698]
2025-09-16 16:20:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 75.0, 19.0, 81.0, 22.0, 74.0, 84.0, 74.0, 77.0, 20.0]
2025-09-16 16:20:30,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 44 minutes, 8 seconds)
2025-09-16 16:22:44,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:22:44,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 237.35721 ± 126.188
2025-09-16 16:22:44,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [413.6938, 415.76538, 130.47884, 113.7173, 328.8948, 95.87733, 316.31598, 108.873215, 320.25388, 129.70169]
2025-09-16 16:22:44,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 79.0, 25.0, 22.0, 61.0, 19.0, 58.0, 21.0, 59.0, 25.0]
2025-09-16 16:22:44,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 42 minutes, 37 seconds)
2025-09-16 16:24:55,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:24:56,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 304.06891 ± 138.585
2025-09-16 16:24:56,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [560.3974, 374.98068, 355.28625, 323.08154, 320.1281, 369.9866, 102.22506, 388.1346, 139.46382, 107.00508]
2025-09-16 16:24:56,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 71.0, 65.0, 70.0, 60.0, 71.0, 20.0, 72.0, 27.0, 21.0]
2025-09-16 16:24:56,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 41 minutes, 18 seconds)
2025-09-16 16:27:08,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:27:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 281.54581 ± 166.606
2025-09-16 16:27:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [400.86346, 316.72226, 113.73232, 117.107155, 322.7139, 128.98961, 107.49581, 364.81143, 281.07642, 661.94586]
2025-09-16 16:27:08,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 60.0, 22.0, 23.0, 72.0, 25.0, 21.0, 81.0, 54.0, 125.0]
2025-09-16 16:27:08,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 39 minutes, 11 seconds)
2025-09-16 16:29:22,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:29:23,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 335.07587 ± 231.858
2025-09-16 16:29:23,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [570.825, 102.55779, 531.235, 757.94763, 421.21475, 478.4547, 135.4019, 101.59387, 143.89615, 107.63203]
2025-09-16 16:29:23,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 20.0, 97.0, 138.0, 77.0, 87.0, 26.0, 20.0, 28.0, 21.0]
2025-09-16 16:29:23,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 37 minutes, 14 seconds)
2025-09-16 16:31:37,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:31:38,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 143.21031 ± 91.595
2025-09-16 16:31:38,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [96.23958, 116.24879, 133.67851, 155.4249, 411.08633, 89.75063, 89.216515, 127.77428, 117.19116, 95.49244]
2025-09-16 16:31:38,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 23.0, 26.0, 30.0, 74.0, 18.0, 18.0, 25.0, 23.0, 19.0]
2025-09-16 16:31:38,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 35 minutes, 41 seconds)
2025-09-16 16:33:51,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:33:52,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 251.22681 ± 150.955
2025-09-16 16:33:52,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [470.24097, 323.3429, 135.15324, 400.22592, 101.92621, 366.7981, 427.0286, 90.549446, 100.75527, 96.24731]
2025-09-16 16:33:52,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 61.0, 27.0, 74.0, 20.0, 65.0, 87.0, 18.0, 20.0, 19.0]
2025-09-16 16:33:52,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 33 minutes, 23 seconds)
2025-09-16 16:36:10,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:36:10,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 232.21384 ± 227.305
2025-09-16 16:36:10,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [107.167946, 101.96828, 326.36185, 89.91683, 754.7198, 559.5332, 96.526184, 90.58499, 106.17306, 89.18619]
2025-09-16 16:36:10,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 20.0, 60.0, 18.0, 137.0, 100.0, 19.0, 18.0, 21.0, 18.0]
2025-09-16 16:36:10,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2025-09-16 16:38:26,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:38:26,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 287.90045 ± 186.585
2025-09-16 16:38:26,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [113.28039, 127.58185, 481.4276, 550.6413, 307.46774, 124.26886, 106.54256, 96.57997, 412.6926, 558.52136]
2025-09-16 16:38:26,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 90.0, 105.0, 58.0, 24.0, 21.0, 19.0, 74.0, 102.0]
2025-09-16 16:38:26,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 30 minutes, 23 seconds)
2025-09-16 16:40:40,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:40:41,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 317.23492 ± 160.958
2025-09-16 16:40:41,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.76188, 476.65475, 339.6944, 496.5806, 90.21884, 527.09845, 364.2614, 317.71564, 373.00507, 97.35812]
2025-09-16 16:40:41,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 87.0, 63.0, 108.0, 18.0, 114.0, 66.0, 61.0, 69.0, 19.0]
2025-09-16 16:40:41,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 28 minutes, 5 seconds)
2025-09-16 16:42:53,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:42:53,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 250.91353 ± 157.138
2025-09-16 16:42:53,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [340.30655, 370.53696, 101.383865, 333.54752, 452.53403, 89.0124, 101.60969, 101.89781, 112.56862, 505.7379]
2025-09-16 16:42:53,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 73.0, 20.0, 66.0, 85.0, 18.0, 20.0, 20.0, 22.0, 93.0]
2025-09-16 16:42:53,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 25 minutes, 35 seconds)
2025-09-16 16:45:08,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:45:09,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 275.79208 ± 236.504
2025-09-16 16:45:09,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [102.33471, 734.6428, 334.08582, 455.05487, 97.14068, 633.3967, 96.11498, 96.18399, 112.91446, 96.05184]
2025-09-16 16:45:09,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [20.0, 135.0, 62.0, 82.0, 19.0, 119.0, 19.0, 19.0, 22.0, 19.0]
2025-09-16 16:45:09,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 23 minutes, 31 seconds)
2025-09-16 16:47:25,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:47:26,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 271.27411 ± 138.900
2025-09-16 16:47:26,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [378.24194, 281.26608, 140.24146, 129.67117, 111.68499, 292.116, 308.29578, 408.17502, 118.34062, 544.7081]
2025-09-16 16:47:26,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 55.0, 27.0, 25.0, 22.0, 54.0, 57.0, 87.0, 23.0, 103.0]
2025-09-16 16:47:26,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 21 minutes, 3 seconds)
2025-09-16 16:49:40,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:49:40,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 219.27177 ± 151.432
2025-09-16 16:49:40,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [90.068, 123.249245, 96.59004, 456.01764, 329.1264, 101.19895, 89.66364, 437.07867, 379.0305, 90.69484]
2025-09-16 16:49:40,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 24.0, 19.0, 86.0, 62.0, 20.0, 18.0, 84.0, 71.0, 18.0]
2025-09-16 16:49:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 18 minutes, 38 seconds)
2025-09-16 16:51:54,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:51:55,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 311.27240 ± 152.836
2025-09-16 16:51:55,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [204.83029, 157.39268, 440.10776, 365.5081, 389.89078, 372.5318, 95.800606, 543.72253, 90.53448, 452.40503]
2025-09-16 16:51:55,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [39.0, 30.0, 80.0, 74.0, 80.0, 71.0, 19.0, 99.0, 18.0, 82.0]
2025-09-16 16:51:55,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 16 minutes, 26 seconds)
2025-09-16 16:54:10,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:54:11,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 253.68938 ± 157.975
2025-09-16 16:54:11,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [353.36517, 123.15472, 385.22473, 88.96209, 318.0121, 102.62542, 95.974, 460.00778, 502.93604, 106.63151]
2025-09-16 16:54:11,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 24.0, 75.0, 18.0, 67.0, 20.0, 19.0, 98.0, 104.0, 21.0]
2025-09-16 16:54:11,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 14 minutes, 33 seconds)
2025-09-16 16:56:26,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:56:27,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 324.34750 ± 162.201
2025-09-16 16:56:27,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [310.9839, 482.0384, 330.35516, 476.73294, 111.058815, 132.9673, 347.6593, 327.17105, 108.04548, 616.46295]
2025-09-16 16:56:27,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 97.0, 64.0, 91.0, 22.0, 26.0, 74.0, 61.0, 21.0, 119.0]
2025-09-16 16:56:27,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 12 minutes, 18 seconds)
2025-09-16 16:58:43,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 16:58:44,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 312.09650 ± 192.857
2025-09-16 16:58:44,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [111.479774, 128.38454, 324.41852, 107.44236, 602.89703, 95.41113, 316.37326, 368.76007, 424.154, 641.64435]
2025-09-16 16:58:44,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 25.0, 62.0, 21.0, 127.0, 19.0, 71.0, 82.0, 83.0, 121.0]
2025-09-16 16:58:44,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 10 minutes)
2025-09-16 17:01:00,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:01:01,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 377.44604 ± 169.019
2025-09-16 17:01:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [129.88776, 101.12502, 568.18805, 175.48808, 365.38757, 415.595, 536.3127, 486.42307, 537.2121, 458.84106]
2025-09-16 17:01:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 103.0, 34.0, 66.0, 76.0, 99.0, 89.0, 112.0, 84.0]
2025-09-16 17:01:01,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (377.45) for latency 18
2025-09-16 17:01:01,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 8 minutes, 3 seconds)
2025-09-16 17:03:14,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:03:15,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 297.88562 ± 180.087
2025-09-16 17:03:15,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [334.25848, 500.82684, 352.31116, 570.98975, 107.35044, 101.33145, 111.27353, 531.4541, 268.91638, 100.1439]
2025-09-16 17:03:15,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 97.0, 69.0, 105.0, 21.0, 20.0, 22.0, 98.0, 52.0, 20.0]
2025-09-16 17:03:15,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 5 minutes, 44 seconds)
2025-09-16 17:05:31,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:05:32,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 311.35397 ± 148.831
2025-09-16 17:05:32,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [401.54437, 402.52896, 352.55325, 96.842, 591.2144, 102.80549, 373.8083, 357.55896, 294.21716, 140.46666]
2025-09-16 17:05:32,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 86.0, 66.0, 19.0, 112.0, 20.0, 70.0, 65.0, 55.0, 27.0]
2025-09-16 17:05:32,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 3 minutes, 33 seconds)
2025-09-16 17:07:45,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:07:47,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 376.14960 ± 222.905
2025-09-16 17:07:47,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [499.81668, 117.874695, 444.2598, 309.8885, 106.87172, 102.61943, 540.49585, 329.60452, 847.10254, 462.9624]
2025-09-16 17:07:47,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 23.0, 80.0, 57.0, 21.0, 20.0, 105.0, 62.0, 170.0, 103.0]
2025-09-16 17:07:47,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 1 minute, 11 seconds)
2025-09-16 17:10:03,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:10:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 188.95000 ± 141.348
2025-09-16 17:10:04,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.41498, 461.58157, 89.811935, 107.75558, 129.9572, 111.15605, 90.69075, 437.2099, 281.46005, 84.46185]
2025-09-16 17:10:04,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 96.0, 18.0, 21.0, 25.0, 22.0, 18.0, 76.0, 53.0, 17.0]
2025-09-16 17:10:04,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 58 minutes, 56 seconds)
2025-09-16 17:12:20,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:12:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 272.96326 ± 182.322
2025-09-16 17:12:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [124.62142, 130.74156, 95.60271, 106.44795, 462.06503, 402.33246, 392.33224, 95.06245, 636.95715, 283.46942]
2025-09-16 17:12:20,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 25.0, 19.0, 21.0, 82.0, 73.0, 80.0, 19.0, 130.0, 54.0]
2025-09-16 17:12:20,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 56 minutes, 36 seconds)
2025-09-16 17:14:38,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:14:39,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 321.41226 ± 260.177
2025-09-16 17:14:39,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.23759, 102.710815, 705.9035, 375.9155, 95.685265, 89.66353, 538.6451, 806.19385, 280.1522, 107.01528]
2025-09-16 17:14:39,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 20.0, 136.0, 74.0, 19.0, 18.0, 100.0, 172.0, 58.0, 21.0]
2025-09-16 17:14:39,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 54 minutes, 41 seconds)
2025-09-16 17:16:53,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:16:54,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 212.55722 ± 151.214
2025-09-16 17:16:54,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [88.9085, 300.1309, 403.59802, 561.73584, 130.41803, 101.82388, 178.74808, 89.318184, 130.71216, 140.17867]
2025-09-16 17:16:54,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 59.0, 84.0, 119.0, 25.0, 20.0, 34.0, 18.0, 25.0, 28.0]
2025-09-16 17:16:54,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 52 minutes, 18 seconds)
2025-09-16 17:19:08,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:19:08,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 266.68033 ± 164.522
2025-09-16 17:19:08,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [380.73483, 446.68997, 101.63347, 347.67056, 94.92307, 96.943344, 123.7512, 148.99321, 357.85165, 567.6121]
2025-09-16 17:19:08,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 80.0, 20.0, 71.0, 19.0, 19.0, 24.0, 29.0, 70.0, 101.0]
2025-09-16 17:19:08,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 50 minutes)
2025-09-16 17:21:20,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:21:21,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 290.08038 ± 192.567
2025-09-16 17:21:21,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [441.40622, 90.237625, 666.3432, 360.38464, 108.598305, 465.7489, 101.890434, 145.51572, 119.64635, 401.03253]
2025-09-16 17:21:21,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 18.0, 128.0, 65.0, 21.0, 99.0, 20.0, 28.0, 23.0, 86.0]
2025-09-16 17:21:21,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 47 minutes, 25 seconds)
2025-09-16 17:23:39,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:23:39,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 131.36284 ± 58.071
2025-09-16 17:23:39,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [96.00211, 111.812996, 119.252235, 125.1583, 293.9889, 96.30894, 89.09141, 167.27007, 106.66813, 108.07537]
2025-09-16 17:23:39,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 22.0, 23.0, 24.0, 61.0, 19.0, 18.0, 32.0, 21.0, 21.0]
2025-09-16 17:23:39,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 45 minutes, 15 seconds)
2025-09-16 17:25:56,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:25:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 290.56500 ± 180.335
2025-09-16 17:25:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [129.3647, 101.40275, 96.18239, 565.29175, 156.27345, 113.21664, 523.7337, 374.5162, 376.53183, 469.137]
2025-09-16 17:25:57,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 20.0, 19.0, 106.0, 30.0, 22.0, 93.0, 76.0, 69.0, 83.0]
2025-09-16 17:25:57,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes, 55 seconds)
2025-09-16 17:28:11,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:28:12,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 361.99951 ± 341.651
2025-09-16 17:28:12,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [124.98819, 970.0389, 685.3046, 119.84406, 95.74248, 425.59897, 96.53217, 95.60163, 904.39685, 101.947365]
2025-09-16 17:28:12,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 185.0, 126.0, 23.0, 19.0, 86.0, 19.0, 19.0, 176.0, 20.0]
2025-09-16 17:28:12,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 40 minutes, 41 seconds)
2025-09-16 17:30:28,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:30:29,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 455.22061 ± 214.277
2025-09-16 17:30:29,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [834.81964, 380.20334, 139.61435, 470.86246, 692.30273, 590.37805, 440.15952, 436.65472, 478.05725, 89.15372]
2025-09-16 17:30:29,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [173.0, 71.0, 27.0, 101.0, 133.0, 129.0, 82.0, 80.0, 87.0, 18.0]
2025-09-16 17:30:29,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (455.22) for latency 18
2025-09-16 17:30:29,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 38 minutes, 34 seconds)
2025-09-16 17:32:43,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:32:45,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 417.18594 ± 256.102
2025-09-16 17:32:45,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [502.82083, 360.07404, 253.37105, 450.30423, 95.16175, 335.9042, 1004.33307, 651.74194, 428.32306, 89.8255]
2025-09-16 17:32:45,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 79.0, 50.0, 100.0, 19.0, 64.0, 180.0, 127.0, 94.0, 18.0]
2025-09-16 17:32:45,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 36 minutes, 26 seconds)
2025-09-16 17:34:59,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:35:00,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 262.30255 ± 178.027
2025-09-16 17:35:00,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [96.10319, 101.92348, 463.6954, 582.89276, 84.124084, 84.22892, 365.3004, 327.60495, 113.838486, 403.31412]
2025-09-16 17:35:00,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 20.0, 87.0, 108.0, 17.0, 17.0, 70.0, 60.0, 22.0, 81.0]
2025-09-16 17:35:00,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 1 second)
2025-09-16 17:37:17,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:37:18,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 359.19064 ± 166.386
2025-09-16 17:37:18,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [411.68307, 369.9342, 135.63881, 144.6217, 441.67813, 113.807495, 651.4327, 456.4956, 483.6976, 382.91757]
2025-09-16 17:37:18,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 68.0, 26.0, 28.0, 88.0, 22.0, 127.0, 86.0, 102.0, 77.0]
2025-09-16 17:37:18,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 48 seconds)
2025-09-16 17:39:37,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:39:38,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 489.99677 ± 237.096
2025-09-16 17:39:38,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [316.05814, 320.2722, 469.49188, 522.88367, 636.3403, 398.41025, 490.90622, 1081.1449, 522.67505, 141.78525]
2025-09-16 17:39:38,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 59.0, 84.0, 114.0, 113.0, 81.0, 108.0, 205.0, 98.0, 27.0]
2025-09-16 17:39:38,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (490.00) for latency 18
2025-09-16 17:39:38,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 43 seconds)
2025-09-16 17:41:55,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:41:55,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 282.09491 ± 192.999
2025-09-16 17:41:55,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [340.78796, 538.8692, 95.88405, 89.3436, 387.83914, 96.24773, 107.820404, 106.906815, 490.25778, 566.9927]
2025-09-16 17:41:55,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 99.0, 19.0, 18.0, 82.0, 19.0, 21.0, 21.0, 92.0, 102.0]
2025-09-16 17:41:55,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 27 minutes, 26 seconds)
2025-09-16 17:44:08,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:44:09,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 255.35933 ± 220.520
2025-09-16 17:44:09,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [95.15094, 95.80482, 722.79297, 555.7153, 102.37191, 110.483604, 97.12607, 244.81635, 433.10355, 96.227715]
2025-09-16 17:44:09,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [19.0, 19.0, 132.0, 100.0, 20.0, 22.0, 19.0, 50.0, 80.0, 19.0]
2025-09-16 17:44:09,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 5 seconds)
2025-09-16 17:46:23,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:46:24,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 247.19156 ± 157.171
2025-09-16 17:46:24,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [320.78015, 402.7199, 100.84394, 582.01025, 117.97137, 107.389404, 286.43774, 100.84834, 335.64374, 117.27079]
2025-09-16 17:46:24,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [59.0, 74.0, 20.0, 101.0, 23.0, 21.0, 55.0, 20.0, 60.0, 23.0]
2025-09-16 17:46:24,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 47 seconds)
2025-09-16 17:48:41,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:48:42,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 317.72354 ± 147.243
2025-09-16 17:48:42,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [418.2967, 312.08832, 382.2666, 128.29604, 619.39844, 317.17935, 129.60124, 131.06047, 368.91193, 370.13647]
2025-09-16 17:48:42,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 58.0, 72.0, 25.0, 108.0, 58.0, 25.0, 25.0, 66.0, 68.0]
2025-09-16 17:48:42,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 29 seconds)
2025-09-16 17:50:59,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:51:00,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 243.55823 ± 151.566
2025-09-16 17:51:00,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.212875, 163.48595, 378.64334, 137.70264, 456.38907, 90.54887, 153.21431, 465.68124, 102.47676, 398.2271]
2025-09-16 17:51:00,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 32.0, 68.0, 27.0, 85.0, 18.0, 29.0, 84.0, 20.0, 71.0]
2025-09-16 17:51:00,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 10 seconds)
2025-09-16 17:53:15,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:53:17,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 383.95306 ± 276.970
2025-09-16 17:53:17,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [90.70086, 101.00309, 107.161514, 89.75788, 575.2166, 748.5225, 549.0673, 430.92383, 284.60953, 862.56757]
2025-09-16 17:53:17,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 20.0, 21.0, 18.0, 106.0, 139.0, 101.0, 80.0, 55.0, 166.0]
2025-09-16 17:53:17,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 53 seconds)
2025-09-16 17:55:29,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:55:30,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 379.91455 ± 223.992
2025-09-16 17:55:30,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [89.75481, 568.7028, 107.34244, 511.9238, 134.6715, 406.43164, 441.25067, 851.7947, 354.99548, 332.27728]
2025-09-16 17:55:30,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 117.0, 21.0, 94.0, 26.0, 78.0, 83.0, 178.0, 64.0, 63.0]
2025-09-16 17:55:30,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 37 seconds)
2025-09-16 17:57:43,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:57:44,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 264.05615 ± 195.330
2025-09-16 17:57:44,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [90.78986, 123.30184, 365.7873, 161.74742, 455.99893, 118.22408, 121.04579, 420.22897, 95.62482, 687.8125]
2025-09-16 17:57:44,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [18.0, 25.0, 67.0, 31.0, 81.0, 23.0, 24.0, 77.0, 19.0, 143.0]
2025-09-16 17:57:44,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 20 seconds)
2025-09-16 17:59:55,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 17:59:56,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 220.43564 ± 138.638
2025-09-16 17:59:56,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [387.00934, 406.0932, 102.489235, 320.07706, 118.26078, 433.86234, 89.15522, 108.42101, 119.27439, 119.71397]
2025-09-16 17:59:56,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 89.0, 20.0, 57.0, 23.0, 80.0, 18.0, 21.0, 23.0, 23.0]
2025-09-16 17:59:56,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 59 seconds)
2025-09-16 18:02:10,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 18:02:11,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 281.13318 ± 188.677
2025-09-16 18:02:11,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [511.83853, 326.89688, 84.16599, 124.99256, 106.27913, 451.80017, 448.35568, 102.14175, 565.6587, 89.20233]
2025-09-16 18:02:11,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 60.0, 17.0, 24.0, 21.0, 80.0, 83.0, 20.0, 121.0, 18.0]
2025-09-16 18:02:11,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 42 seconds)
2025-09-16 18:04:30,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 18:04:31,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 293.50638 ± 182.365
2025-09-16 18:04:31,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [476.09866, 124.97442, 113.042404, 489.46548, 404.35764, 509.0525, 490.25067, 124.18671, 112.8863, 90.749]
2025-09-16 18:04:31,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 24.0, 22.0, 103.0, 76.0, 94.0, 102.0, 24.0, 22.0, 18.0]
2025-09-16 18:04:31,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 29 seconds)
2025-09-16 18:06:46,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 18:06:48,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 428.89838 ± 231.187
2025-09-16 18:06:48,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [112.971695, 561.9704, 502.26328, 792.7546, 548.18695, 329.91837, 497.8837, 128.72601, 124.887, 689.42194]
2025-09-16 18:06:48,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 104.0, 89.0, 146.0, 98.0, 61.0, 91.0, 25.0, 24.0, 126.0]
2025-09-16 18:06:48,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 15 seconds)
2025-09-16 18:09:01,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 18:09:02,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 328.71112 ± 197.424
2025-09-16 18:09:02,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [108.18048, 381.47882, 105.41677, 102.69925, 525.8459, 95.22388, 431.29892, 572.05334, 366.24564, 598.668]
2025-09-16 18:09:02,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [21.0, 84.0, 21.0, 20.0, 92.0, 19.0, 81.0, 103.0, 68.0, 109.0]
2025-09-16 18:09:02,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1251 [DEBUG]: Training session finished
