2025-09-16 12:39:28,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.100-delay_15
2025-09-16 12:39:28,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.100-delay_15
2025-09-16 12:39:28,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x1491365b0a10>}
2025-09-16 12:39:28,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:39:28,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:39:28,833 baseline-bpql-noisepromille100-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=631, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:39:28,834 baseline-bpql-noisepromille100-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:39:30,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:39:30,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:41:18,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:41:19,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 346.06085 ± 98.767
2025-09-16 12:41:19,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [552.31976, 343.3134, 420.3816, 386.06277, 303.70728, 322.5308, 136.15146, 355.8943, 316.7573, 323.48965]
2025-09-16 12:41:19,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 63.0, 79.0, 77.0, 58.0, 70.0, 26.0, 68.0, 61.0, 68.0]
2025-09-16 12:41:19,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (346.06) for latency 15
2025-09-16 12:41:19,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 59 minutes, 24 seconds)
2025-09-16 12:43:16,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:43:17,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 297.09360 ± 149.225
2025-09-16 12:43:17,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [140.10457, 134.68597, 297.52777, 582.8953, 129.8284, 479.27032, 402.54343, 294.04208, 339.10498, 170.93301]
2025-09-16 12:43:17,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 26.0, 57.0, 117.0, 25.0, 87.0, 75.0, 62.0, 64.0, 33.0]
2025-09-16 12:43:17,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 6 seconds)
2025-09-16 12:45:13,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:45:14,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 352.11032 ± 151.706
2025-09-16 12:45:14,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [502.55756, 226.55919, 342.8521, 303.84384, 117.637726, 172.74837, 450.04068, 643.92456, 434.4695, 326.4698]
2025-09-16 12:45:14,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 46.0, 64.0, 62.0, 23.0, 34.0, 94.0, 124.0, 91.0, 67.0]
2025-09-16 12:45:14,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (352.11) for latency 15
2025-09-16 12:45:14,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 5 minutes, 34 seconds)
2025-09-16 12:47:12,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:47:13,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 405.43683 ± 154.047
2025-09-16 12:47:13,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [696.9903, 463.93246, 170.499, 342.73596, 433.3517, 469.9679, 427.20557, 146.3725, 371.1965, 532.1162]
2025-09-16 12:47:13,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 85.0, 33.0, 65.0, 84.0, 88.0, 81.0, 28.0, 69.0, 102.0]
2025-09-16 12:47:13,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (405.44) for latency 15
2025-09-16 12:47:13,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 5 minutes, 7 seconds)
2025-09-16 12:49:11,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:49:12,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 325.78168 ± 105.906
2025-09-16 12:49:12,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [350.47055, 499.34515, 210.18379, 325.0783, 340.44098, 357.3698, 188.6303, 411.24597, 150.03079, 425.02148]
2025-09-16 12:49:12,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 105.0, 43.0, 73.0, 66.0, 75.0, 36.0, 82.0, 29.0, 81.0]
2025-09-16 12:49:12,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 4 minutes, 9 seconds)
2025-09-16 12:51:08,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:51:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 264.38315 ± 124.294
2025-09-16 12:51:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [423.5258, 112.93165, 390.83005, 371.8746, 227.60364, 152.64983, 145.65158, 436.37976, 263.48322, 118.90147]
2025-09-16 12:51:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 22.0, 76.0, 68.0, 46.0, 29.0, 28.0, 95.0, 52.0, 23.0]
2025-09-16 12:51:09,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 5 minutes)
2025-09-16 12:53:08,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:53:09,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 368.73770 ± 139.274
2025-09-16 12:53:09,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [319.20108, 331.28586, 500.01462, 433.5412, 481.34134, 388.43283, 611.02783, 150.6352, 152.05809, 319.8388]
2025-09-16 12:53:09,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 61.0, 112.0, 82.0, 90.0, 76.0, 111.0, 29.0, 29.0, 63.0]
2025-09-16 12:53:09,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 3 minutes, 36 seconds)
2025-09-16 12:55:06,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:55:07,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 334.50552 ± 141.507
2025-09-16 12:55:07,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [499.89548, 157.39842, 489.15292, 134.90782, 141.0592, 400.30142, 296.45602, 319.55893, 508.77036, 397.55502]
2025-09-16 12:55:07,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 31.0, 93.0, 26.0, 28.0, 87.0, 57.0, 62.0, 97.0, 84.0]
2025-09-16 12:55:07,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 47 seconds)
2025-09-16 12:57:04,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:57:05,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 414.62344 ± 184.858
2025-09-16 12:57:05,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [557.6685, 113.86058, 670.3881, 614.9158, 129.8432, 343.78284, 555.3648, 453.13046, 425.8354, 281.4446]
2025-09-16 12:57:05,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 22.0, 133.0, 120.0, 25.0, 67.0, 104.0, 94.0, 80.0, 52.0]
2025-09-16 12:57:05,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (414.62) for latency 15
2025-09-16 12:57:05,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 59 minutes, 44 seconds)
2025-09-16 12:59:04,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:59:05,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 359.52109 ± 119.542
2025-09-16 12:59:05,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [322.9066, 505.45786, 399.32037, 356.93674, 322.2255, 547.2033, 199.50362, 437.49448, 368.88385, 135.27844]
2025-09-16 12:59:05,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 98.0, 74.0, 76.0, 72.0, 105.0, 38.0, 84.0, 80.0, 26.0]
2025-09-16 12:59:05,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 57 minutes, 56 seconds)
2025-09-16 13:01:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:01:03,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 446.36346 ± 149.633
2025-09-16 13:01:03,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [391.8995, 336.8437, 118.46522, 420.81558, 482.50653, 448.9925, 680.0533, 508.85968, 427.39017, 647.8087]
2025-09-16 13:01:03,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 63.0, 23.0, 77.0, 89.0, 98.0, 142.0, 110.0, 93.0, 123.0]
2025-09-16 13:01:03,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (446.36) for latency 15
2025-09-16 13:01:03,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 17 seconds)
2025-09-16 13:03:02,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:03:03,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 418.27692 ± 61.244
2025-09-16 13:03:03,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [360.7743, 499.61548, 352.20593, 437.4106, 510.6145, 456.6896, 470.28214, 341.1039, 357.15833, 396.91394]
2025-09-16 13:03:03,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 94.0, 62.0, 81.0, 110.0, 97.0, 97.0, 71.0, 69.0, 72.0]
2025-09-16 13:03:03,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 11 seconds)
2025-09-16 13:05:00,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:05:01,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 368.03354 ± 155.963
2025-09-16 13:05:01,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [446.42703, 101.54394, 395.99417, 644.92584, 252.17609, 359.04272, 479.47015, 437.7879, 427.87692, 135.09079]
2025-09-16 13:05:01,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 20.0, 82.0, 128.0, 48.0, 66.0, 93.0, 84.0, 81.0, 26.0]
2025-09-16 13:05:01,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 52 minutes, 16 seconds)
2025-09-16 13:06:59,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:07:00,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 331.30148 ± 163.053
2025-09-16 13:07:00,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [161.9082, 551.6771, 365.2391, 124.23896, 395.3821, 151.0015, 437.6979, 487.9615, 129.48648, 508.4219]
2025-09-16 13:07:00,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 117.0, 66.0, 24.0, 75.0, 29.0, 82.0, 95.0, 25.0, 99.0]
2025-09-16 13:07:00,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 50 minutes, 26 seconds)
2025-09-16 13:08:58,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:08:59,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 305.51562 ± 142.004
2025-09-16 13:08:59,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [409.37704, 458.5724, 135.17665, 107.96311, 208.62488, 360.3282, 378.087, 462.97507, 431.3117, 102.73991]
2025-09-16 13:08:59,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 87.0, 26.0, 21.0, 40.0, 73.0, 69.0, 98.0, 81.0, 20.0]
2025-09-16 13:08:59,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 48 minutes, 26 seconds)
2025-09-16 13:10:58,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:10:59,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 379.05850 ± 112.441
2025-09-16 13:10:59,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [370.2244, 360.45575, 157.98715, 442.696, 471.5823, 347.8472, 314.78397, 331.40762, 375.5156, 618.0849]
2025-09-16 13:10:59,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 67.0, 30.0, 83.0, 89.0, 62.0, 61.0, 65.0, 72.0, 115.0]
2025-09-16 13:10:59,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 46 minutes, 39 seconds)
2025-09-16 13:12:56,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:12:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 383.44165 ± 193.543
2025-09-16 13:12:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [746.5805, 493.51984, 139.9322, 577.90356, 432.2983, 133.8257, 331.54486, 463.15707, 376.54938, 139.10512]
2025-09-16 13:12:58,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 92.0, 27.0, 104.0, 83.0, 26.0, 63.0, 87.0, 71.0, 27.0]
2025-09-16 13:12:58,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 44 minutes, 32 seconds)
2025-09-16 13:14:55,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:14:56,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 442.30341 ± 146.875
2025-09-16 13:14:56,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [834.5652, 355.9633, 408.48984, 260.00092, 520.48486, 375.15915, 411.60632, 483.65695, 370.26254, 402.84515]
2025-09-16 13:14:56,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 68.0, 81.0, 51.0, 96.0, 68.0, 78.0, 90.0, 68.0, 73.0]
2025-09-16 13:14:56,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 42 minutes, 35 seconds)
2025-09-16 13:16:54,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:16:55,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 395.91208 ± 127.482
2025-09-16 13:16:55,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [534.1362, 178.88695, 429.0323, 408.09833, 473.21494, 458.29037, 352.59775, 151.73195, 546.82825, 426.30405]
2025-09-16 13:16:55,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 34.0, 94.0, 76.0, 102.0, 86.0, 67.0, 29.0, 103.0, 89.0]
2025-09-16 13:16:55,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 40 minutes, 47 seconds)
2025-09-16 13:18:53,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:18:54,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 405.10837 ± 109.943
2025-09-16 13:18:54,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [426.5038, 625.3916, 472.8016, 469.076, 375.02264, 208.84062, 348.83533, 294.779, 479.17896, 350.65427]
2025-09-16 13:18:54,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 115.0, 91.0, 87.0, 69.0, 40.0, 65.0, 55.0, 90.0, 65.0]
2025-09-16 13:18:54,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 38 minutes, 36 seconds)
2025-09-16 13:20:52,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:20:53,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 419.24664 ± 71.063
2025-09-16 13:20:53,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [336.55847, 439.4668, 559.9584, 443.13153, 358.86896, 493.05008, 374.75998, 316.58054, 415.39542, 454.69592]
2025-09-16 13:20:53,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 79.0, 106.0, 85.0, 66.0, 91.0, 68.0, 62.0, 84.0, 85.0]
2025-09-16 13:20:53,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 37 seconds)
2025-09-16 13:22:51,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:22:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 391.13974 ± 242.888
2025-09-16 13:22:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [119.45, 510.2667, 426.478, 161.75604, 460.5264, 139.23837, 140.34927, 629.9268, 899.9052, 423.50037]
2025-09-16 13:22:52,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 104.0, 81.0, 31.0, 87.0, 27.0, 27.0, 118.0, 186.0, 90.0]
2025-09-16 13:22:52,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 36 seconds)
2025-09-16 13:24:50,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:24:52,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 492.12827 ± 128.516
2025-09-16 13:24:52,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [596.1022, 833.6411, 489.03748, 444.6976, 484.99835, 425.84412, 383.4772, 444.36612, 450.1566, 368.962]
2025-09-16 13:24:52,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 158.0, 87.0, 82.0, 87.0, 79.0, 72.0, 83.0, 95.0, 67.0]
2025-09-16 13:24:52,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (492.13) for latency 15
2025-09-16 13:24:52,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 52 seconds)
2025-09-16 13:26:49,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:26:51,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 428.45963 ± 115.153
2025-09-16 13:26:51,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [379.11496, 437.95013, 529.10754, 526.7268, 547.06647, 332.12622, 391.55014, 155.32837, 529.5362, 456.08957]
2025-09-16 13:26:51,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 83.0, 105.0, 114.0, 101.0, 70.0, 74.0, 30.0, 101.0, 88.0]
2025-09-16 13:26:51,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 44 seconds)
2025-09-16 13:28:48,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:28:50,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 439.34106 ± 96.559
2025-09-16 13:28:50,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [535.7336, 358.86807, 465.9508, 318.3282, 396.2141, 415.17334, 311.78207, 499.23987, 453.76642, 638.35406]
2025-09-16 13:28:50,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 78.0, 85.0, 61.0, 81.0, 84.0, 59.0, 92.0, 86.0, 131.0]
2025-09-16 13:28:50,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 56 seconds)
2025-09-16 13:30:48,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:30:49,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 438.83856 ± 112.573
2025-09-16 13:30:49,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [473.97922, 536.10614, 435.78625, 411.8542, 463.2652, 427.97116, 533.02606, 481.04523, 122.81935, 502.53314]
2025-09-16 13:30:49,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 118.0, 79.0, 76.0, 86.0, 79.0, 99.0, 95.0, 24.0, 97.0]
2025-09-16 13:30:49,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2025-09-16 13:32:47,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:32:48,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 409.67535 ± 146.585
2025-09-16 13:32:48,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [323.4521, 376.9682, 150.10818, 409.3089, 324.7452, 372.77112, 740.584, 391.42056, 465.5966, 541.79846]
2025-09-16 13:32:48,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 68.0, 29.0, 78.0, 63.0, 82.0, 139.0, 74.0, 86.0, 100.0]
2025-09-16 13:32:48,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 25 minutes, 4 seconds)
2025-09-16 13:34:46,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:34:48,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 482.92145 ± 97.302
2025-09-16 13:34:48,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [411.26666, 486.56165, 424.68713, 541.95593, 363.33395, 396.30557, 660.1807, 558.43134, 384.8167, 601.6742]
2025-09-16 13:34:48,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 90.0, 80.0, 101.0, 71.0, 82.0, 129.0, 115.0, 78.0, 116.0]
2025-09-16 13:34:48,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes)
2025-09-16 13:36:46,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:36:47,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 389.85803 ± 153.570
2025-09-16 13:36:47,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [609.237, 129.59921, 410.09912, 550.8019, 450.59286, 361.0255, 118.82765, 372.30838, 382.447, 513.6416]
2025-09-16 13:36:47,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 25.0, 75.0, 106.0, 83.0, 81.0, 23.0, 68.0, 73.0, 102.0]
2025-09-16 13:36:47,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 21 minutes, 12 seconds)
2025-09-16 13:38:46,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:38:47,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 487.16635 ± 162.052
2025-09-16 13:38:47,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [501.48447, 388.84464, 361.58212, 400.01776, 392.07907, 469.58215, 718.7364, 408.0944, 867.75446, 363.48828]
2025-09-16 13:38:47,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 76.0, 68.0, 74.0, 81.0, 88.0, 151.0, 76.0, 165.0, 67.0]
2025-09-16 13:38:47,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 21 seconds)
2025-09-16 13:40:45,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:40:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 449.13681 ± 161.556
2025-09-16 13:40:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [390.72916, 422.3841, 816.3868, 483.57397, 501.89117, 382.63095, 128.82364, 369.85974, 493.16577, 501.92303]
2025-09-16 13:40:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 77.0, 164.0, 88.0, 95.0, 70.0, 25.0, 77.0, 92.0, 92.0]
2025-09-16 13:40:46,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 17 minutes, 18 seconds)
2025-09-16 13:42:45,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:42:46,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 425.47821 ± 140.533
2025-09-16 13:42:46,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [449.37497, 677.6856, 386.16473, 307.5494, 435.3713, 381.03568, 459.40686, 455.33728, 578.1221, 124.734566]
2025-09-16 13:42:46,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 131.0, 72.0, 62.0, 80.0, 76.0, 85.0, 85.0, 108.0, 24.0]
2025-09-16 13:42:46,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 15 minutes, 25 seconds)
2025-09-16 13:44:44,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:44:45,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 405.80661 ± 210.503
2025-09-16 13:44:45,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [393.50784, 155.5639, 486.3321, 557.4022, 480.02902, 188.42624, 146.59357, 883.6066, 411.72772, 354.87674]
2025-09-16 13:44:45,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 30.0, 99.0, 119.0, 90.0, 36.0, 28.0, 176.0, 77.0, 73.0]
2025-09-16 13:44:45,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 13 minutes, 25 seconds)
2025-09-16 13:46:44,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:46:45,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 459.54413 ± 128.452
2025-09-16 13:46:45,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [463.3268, 429.07736, 414.45065, 393.2802, 751.76904, 334.1855, 593.96674, 545.32385, 341.63632, 328.42456]
2025-09-16 13:46:45,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 78.0, 74.0, 73.0, 146.0, 61.0, 113.0, 102.0, 64.0, 62.0]
2025-09-16 13:46:45,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 11 minutes, 29 seconds)
2025-09-16 13:48:42,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:48:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 530.18677 ± 125.107
2025-09-16 13:48:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [605.74005, 811.5807, 493.78796, 679.0118, 525.13214, 453.39716, 393.08182, 423.77997, 487.67657, 428.67923]
2025-09-16 13:48:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 156.0, 90.0, 143.0, 96.0, 83.0, 71.0, 77.0, 91.0, 81.0]
2025-09-16 13:48:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (530.19) for latency 15
2025-09-16 13:48:44,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 9 minutes, 18 seconds)
2025-09-16 13:50:43,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:50:44,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 462.42026 ± 252.596
2025-09-16 13:50:44,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [682.6652, 475.55334, 557.08856, 462.40335, 386.5039, 1015.36664, 109.05607, 113.487465, 318.8607, 503.21747]
2025-09-16 13:50:44,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 85.0, 102.0, 91.0, 72.0, 194.0, 21.0, 22.0, 63.0, 93.0]
2025-09-16 13:50:44,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 7 minutes, 35 seconds)
2025-09-16 13:52:42,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:52:43,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 460.99902 ± 124.148
2025-09-16 13:52:43,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [400.40738, 487.41568, 558.03174, 565.21606, 479.84424, 539.70636, 114.238266, 505.28513, 477.50006, 482.34512]
2025-09-16 13:52:43,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 92.0, 119.0, 111.0, 92.0, 104.0, 22.0, 92.0, 99.0, 89.0]
2025-09-16 13:52:43,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 5 minutes, 28 seconds)
2025-09-16 13:54:42,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:54:43,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 442.34882 ± 118.781
2025-09-16 13:54:43,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [464.8039, 511.28024, 552.58795, 517.2618, 505.16513, 434.7975, 385.09293, 414.2325, 119.59342, 518.6728]
2025-09-16 13:54:43,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 111.0, 103.0, 96.0, 93.0, 96.0, 69.0, 76.0, 23.0, 95.0]
2025-09-16 13:54:43,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 3 minutes, 39 seconds)
2025-09-16 13:56:42,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:56:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 420.06427 ± 151.026
2025-09-16 13:56:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [553.3965, 379.77896, 357.62888, 335.45142, 387.9481, 326.07495, 162.41672, 432.64844, 508.4695, 756.82904]
2025-09-16 13:56:43,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 70.0, 64.0, 64.0, 72.0, 60.0, 31.0, 79.0, 97.0, 145.0]
2025-09-16 13:56:43,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 1 minute, 38 seconds)
2025-09-16 13:58:41,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 13:58:42,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 452.08414 ± 133.104
2025-09-16 13:58:42,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [598.38055, 527.85156, 577.5458, 487.94754, 465.85745, 120.12775, 473.29916, 464.43494, 495.24777, 310.14883]
2025-09-16 13:58:42,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 110.0, 111.0, 89.0, 85.0, 23.0, 100.0, 87.0, 94.0, 61.0]
2025-09-16 13:58:42,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 59 minutes, 36 seconds)
2025-09-16 14:00:41,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:00:42,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 460.52155 ± 76.135
2025-09-16 14:00:42,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [428.0485, 426.1608, 669.15735, 468.76157, 420.96545, 406.91708, 492.14957, 448.65536, 379.67734, 464.72205]
2025-09-16 14:00:42,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 78.0, 125.0, 90.0, 77.0, 73.0, 90.0, 92.0, 75.0, 99.0]
2025-09-16 14:00:42,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 30 seconds)
2025-09-16 14:02:40,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:02:42,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 501.03436 ± 210.553
2025-09-16 14:02:42,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [632.1649, 450.18018, 536.1278, 118.94817, 387.88205, 996.3828, 393.9709, 487.9828, 446.32367, 560.38043]
2025-09-16 14:02:42,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 84.0, 100.0, 23.0, 72.0, 198.0, 72.0, 105.0, 87.0, 117.0]
2025-09-16 14:02:42,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 38 seconds)
2025-09-16 14:04:39,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:04:41,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 581.62573 ± 108.218
2025-09-16 14:04:41,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [521.88684, 584.5437, 477.73868, 632.5812, 496.62448, 690.9669, 819.976, 426.3296, 586.9348, 578.6754]
2025-09-16 14:04:41,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 108.0, 99.0, 119.0, 95.0, 143.0, 153.0, 93.0, 124.0, 109.0]
2025-09-16 14:04:41,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (581.63) for latency 15
2025-09-16 14:04:41,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 33 seconds)
2025-09-16 14:06:39,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:06:41,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 468.25308 ± 188.477
2025-09-16 14:06:41,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [652.2857, 131.52304, 709.21027, 566.16425, 623.316, 460.2833, 440.58524, 529.9862, 135.95676, 433.2209]
2025-09-16 14:06:41,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 25.0, 133.0, 102.0, 115.0, 98.0, 81.0, 97.0, 26.0, 96.0]
2025-09-16 14:06:41,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 51 minutes, 30 seconds)
2025-09-16 14:08:39,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:08:41,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 502.76157 ± 77.815
2025-09-16 14:08:41,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [610.552, 424.00406, 511.9361, 349.17935, 547.68146, 558.2008, 567.5728, 525.8687, 523.6476, 408.973]
2025-09-16 14:08:41,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 77.0, 98.0, 63.0, 119.0, 107.0, 112.0, 96.0, 96.0, 80.0]
2025-09-16 14:08:41,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 45 seconds)
2025-09-16 14:10:40,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:10:41,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 486.85214 ± 154.805
2025-09-16 14:10:41,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [503.4296, 685.43134, 633.0693, 551.0272, 582.8996, 605.5059, 390.1318, 390.0136, 391.24646, 135.76645]
2025-09-16 14:10:41,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 133.0, 118.0, 102.0, 106.0, 118.0, 75.0, 71.0, 74.0, 26.0]
2025-09-16 14:10:41,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 54 seconds)
2025-09-16 14:12:38,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:12:40,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 533.58667 ± 169.833
2025-09-16 14:12:40,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [569.8535, 546.4746, 566.26624, 588.54974, 649.17035, 464.07867, 463.073, 585.0251, 102.66247, 800.71277]
2025-09-16 14:12:40,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 98.0, 117.0, 108.0, 125.0, 95.0, 83.0, 117.0, 20.0, 150.0]
2025-09-16 14:12:40,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 41 seconds)
2025-09-16 14:14:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:14:41,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 464.29190 ± 128.490
2025-09-16 14:14:41,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [490.8291, 556.10626, 497.50348, 485.668, 394.31677, 478.22592, 150.37859, 394.14465, 669.1536, 526.5921]
2025-09-16 14:14:41,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 106.0, 90.0, 101.0, 73.0, 85.0, 29.0, 72.0, 139.0, 97.0]
2025-09-16 14:14:41,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 55 seconds)
2025-09-16 14:16:39,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:16:40,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 507.35483 ± 95.469
2025-09-16 14:16:40,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [629.7135, 357.75757, 588.20557, 419.18634, 586.30725, 513.9651, 403.47632, 575.73663, 593.45734, 405.7425]
2025-09-16 14:16:40,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 74.0, 110.0, 79.0, 122.0, 98.0, 75.0, 115.0, 109.0, 85.0]
2025-09-16 14:16:40,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 57 seconds)
2025-09-16 14:18:38,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:18:40,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 595.71521 ± 116.537
2025-09-16 14:18:40,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [529.1156, 485.54605, 828.92365, 686.9235, 709.0281, 466.06424, 601.4846, 640.2563, 569.60175, 440.2088]
2025-09-16 14:18:40,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 88.0, 164.0, 129.0, 139.0, 98.0, 109.0, 121.0, 108.0, 95.0]
2025-09-16 14:18:40,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (595.72) for latency 15
2025-09-16 14:18:40,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 54 seconds)
2025-09-16 14:20:39,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:20:40,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 641.01379 ± 238.618
2025-09-16 14:20:40,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [394.21924, 1276.4786, 458.55194, 515.1871, 541.18317, 546.06836, 796.49536, 684.61584, 539.9466, 657.39154]
2025-09-16 14:20:40,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 235.0, 83.0, 93.0, 96.0, 100.0, 151.0, 125.0, 110.0, 124.0]
2025-09-16 14:20:40,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (641.01) for latency 15
2025-09-16 14:20:40,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 51 seconds)
2025-09-16 14:22:39,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:22:40,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 449.19385 ± 129.282
2025-09-16 14:22:40,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [487.09637, 473.17737, 608.01013, 500.61392, 407.57312, 342.20892, 125.4153, 576.72217, 472.5318, 498.58942]
2025-09-16 14:22:40,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 87.0, 118.0, 91.0, 74.0, 64.0, 24.0, 106.0, 87.0, 92.0]
2025-09-16 14:22:40,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 5 seconds)
2025-09-16 14:24:40,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:24:42,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 587.89374 ± 102.424
2025-09-16 14:24:42,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [676.04205, 415.65585, 603.16, 584.5281, 387.95535, 707.6307, 553.91455, 644.6563, 644.3623, 661.0329]
2025-09-16 14:24:42,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 75.0, 111.0, 107.0, 74.0, 143.0, 110.0, 119.0, 123.0, 142.0]
2025-09-16 14:24:42,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 13 seconds)
2025-09-16 14:26:39,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:26:41,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 467.41025 ± 133.054
2025-09-16 14:26:41,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [659.6426, 427.60175, 393.4657, 494.94858, 468.56738, 570.6469, 425.12164, 598.8629, 487.42996, 147.81485]
2025-09-16 14:26:41,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 91.0, 84.0, 89.0, 83.0, 118.0, 78.0, 109.0, 103.0, 28.0]
2025-09-16 14:26:41,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 5 seconds)
2025-09-16 14:28:39,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:28:41,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 515.07159 ± 151.251
2025-09-16 14:28:41,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [441.1003, 596.7006, 594.757, 114.51417, 512.3535, 678.48627, 461.45718, 631.6186, 596.2964, 523.4315]
2025-09-16 14:28:41,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 108.0, 112.0, 22.0, 95.0, 128.0, 83.0, 134.0, 107.0, 95.0]
2025-09-16 14:28:41,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 6 seconds)
2025-09-16 14:30:39,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:30:41,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 542.81854 ± 56.241
2025-09-16 14:30:41,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [597.48944, 523.48145, 571.42615, 659.1934, 529.0132, 432.31372, 550.05615, 522.7742, 513.58093, 528.85626]
2025-09-16 14:30:41,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 102.0, 117.0, 129.0, 114.0, 80.0, 98.0, 105.0, 93.0, 108.0]
2025-09-16 14:30:41,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 2 seconds)
2025-09-16 14:32:40,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:32:41,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 494.86484 ± 150.046
2025-09-16 14:32:41,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [762.114, 465.7516, 580.6775, 428.45825, 568.6356, 135.66034, 478.4653, 556.12054, 521.44226, 451.32263]
2025-09-16 14:32:41,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 86.0, 117.0, 78.0, 114.0, 26.0, 87.0, 105.0, 99.0, 84.0]
2025-09-16 14:32:41,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 8 seconds)
2025-09-16 14:34:41,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:34:42,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 569.06732 ± 129.166
2025-09-16 14:34:42,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [415.0455, 900.9805, 572.3245, 492.97107, 560.0459, 443.158, 498.63193, 561.5117, 609.24725, 636.75684]
2025-09-16 14:34:42,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 173.0, 104.0, 88.0, 101.0, 92.0, 101.0, 104.0, 130.0, 122.0]
2025-09-16 14:34:42,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 3 seconds)
2025-09-16 14:36:41,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:36:42,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 524.48279 ± 184.242
2025-09-16 14:36:42,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [515.8841, 564.39294, 621.1845, 546.8975, 125.10478, 552.62836, 347.1928, 580.2728, 502.62692, 888.6429]
2025-09-16 14:36:42,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 109.0, 120.0, 104.0, 24.0, 101.0, 64.0, 113.0, 91.0, 168.0]
2025-09-16 14:36:42,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 12 seconds)
2025-09-16 14:38:40,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:38:42,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 590.43683 ± 142.404
2025-09-16 14:38:42,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [594.6756, 533.027, 717.7883, 518.6015, 511.8491, 475.79105, 567.2764, 849.5621, 780.12164, 355.67517]
2025-09-16 14:38:42,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 108.0, 130.0, 90.0, 94.0, 102.0, 124.0, 159.0, 154.0, 69.0]
2025-09-16 14:38:42,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 9 seconds)
2025-09-16 14:40:41,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:40:43,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 606.67914 ± 190.103
2025-09-16 14:40:43,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [836.9471, 602.1503, 179.88771, 578.792, 607.41754, 581.31573, 507.5116, 503.44037, 847.4768, 821.85187]
2025-09-16 14:40:43,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 111.0, 34.0, 106.0, 129.0, 101.0, 100.0, 93.0, 161.0, 148.0]
2025-09-16 14:40:43,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 18 seconds)
2025-09-16 14:42:42,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:42:43,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 561.03564 ± 272.956
2025-09-16 14:42:43,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [804.58075, 587.1431, 913.0206, 533.3947, 597.6572, 119.54524, 668.18884, 108.45287, 899.5837, 378.78967]
2025-09-16 14:42:43,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 108.0, 161.0, 115.0, 122.0, 23.0, 124.0, 21.0, 167.0, 73.0]
2025-09-16 14:42:43,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 14 seconds)
2025-09-16 14:44:43,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:44:44,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 615.40863 ± 156.791
2025-09-16 14:44:44,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [480.35953, 570.21954, 498.20737, 733.7861, 677.4796, 486.30856, 462.59415, 537.2098, 984.353, 723.5685]
2025-09-16 14:44:44,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 103.0, 92.0, 130.0, 124.0, 84.0, 82.0, 96.0, 189.0, 134.0]
2025-09-16 14:44:44,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 14 seconds)
2025-09-16 14:46:43,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:46:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 698.30066 ± 231.311
2025-09-16 14:46:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [484.59647, 1090.5704, 407.5829, 620.24677, 942.57043, 355.2233, 638.4304, 763.4813, 741.54645, 938.7582]
2025-09-16 14:46:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 207.0, 75.0, 114.0, 192.0, 64.0, 112.0, 144.0, 153.0, 171.0]
2025-09-16 14:46:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (698.30) for latency 15
2025-09-16 14:46:45,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 17 seconds)
2025-09-16 14:48:45,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:48:47,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 645.71625 ± 272.065
2025-09-16 14:48:47,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [636.00037, 407.35553, 381.02835, 967.3453, 752.8134, 427.19138, 571.29254, 1286.0123, 500.41296, 527.71045]
2025-09-16 14:48:47,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 75.0, 77.0, 185.0, 151.0, 77.0, 124.0, 249.0, 90.0, 95.0]
2025-09-16 14:48:47,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 35 seconds)
2025-09-16 14:50:45,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:50:48,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 704.51385 ± 286.465
2025-09-16 14:50:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [717.8025, 1123.6981, 627.84, 508.74777, 900.52466, 129.15384, 621.0451, 634.12787, 1164.5216, 617.6775]
2025-09-16 14:50:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 221.0, 120.0, 107.0, 168.0, 25.0, 121.0, 128.0, 224.0, 125.0]
2025-09-16 14:50:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (704.51) for latency 15
2025-09-16 14:50:48,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 30 seconds)
2025-09-16 14:52:46,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:52:48,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 573.98401 ± 228.436
2025-09-16 14:52:48,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [420.664, 552.5968, 562.2429, 542.318, 562.28326, 979.77563, 114.2696, 515.0945, 916.207, 574.3886]
2025-09-16 14:52:48,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 111.0, 112.0, 106.0, 106.0, 195.0, 22.0, 91.0, 163.0, 112.0]
2025-09-16 14:52:48,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 31 seconds)
2025-09-16 14:54:47,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:54:49,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 674.26587 ± 178.291
2025-09-16 14:54:49,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [523.3657, 816.17957, 525.82385, 1103.2977, 716.95776, 554.6904, 800.65216, 576.90497, 595.9879, 528.79865]
2025-09-16 14:54:49,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 149.0, 95.0, 207.0, 137.0, 103.0, 169.0, 105.0, 120.0, 99.0]
2025-09-16 14:54:49,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 28 seconds)
2025-09-16 14:56:50,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:56:52,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 646.30444 ± 176.614
2025-09-16 14:56:52,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [765.635, 785.30597, 511.38797, 818.1744, 912.86633, 451.95633, 674.64056, 713.4086, 346.51566, 483.154]
2025-09-16 14:56:52,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 146.0, 90.0, 146.0, 165.0, 88.0, 121.0, 128.0, 68.0, 88.0]
2025-09-16 14:56:52,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 42 seconds)
2025-09-16 14:58:49,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 14:58:51,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 611.98529 ± 164.709
2025-09-16 14:58:51,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [672.0474, 363.06415, 665.2752, 579.6689, 745.53107, 914.7175, 766.61804, 460.4179, 541.93036, 410.58292]
2025-09-16 14:58:51,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 78.0, 127.0, 124.0, 138.0, 174.0, 142.0, 88.0, 106.0, 86.0]
2025-09-16 14:58:51,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 23 seconds)
2025-09-16 15:00:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:00:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 738.40356 ± 236.649
2025-09-16 15:00:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [612.33453, 614.41437, 598.02704, 796.95844, 412.4782, 1109.2478, 593.6854, 863.94055, 592.88605, 1190.0629]
2025-09-16 15:00:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 120.0, 106.0, 163.0, 76.0, 216.0, 106.0, 179.0, 115.0, 253.0]
2025-09-16 15:00:52,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (738.40) for latency 15
2025-09-16 15:00:52,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 58 minutes, 28 seconds)
2025-09-16 15:02:51,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:02:53,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 633.39417 ± 259.298
2025-09-16 15:02:53,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [674.764, 1143.1902, 499.18732, 560.1549, 500.57822, 540.8213, 444.5044, 1066.4725, 642.1605, 262.108]
2025-09-16 15:02:53,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 211.0, 90.0, 107.0, 91.0, 111.0, 81.0, 197.0, 137.0, 50.0]
2025-09-16 15:02:53,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 28 seconds)
2025-09-16 15:04:54,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:04:56,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 772.77789 ± 210.176
2025-09-16 15:04:56,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [931.5286, 733.9393, 687.9985, 669.87933, 1258.108, 458.84595, 704.7188, 957.0893, 649.09406, 676.57654]
2025-09-16 15:04:56,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 149.0, 126.0, 128.0, 241.0, 86.0, 125.0, 189.0, 123.0, 137.0]
2025-09-16 15:04:56,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (772.78) for latency 15
2025-09-16 15:04:56,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 54 minutes, 37 seconds)
2025-09-16 15:06:55,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:06:57,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 704.89502 ± 381.847
2025-09-16 15:06:57,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [147.11725, 585.4415, 1588.3293, 440.268, 706.16266, 1046.9062, 647.86487, 886.0549, 349.53464, 651.27124]
2025-09-16 15:06:57,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 101.0, 297.0, 77.0, 126.0, 189.0, 127.0, 163.0, 63.0, 118.0]
2025-09-16 15:06:57,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 52 minutes, 28 seconds)
2025-09-16 15:08:56,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:08:59,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 903.34753 ± 321.194
2025-09-16 15:08:59,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [889.3612, 1066.8685, 1252.552, 583.05524, 282.22067, 635.77075, 1381.6819, 1099.726, 755.5965, 1086.6425]
2025-09-16 15:08:59,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [184.0, 200.0, 256.0, 105.0, 51.0, 125.0, 274.0, 218.0, 136.0, 212.0]
2025-09-16 15:08:59,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (903.35) for latency 15
2025-09-16 15:08:59,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 38 seconds)
2025-09-16 15:10:58,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:11:00,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 761.89221 ± 198.880
2025-09-16 15:11:00,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [991.52136, 567.6964, 576.3994, 791.7928, 967.3328, 706.72815, 1162.1772, 598.52234, 659.7696, 596.9819]
2025-09-16 15:11:00,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [186.0, 105.0, 102.0, 174.0, 182.0, 129.0, 230.0, 104.0, 123.0, 127.0]
2025-09-16 15:11:00,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 48 minutes, 37 seconds)
2025-09-16 15:13:01,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:13:03,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 771.55096 ± 297.942
2025-09-16 15:13:03,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1413.5878, 728.8782, 839.69836, 432.14117, 465.44873, 650.2805, 1108.5558, 965.2021, 530.4364, 581.281]
2025-09-16 15:13:03,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 129.0, 152.0, 78.0, 83.0, 126.0, 195.0, 196.0, 96.0, 102.0]
2025-09-16 15:13:03,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 44 seconds)
2025-09-16 15:15:01,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:15:03,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 868.68408 ± 346.463
2025-09-16 15:15:03,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [902.9564, 585.01697, 697.7629, 1647.3358, 350.65298, 824.40924, 669.991, 747.9378, 1025.8508, 1234.9266]
2025-09-16 15:15:03,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 106.0, 127.0, 307.0, 66.0, 159.0, 122.0, 138.0, 191.0, 237.0]
2025-09-16 15:15:03,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 32 seconds)
2025-09-16 15:17:03,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:17:05,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 879.85431 ± 337.565
2025-09-16 15:17:05,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1387.6228, 374.9424, 490.93973, 1003.86957, 1408.8434, 826.165, 1060.9158, 951.7716, 522.74963, 770.72327]
2025-09-16 15:17:05,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [266.0, 78.0, 92.0, 193.0, 285.0, 150.0, 200.0, 177.0, 94.0, 161.0]
2025-09-16 15:17:05,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 33 seconds)
2025-09-16 15:19:06,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:19:07,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 579.65363 ± 413.341
2025-09-16 15:19:07,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1064.5975, 290.68646, 125.05651, 527.0421, 1160.1194, 635.08484, 95.77032, 1208.0497, 120.004585, 570.12506]
2025-09-16 15:19:07,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 55.0, 24.0, 97.0, 210.0, 119.0, 19.0, 232.0, 23.0, 102.0]
2025-09-16 15:19:07,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 34 seconds)
2025-09-16 15:21:07,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:21:09,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 699.42133 ± 359.051
2025-09-16 15:21:09,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [745.11176, 932.52814, 1395.7443, 117.484474, 612.3289, 156.15002, 862.2562, 736.67413, 520.58826, 915.34674]
2025-09-16 15:21:09,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 186.0, 246.0, 23.0, 114.0, 30.0, 157.0, 143.0, 94.0, 163.0]
2025-09-16 15:21:09,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 38 minutes, 33 seconds)
2025-09-16 15:23:08,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:23:11,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 837.69904 ± 308.796
2025-09-16 15:23:11,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1027.519, 811.26996, 762.17554, 754.4439, 1181.4647, 594.27686, 462.31793, 1539.3694, 605.992, 638.16113]
2025-09-16 15:23:11,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 147.0, 140.0, 160.0, 217.0, 107.0, 83.0, 273.0, 122.0, 128.0]
2025-09-16 15:23:11,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 36 minutes, 28 seconds)
2025-09-16 15:25:10,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:25:12,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 673.54358 ± 236.506
2025-09-16 15:25:12,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [130.44456, 931.85864, 787.87396, 949.5965, 706.6673, 451.1096, 687.89844, 562.64636, 643.1152, 884.2249]
2025-09-16 15:25:12,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 182.0, 138.0, 186.0, 126.0, 96.0, 129.0, 103.0, 116.0, 158.0]
2025-09-16 15:25:12,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 34 minutes, 27 seconds)
2025-09-16 15:27:13,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:27:15,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 926.36072 ± 361.765
2025-09-16 15:27:15,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1350.7184, 1309.6548, 518.46155, 738.3983, 1135.4501, 892.6418, 1070.9688, 146.97975, 855.00476, 1245.3295]
2025-09-16 15:27:15,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [250.0, 247.0, 104.0, 139.0, 228.0, 175.0, 205.0, 28.0, 183.0, 233.0]
2025-09-16 15:27:15,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (926.36) for latency 15
2025-09-16 15:27:15,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 32 minutes, 32 seconds)
2025-09-16 15:29:15,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:29:18,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 747.36230 ± 419.537
2025-09-16 15:29:18,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [686.4966, 1038.3558, 626.5202, 394.90848, 126.21988, 1452.8728, 125.26669, 911.41644, 1206.0837, 905.4826]
2025-09-16 15:29:18,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 195.0, 111.0, 73.0, 24.0, 270.0, 24.0, 183.0, 216.0, 180.0]
2025-09-16 15:29:18,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 30 seconds)
2025-09-16 15:31:15,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:31:17,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 650.78357 ± 317.545
2025-09-16 15:31:17,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [353.52658, 455.51068, 484.6488, 344.55463, 1191.0477, 597.5949, 603.13934, 1312.5089, 492.29623, 673.0078]
2025-09-16 15:31:17,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 85.0, 83.0, 61.0, 222.0, 127.0, 111.0, 274.0, 86.0, 119.0]
2025-09-16 15:31:17,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 28 minutes, 22 seconds)
2025-09-16 15:33:17,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:33:20,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 966.26483 ± 402.459
2025-09-16 15:33:20,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [654.2051, 1154.4078, 635.80725, 1184.3627, 1163.0188, 151.7467, 776.93604, 1364.8445, 952.92163, 1624.3989]
2025-09-16 15:33:20,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 214.0, 109.0, 227.0, 215.0, 29.0, 164.0, 263.0, 189.0, 303.0]
2025-09-16 15:33:20,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (966.26) for latency 15
2025-09-16 15:33:20,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 22 seconds)
2025-09-16 15:35:22,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:35:25,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1064.71997 ± 594.714
2025-09-16 15:35:25,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [981.9535, 856.88385, 2480.2937, 1204.5338, 835.7311, 677.12585, 1603.259, 1111.5889, 770.1068, 125.72292]
2025-09-16 15:35:25,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 159.0, 471.0, 218.0, 176.0, 125.0, 285.0, 198.0, 144.0, 24.0]
2025-09-16 15:35:25,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1064.72) for latency 15
2025-09-16 15:35:25,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 31 seconds)
2025-09-16 15:37:22,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:37:24,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 725.16418 ± 299.657
2025-09-16 15:37:24,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [493.5596, 1014.3009, 617.8926, 103.37399, 613.8461, 936.8772, 630.2168, 674.0266, 948.63605, 1218.9122]
2025-09-16 15:37:24,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 183.0, 119.0, 20.0, 119.0, 171.0, 117.0, 121.0, 174.0, 218.0]
2025-09-16 15:37:24,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 18 seconds)
2025-09-16 15:39:25,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:39:28,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 964.99677 ± 414.931
2025-09-16 15:39:28,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1652.1997, 982.49243, 330.98233, 1341.9645, 556.58325, 1411.6471, 658.73236, 1096.9408, 510.04764, 1108.3777]
2025-09-16 15:39:28,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [315.0, 180.0, 58.0, 241.0, 106.0, 254.0, 122.0, 225.0, 98.0, 196.0]
2025-09-16 15:39:28,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 20 seconds)
2025-09-16 15:41:28,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:41:31,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1170.87878 ± 470.616
2025-09-16 15:41:31,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [996.48236, 902.05994, 1202.0228, 932.8387, 1257.7216, 2247.578, 454.70612, 768.64526, 1350.9617, 1595.7714]
2025-09-16 15:41:31,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 169.0, 215.0, 171.0, 236.0, 414.0, 79.0, 161.0, 256.0, 298.0]
2025-09-16 15:41:31,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1170.88) for latency 15
2025-09-16 15:41:31,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 24 seconds)
2025-09-16 15:43:30,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:43:32,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 782.33398 ± 233.038
2025-09-16 15:43:32,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [911.56055, 410.68423, 725.1177, 722.19073, 923.16254, 1062.7661, 968.215, 628.5082, 1067.4479, 403.68713]
2025-09-16 15:43:32,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 73.0, 132.0, 143.0, 168.0, 221.0, 186.0, 136.0, 218.0, 85.0]
2025-09-16 15:43:32,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 20 seconds)
2025-09-16 15:45:32,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:45:35,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1023.61877 ± 417.436
2025-09-16 15:45:35,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [734.4178, 582.53394, 895.84076, 881.46136, 1008.6816, 661.2523, 697.17993, 1935.491, 1568.8054, 1270.5239]
2025-09-16 15:45:35,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 124.0, 168.0, 151.0, 175.0, 123.0, 140.0, 348.0, 271.0, 229.0]
2025-09-16 15:45:35,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 14 seconds)
2025-09-16 15:47:35,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:47:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 873.81805 ± 212.667
2025-09-16 15:47:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [978.80444, 940.3536, 956.65326, 789.3135, 483.65463, 1050.6178, 1260.402, 916.3407, 763.9047, 598.13666]
2025-09-16 15:47:37,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [179.0, 175.0, 198.0, 172.0, 90.0, 191.0, 239.0, 170.0, 136.0, 108.0]
2025-09-16 15:47:37,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 16 seconds)
2025-09-16 15:49:35,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:49:37,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 750.37006 ± 392.931
2025-09-16 15:49:37,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [508.642, 465.16064, 578.771, 514.6531, 885.0952, 1162.6144, 471.6648, 373.42902, 843.2027, 1700.4678]
2025-09-16 15:49:37,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 83.0, 104.0, 100.0, 163.0, 245.0, 89.0, 72.0, 182.0, 323.0]
2025-09-16 15:49:37,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 9 seconds)
2025-09-16 15:51:36,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:51:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1004.49023 ± 720.464
2025-09-16 15:51:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1265.4929, 445.57285, 2778.1938, 671.2361, 694.9945, 816.2231, 1672.8226, 1009.9673, 115.16941, 575.2299]
2025-09-16 15:51:39,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [228.0, 77.0, 535.0, 121.0, 128.0, 142.0, 304.0, 192.0, 22.0, 103.0]
2025-09-16 15:51:39,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 6 seconds)
2025-09-16 15:53:36,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:53:39,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1046.48071 ± 656.456
2025-09-16 15:53:39,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [637.07043, 565.201, 1062.72, 471.94998, 2766.2214, 521.726, 904.2115, 1065.7058, 877.8037, 1592.1982]
2025-09-16 15:53:39,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 101.0, 215.0, 102.0, 495.0, 104.0, 156.0, 194.0, 172.0, 298.0]
2025-09-16 15:53:39,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 4 seconds)
2025-09-16 15:55:37,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:55:40,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1099.40210 ± 517.532
2025-09-16 15:55:40,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [740.53815, 1012.3582, 1295.1141, 1194.005, 2311.3005, 1051.4215, 740.0727, 1576.9293, 409.44882, 662.8306]
2025-09-16 15:55:40,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 197.0, 227.0, 242.0, 427.0, 196.0, 131.0, 297.0, 70.0, 116.0]
2025-09-16 15:55:40,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 1 second)
2025-09-16 15:57:35,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:57:38,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1235.32605 ± 563.624
2025-09-16 15:57:38,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [739.9223, 921.73157, 754.39374, 763.33234, 1761.1646, 719.72253, 2544.4902, 1187.824, 1468.2004, 1492.4797]
2025-09-16 15:57:38,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 178.0, 136.0, 140.0, 333.0, 131.0, 476.0, 217.0, 266.0, 289.0]
2025-09-16 15:57:38,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1235.33) for latency 15
2025-09-16 15:57:38,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes)
2025-09-16 15:59:34,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 15:59:37,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1054.97119 ± 646.017
2025-09-16 15:59:37,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [437.60376, 1060.9224, 1261.7988, 1131.7177, 582.633, 2492.924, 130.10413, 862.2313, 1766.6289, 823.14703]
2025-09-16 15:59:37,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 204.0, 244.0, 214.0, 115.0, 454.0, 25.0, 172.0, 342.0, 155.0]
2025-09-16 15:59:37,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1251 [DEBUG]: Training session finished
