2025-09-16 14:44:05,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.075-delay_21
2025-09-16 14:44:05,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.075-delay_21
2025-09-16 14:44:05,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x14fdaed68750>}
2025-09-16 14:44:05,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:44:05,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:44:05,880 baseline-bpql-noisepromille75-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=733, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:44:05,880 baseline-bpql-noisepromille75-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:44:07,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:44:07,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:46:05,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:46:05,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 233.35757 ± 130.151
2025-09-16 14:46:05,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [130.2605, 150.81725, 388.68878, 395.72134, 119.10794, 114.43498, 125.922104, 402.08948, 382.6111, 123.922226]
2025-09-16 14:46:05,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 29.0, 74.0, 78.0, 23.0, 22.0, 24.0, 84.0, 76.0, 24.0]
2025-09-16 14:46:05,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (233.36) for latency 21
2025-09-16 14:46:05,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 14 minutes, 58 seconds)
2025-09-16 14:48:12,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:48:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 266.82104 ± 140.697
2025-09-16 14:48:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [157.45853, 353.42227, 398.70404, 145.81345, 532.04895, 113.64991, 343.93594, 125.06488, 135.0828, 363.02982]
2025-09-16 14:48:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 71.0, 76.0, 28.0, 109.0, 22.0, 64.0, 24.0, 26.0, 71.0]
2025-09-16 14:48:13,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (266.82) for latency 21
2025-09-16 14:48:13,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 20 minutes, 47 seconds)
2025-09-16 14:50:17,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:50:18,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 323.08414 ± 163.527
2025-09-16 14:50:18,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [159.11128, 391.04388, 593.24915, 130.02759, 369.58618, 466.88486, 134.93016, 378.3334, 125.498146, 482.17676]
2025-09-16 14:50:18,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 77.0, 114.0, 25.0, 69.0, 96.0, 26.0, 82.0, 24.0, 93.0]
2025-09-16 14:50:18,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (323.08) for latency 21
2025-09-16 14:50:18,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 19 minutes, 59 seconds)
2025-09-16 14:52:20,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:52:21,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 188.29984 ± 93.125
2025-09-16 14:52:21,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [177.86906, 389.96255, 130.1067, 350.03345, 114.57588, 151.7095, 169.96603, 125.27659, 138.27455, 135.22409]
2025-09-16 14:52:21,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 76.0, 25.0, 72.0, 22.0, 29.0, 33.0, 24.0, 27.0, 26.0]
2025-09-16 14:52:21,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 17 minutes, 30 seconds)
2025-09-16 14:54:24,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:54:25,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 247.85197 ± 123.314
2025-09-16 14:54:25,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [454.8079, 168.00531, 485.5349, 316.52832, 130.51933, 133.81242, 160.66348, 176.17424, 201.73401, 250.73988]
2025-09-16 14:54:25,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 33.0, 96.0, 60.0, 25.0, 26.0, 31.0, 34.0, 39.0, 47.0]
2025-09-16 14:54:25,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 15 minutes, 32 seconds)
2025-09-16 14:56:27,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:56:28,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 216.39510 ± 90.439
2025-09-16 14:56:28,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [185.62746, 165.02826, 356.47733, 125.05543, 302.99973, 374.05515, 156.51, 135.46857, 129.91565, 232.81343]
2025-09-16 14:56:28,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 32.0, 67.0, 24.0, 58.0, 76.0, 30.0, 26.0, 25.0, 45.0]
2025-09-16 14:56:28,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 15 minutes)
2025-09-16 14:58:30,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:58:31,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 315.06677 ± 208.096
2025-09-16 14:58:31,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [188.66843, 373.67978, 120.133804, 469.07364, 113.81443, 727.5187, 235.40762, 119.694305, 603.6852, 198.99187]
2025-09-16 14:58:31,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 73.0, 23.0, 95.0, 22.0, 150.0, 47.0, 23.0, 127.0, 40.0]
2025-09-16 14:58:31,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 11 minutes, 41 seconds)
2025-09-16 15:00:33,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:00:34,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 335.41638 ± 132.671
2025-09-16 15:00:34,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [356.89557, 320.94476, 145.07532, 433.77356, 380.4393, 406.61636, 123.99371, 229.47577, 591.3708, 365.5786]
2025-09-16 15:00:34,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [68.0, 60.0, 28.0, 90.0, 74.0, 82.0, 24.0, 46.0, 118.0, 69.0]
2025-09-16 15:00:34,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (335.42) for latency 21
2025-09-16 15:00:34,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 8 minutes, 44 seconds)
2025-09-16 15:02:36,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:02:37,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 360.15143 ± 135.872
2025-09-16 15:02:37,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [403.30524, 118.69382, 284.10898, 491.22098, 374.12146, 390.5913, 124.11368, 433.96262, 541.289, 440.10718]
2025-09-16 15:02:37,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 23.0, 54.0, 105.0, 70.0, 73.0, 24.0, 82.0, 103.0, 99.0]
2025-09-16 15:02:37,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (360.15) for latency 21
2025-09-16 15:02:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 6 minutes, 51 seconds)
2025-09-16 15:04:40,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:04:41,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 295.15118 ± 40.290
2025-09-16 15:04:41,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [281.22803, 256.05045, 356.6059, 363.42145, 286.59042, 225.38675, 295.3896, 302.43216, 315.2459, 269.16113]
2025-09-16 15:04:41,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [57.0, 53.0, 65.0, 74.0, 58.0, 47.0, 56.0, 63.0, 61.0, 54.0]
2025-09-16 15:04:41,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 4 minutes, 58 seconds)
2025-09-16 15:06:45,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:06:46,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 357.86157 ± 83.480
2025-09-16 15:06:46,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [376.95218, 363.41507, 349.77167, 474.5904, 333.72406, 328.50696, 395.66028, 396.19928, 419.72406, 140.07175]
2025-09-16 15:06:46,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 69.0, 65.0, 89.0, 61.0, 61.0, 74.0, 74.0, 81.0, 27.0]
2025-09-16 15:06:46,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 3 minutes, 26 seconds)
2025-09-16 15:08:47,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:08:48,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 394.41766 ± 149.631
2025-09-16 15:08:48,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [467.12015, 457.91724, 124.47169, 370.83923, 384.8886, 604.8779, 524.823, 134.52669, 502.65533, 372.05713]
2025-09-16 15:08:48,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 84.0, 24.0, 70.0, 72.0, 115.0, 101.0, 26.0, 107.0, 70.0]
2025-09-16 15:08:48,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (394.42) for latency 21
2025-09-16 15:08:48,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 57 seconds)
2025-09-16 15:10:51,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:10:51,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 249.31827 ± 137.627
2025-09-16 15:10:51,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [151.08128, 528.61017, 140.06334, 130.83911, 129.29503, 413.59683, 358.70605, 161.10983, 324.99283, 154.88791]
2025-09-16 15:10:51,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 105.0, 27.0, 25.0, 25.0, 76.0, 68.0, 31.0, 62.0, 30.0]
2025-09-16 15:10:51,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 59 minutes, 5 seconds)
2025-09-16 15:12:54,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:12:55,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 300.54712 ± 173.889
2025-09-16 15:12:55,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [124.37192, 125.518135, 422.39325, 176.43901, 145.17819, 450.42657, 446.61102, 124.679955, 360.73163, 629.1216]
2025-09-16 15:12:55,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 24.0, 81.0, 34.0, 28.0, 82.0, 81.0, 24.0, 77.0, 121.0]
2025-09-16 15:12:55,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 57 minutes, 3 seconds)
2025-09-16 15:14:55,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:14:56,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 268.20145 ± 125.982
2025-09-16 15:14:56,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [177.26971, 373.05, 449.45303, 409.07993, 129.83975, 153.55017, 407.4846, 135.5446, 141.3716, 305.37103]
2025-09-16 15:14:56,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 70.0, 82.0, 75.0, 25.0, 30.0, 77.0, 26.0, 27.0, 58.0]
2025-09-16 15:14:56,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 54 minutes, 14 seconds)
2025-09-16 15:16:58,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:16:59,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 328.15326 ± 147.753
2025-09-16 15:16:59,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [521.04266, 390.44644, 160.29694, 508.21423, 144.02023, 159.36618, 370.8457, 161.85855, 470.7135, 394.7282]
2025-09-16 15:16:59,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 72.0, 31.0, 94.0, 28.0, 31.0, 68.0, 31.0, 91.0, 83.0]
2025-09-16 15:16:59,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 51 minutes, 39 seconds)
2025-09-16 15:19:01,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:19:02,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 346.39343 ± 125.682
2025-09-16 15:19:02,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [150.67265, 171.98103, 197.10503, 369.9512, 538.6342, 387.942, 500.47858, 384.69623, 387.2119, 375.2615]
2025-09-16 15:19:02,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 33.0, 38.0, 68.0, 103.0, 71.0, 92.0, 72.0, 71.0, 71.0]
2025-09-16 15:19:02,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 49 minutes, 48 seconds)
2025-09-16 15:21:05,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:21:06,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 383.93945 ± 130.885
2025-09-16 15:21:06,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [145.1231, 494.95547, 453.43146, 396.32825, 398.06778, 134.88283, 416.16223, 473.8667, 543.9817, 382.5949]
2025-09-16 15:21:06,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 94.0, 83.0, 73.0, 73.0, 26.0, 76.0, 90.0, 102.0, 74.0]
2025-09-16 15:21:06,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 48 minutes, 1 second)
2025-09-16 15:23:11,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:23:12,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 422.22690 ± 223.814
2025-09-16 15:23:12,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1002.08234, 130.1237, 442.3909, 367.42856, 441.75864, 155.93372, 455.5744, 406.83173, 448.9458, 371.1991]
2025-09-16 15:23:12,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 25.0, 80.0, 73.0, 81.0, 30.0, 86.0, 75.0, 83.0, 68.0]
2025-09-16 15:23:12,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (422.23) for latency 21
2025-09-16 15:23:12,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 46 minutes, 48 seconds)
2025-09-16 15:25:17,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:25:18,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 353.65717 ± 156.138
2025-09-16 15:25:18,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [339.9801, 143.96263, 541.6584, 130.00336, 469.84787, 454.7637, 440.60886, 119.53937, 519.9744, 376.23312]
2025-09-16 15:25:18,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 28.0, 113.0, 25.0, 89.0, 86.0, 81.0, 23.0, 98.0, 70.0]
2025-09-16 15:25:18,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 45 minutes, 46 seconds)
2025-09-16 15:27:23,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:27:24,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 253.39401 ± 144.479
2025-09-16 15:27:24,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [230.26958, 148.2769, 124.40169, 378.54218, 130.4676, 395.57663, 513.63525, 108.65873, 394.97385, 109.13773]
2025-09-16 15:27:24,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [44.0, 29.0, 24.0, 70.0, 25.0, 75.0, 98.0, 21.0, 72.0, 21.0]
2025-09-16 15:27:24,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 44 minutes, 27 seconds)
2025-09-16 15:29:29,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:29:30,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 254.27280 ± 137.503
2025-09-16 15:29:30,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [419.46252, 177.0389, 139.11925, 506.24902, 140.51082, 374.82053, 364.73648, 130.01947, 134.44983, 156.32126]
2025-09-16 15:29:30,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 34.0, 27.0, 93.0, 27.0, 79.0, 66.0, 25.0, 26.0, 30.0]
2025-09-16 15:29:30,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 43 minutes, 16 seconds)
2025-09-16 15:31:34,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:31:35,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 325.59027 ± 197.807
2025-09-16 15:31:35,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.89276, 124.96346, 560.80914, 135.15466, 427.717, 163.72765, 646.97705, 383.45483, 119.61934, 537.5868]
2025-09-16 15:31:35,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 24.0, 103.0, 26.0, 79.0, 32.0, 123.0, 70.0, 23.0, 112.0]
2025-09-16 15:31:35,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 41 minutes, 19 seconds)
2025-09-16 15:33:36,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:33:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 364.97379 ± 150.514
2025-09-16 15:33:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [134.79272, 465.5271, 472.63953, 144.80649, 375.52243, 385.3958, 166.36954, 564.4028, 442.5539, 497.7273]
2025-09-16 15:33:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 86.0, 88.0, 28.0, 71.0, 73.0, 32.0, 122.0, 87.0, 92.0]
2025-09-16 15:33:38,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 38 minutes, 22 seconds)
2025-09-16 15:35:40,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:35:41,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 295.19525 ± 169.450
2025-09-16 15:35:41,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [119.63715, 606.3618, 125.54817, 334.67697, 477.4275, 465.23068, 166.07413, 154.75023, 135.48291, 366.76303]
2025-09-16 15:35:41,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 114.0, 24.0, 63.0, 91.0, 87.0, 32.0, 30.0, 26.0, 68.0]
2025-09-16 15:35:41,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 35 minutes, 44 seconds)
2025-09-16 15:37:43,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:37:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 319.33832 ± 151.635
2025-09-16 15:37:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [164.73744, 170.63303, 402.9805, 593.63074, 392.9932, 438.02524, 114.21117, 145.21463, 408.1566, 362.8009]
2025-09-16 15:37:44,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 33.0, 75.0, 112.0, 74.0, 83.0, 22.0, 28.0, 76.0, 70.0]
2025-09-16 15:37:44,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 33 minutes, 5 seconds)
2025-09-16 15:39:46,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:39:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 413.26083 ± 206.735
2025-09-16 15:39:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [562.1103, 742.43524, 589.8253, 410.33133, 415.62805, 539.54724, 130.28009, 118.61283, 135.41429, 488.42352]
2025-09-16 15:39:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [106.0, 145.0, 110.0, 78.0, 76.0, 111.0, 25.0, 23.0, 26.0, 89.0]
2025-09-16 15:39:47,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 30 minutes, 8 seconds)
2025-09-16 15:41:49,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:41:50,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 366.24915 ± 153.567
2025-09-16 15:41:50,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [181.24414, 400.06796, 402.14676, 434.88763, 146.77728, 204.24214, 370.42868, 437.58273, 704.9892, 380.12482]
2025-09-16 15:41:50,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 75.0, 74.0, 80.0, 28.0, 39.0, 67.0, 83.0, 133.0, 75.0]
2025-09-16 15:41:50,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 27 minutes, 45 seconds)
2025-09-16 15:43:53,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:43:53,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 275.39853 ± 169.566
2025-09-16 15:43:53,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.51152, 467.7081, 440.2427, 160.55205, 149.5643, 135.56227, 125.045425, 123.514946, 532.7308, 483.55313]
2025-09-16 15:43:53,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 88.0, 82.0, 31.0, 29.0, 26.0, 24.0, 24.0, 99.0, 90.0]
2025-09-16 15:43:53,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 25 minutes, 44 seconds)
2025-09-16 15:45:55,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:45:56,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 342.08926 ± 219.151
2025-09-16 15:45:56,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [148.43665, 135.71172, 614.60565, 144.91078, 488.1945, 135.41153, 135.51587, 429.3677, 745.0203, 443.71793]
2025-09-16 15:45:56,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 26.0, 116.0, 28.0, 91.0, 26.0, 26.0, 80.0, 143.0, 86.0]
2025-09-16 15:45:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 23 minutes, 30 seconds)
2025-09-16 15:47:56,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:47:57,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 365.50653 ± 202.262
2025-09-16 15:47:57,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [478.0354, 515.6886, 129.04236, 133.1618, 135.01398, 505.874, 124.345085, 638.18225, 606.50385, 389.218]
2025-09-16 15:47:57,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 95.0, 25.0, 26.0, 26.0, 92.0, 24.0, 126.0, 113.0, 71.0]
2025-09-16 15:47:57,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 20 minutes, 55 seconds)
2025-09-16 15:49:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:49:59,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 421.17471 ± 146.532
2025-09-16 15:49:59,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [421.65085, 146.29398, 574.7607, 449.69492, 570.64655, 406.41815, 161.54375, 504.41327, 421.51685, 554.80804]
2025-09-16 15:49:59,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 28.0, 107.0, 92.0, 107.0, 75.0, 31.0, 100.0, 80.0, 102.0]
2025-09-16 15:49:59,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 18 minutes, 37 seconds)
2025-09-16 15:51:59,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:52:00,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 441.41437 ± 161.934
2025-09-16 15:52:00,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [540.0404, 358.16815, 125.27275, 527.5536, 490.92993, 635.0538, 565.8401, 516.59766, 167.50621, 487.18112]
2025-09-16 15:52:00,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 68.0, 24.0, 99.0, 91.0, 119.0, 107.0, 97.0, 32.0, 90.0]
2025-09-16 15:52:00,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (441.41) for latency 21
2025-09-16 15:52:00,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 11 seconds)
2025-09-16 15:54:01,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:54:02,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 292.63306 ± 149.835
2025-09-16 15:54:02,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [150.3079, 151.00288, 455.8362, 491.90228, 133.55986, 473.70093, 145.39783, 396.10007, 377.757, 150.76534]
2025-09-16 15:54:02,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 29.0, 99.0, 91.0, 26.0, 87.0, 28.0, 73.0, 72.0, 29.0]
2025-09-16 15:54:02,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 47 seconds)
2025-09-16 15:56:01,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:56:02,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 321.41852 ± 177.939
2025-09-16 15:56:02,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [648.71674, 202.28174, 331.30673, 114.372536, 448.20087, 119.74687, 496.80035, 180.60518, 485.37228, 186.78166]
2025-09-16 15:56:02,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 39.0, 63.0, 22.0, 82.0, 23.0, 109.0, 35.0, 102.0, 36.0]
2025-09-16 15:56:02,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 25 seconds)
2025-09-16 15:58:02,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:58:04,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 405.23737 ± 219.877
2025-09-16 15:58:04,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [144.75194, 175.73737, 691.25336, 178.3285, 494.27075, 734.58276, 558.6802, 124.310684, 497.44528, 453.0127]
2025-09-16 15:58:04,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 34.0, 148.0, 34.0, 94.0, 149.0, 107.0, 24.0, 92.0, 86.0]
2025-09-16 15:58:04,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 27 seconds)
2025-09-16 16:00:03,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:00:04,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 280.82074 ± 183.443
2025-09-16 16:00:04,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [515.7369, 124.50197, 124.848305, 139.35806, 454.8668, 619.41205, 125.32691, 160.41762, 391.82736, 151.91171]
2025-09-16 16:00:04,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 24.0, 24.0, 27.0, 86.0, 116.0, 24.0, 31.0, 75.0, 29.0]
2025-09-16 16:00:04,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 7 minutes, 4 seconds)
2025-09-16 16:02:04,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:02:06,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 421.18866 ± 186.867
2025-09-16 16:02:06,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [464.62952, 602.7624, 153.67906, 141.02356, 161.03671, 463.00113, 572.7513, 470.78482, 667.5298, 514.68823]
2025-09-16 16:02:06,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 112.0, 30.0, 27.0, 31.0, 87.0, 105.0, 88.0, 140.0, 97.0]
2025-09-16 16:02:06,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 9 seconds)
2025-09-16 16:04:04,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:04:05,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 421.56763 ± 171.418
2025-09-16 16:04:05,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [337.6773, 470.63907, 583.91626, 657.021, 166.52856, 571.6784, 125.6441, 359.11685, 375.4981, 567.95667]
2025-09-16 16:04:05,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 85.0, 107.0, 118.0, 32.0, 108.0, 24.0, 78.0, 71.0, 107.0]
2025-09-16 16:04:05,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 45 seconds)
2025-09-16 16:06:03,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:06:04,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 401.44739 ± 176.301
2025-09-16 16:06:04,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [500.71582, 398.94827, 658.9095, 516.01013, 440.0496, 152.24184, 565.43964, 476.95374, 125.10571, 180.09958]
2025-09-16 16:06:04,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 75.0, 126.0, 95.0, 82.0, 29.0, 105.0, 90.0, 24.0, 35.0]
2025-09-16 16:06:04,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 16 seconds)
2025-09-16 16:08:02,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:08:04,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 444.69061 ± 183.084
2025-09-16 16:08:04,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [455.70038, 571.3084, 139.68332, 417.71378, 472.25735, 714.22327, 445.34433, 483.71585, 638.2573, 108.70193]
2025-09-16 16:08:04,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 128.0, 27.0, 78.0, 87.0, 133.0, 83.0, 90.0, 115.0, 21.0]
2025-09-16 16:08:04,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (444.69) for latency 21
2025-09-16 16:08:04,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes)
2025-09-16 16:09:59,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:10:00,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 345.42264 ± 187.709
2025-09-16 16:10:00,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.42097, 156.88936, 133.65373, 468.24213, 135.53255, 388.9647, 404.63925, 474.46634, 716.5615, 439.85577]
2025-09-16 16:10:00,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 30.0, 26.0, 86.0, 26.0, 71.0, 74.0, 87.0, 131.0, 91.0]
2025-09-16 16:10:00,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 20 seconds)
2025-09-16 16:11:57,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:11:58,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 365.12268 ± 151.049
2025-09-16 16:11:58,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [448.07654, 456.20267, 505.1604, 463.68347, 145.62363, 131.21906, 509.20963, 399.29227, 139.23837, 453.5208]
2025-09-16 16:11:58,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 92.0, 92.0, 86.0, 28.0, 25.0, 97.0, 75.0, 27.0, 84.0]
2025-09-16 16:11:58,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 52 minutes, 30 seconds)
2025-09-16 16:13:55,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:13:56,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 331.75238 ± 149.383
2025-09-16 16:13:56,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [425.16776, 156.40498, 380.6696, 203.77727, 604.2328, 375.58212, 459.77698, 411.24872, 140.2085, 160.45529]
2025-09-16 16:13:56,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 30.0, 69.0, 40.0, 117.0, 69.0, 85.0, 76.0, 27.0, 31.0]
2025-09-16 16:13:56,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 50 minutes, 19 seconds)
2025-09-16 16:15:57,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:15:58,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 324.34741 ± 188.575
2025-09-16 16:15:58,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [167.43571, 183.2527, 628.3944, 544.96454, 124.98604, 284.26828, 408.4794, 580.10724, 196.96477, 124.62093]
2025-09-16 16:15:58,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 35.0, 117.0, 100.0, 24.0, 55.0, 90.0, 126.0, 38.0, 24.0]
2025-09-16 16:15:58,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 2 seconds)
2025-09-16 16:17:55,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:17:56,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 384.87555 ± 208.648
2025-09-16 16:17:56,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [534.34576, 186.89946, 156.9525, 780.68616, 436.97437, 528.954, 497.69058, 136.05608, 454.71915, 135.4773]
2025-09-16 16:17:56,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 36.0, 30.0, 154.0, 81.0, 97.0, 93.0, 26.0, 88.0, 26.0]
2025-09-16 16:17:56,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 36 seconds)
2025-09-16 16:19:54,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:19:55,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 391.28870 ± 195.954
2025-09-16 16:19:55,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [378.50476, 536.84784, 554.2894, 573.71765, 114.283, 129.2936, 473.00507, 377.57172, 113.357765, 662.0164]
2025-09-16 16:19:55,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 98.0, 107.0, 112.0, 22.0, 25.0, 92.0, 78.0, 22.0, 125.0]
2025-09-16 16:19:55,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 4 seconds)
2025-09-16 16:21:52,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:21:53,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 369.34439 ± 165.303
2025-09-16 16:21:53,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [130.6323, 365.96265, 119.014206, 130.1034, 448.5942, 519.2149, 461.70697, 528.3237, 461.36646, 528.5252]
2025-09-16 16:21:53,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 68.0, 23.0, 25.0, 83.0, 98.0, 87.0, 99.0, 87.0, 99.0]
2025-09-16 16:21:53,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 13 seconds)
2025-09-16 16:23:50,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:23:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 342.69760 ± 150.372
2025-09-16 16:23:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [483.34348, 443.26425, 377.34906, 140.22375, 535.68066, 410.1496, 114.286095, 119.62423, 360.4282, 442.62668]
2025-09-16 16:23:51,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 82.0, 70.0, 27.0, 101.0, 74.0, 22.0, 23.0, 70.0, 84.0]
2025-09-16 16:23:51,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 9 seconds)
2025-09-16 16:25:49,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:25:50,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 374.89001 ± 198.626
2025-09-16 16:25:50,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [135.02705, 464.70847, 161.41566, 521.0471, 473.9729, 745.63495, 492.43546, 170.47952, 442.7653, 141.41347]
2025-09-16 16:25:50,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 94.0, 31.0, 98.0, 88.0, 154.0, 92.0, 33.0, 83.0, 27.0]
2025-09-16 16:25:50,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 34 seconds)
2025-09-16 16:27:48,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:27:49,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 308.23599 ± 172.616
2025-09-16 16:27:49,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [119.03924, 504.07626, 518.97754, 155.23477, 426.07016, 382.59244, 140.5368, 544.2215, 145.48425, 146.12708]
2025-09-16 16:27:49,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 92.0, 97.0, 30.0, 78.0, 73.0, 27.0, 113.0, 28.0, 28.0]
2025-09-16 16:27:49,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 36 minutes, 48 seconds)
2025-09-16 16:29:48,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:29:49,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 232.57556 ± 143.061
2025-09-16 16:29:49,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [124.95562, 402.9409, 205.94777, 505.7398, 146.6158, 425.8471, 135.20926, 130.02414, 140.31406, 108.161385]
2025-09-16 16:29:49,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 82.0, 39.0, 108.0, 28.0, 77.0, 26.0, 25.0, 27.0, 21.0]
2025-09-16 16:29:49,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 58 seconds)
2025-09-16 16:31:50,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:31:52,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 539.71619 ± 139.863
2025-09-16 16:31:52,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [451.7749, 765.2865, 835.1109, 401.23413, 447.08505, 536.3947, 531.663, 431.99573, 439.67264, 556.944]
2025-09-16 16:31:52,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 154.0, 161.0, 73.0, 83.0, 113.0, 109.0, 96.0, 94.0, 119.0]
2025-09-16 16:31:52,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (539.72) for latency 21
2025-09-16 16:31:52,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 44 seconds)
2025-09-16 16:33:50,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:33:52,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 428.35098 ± 161.457
2025-09-16 16:33:52,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [421.79648, 515.4886, 125.05877, 659.9096, 488.83966, 129.74164, 504.763, 489.53458, 494.5597, 453.81805]
2025-09-16 16:33:52,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 107.0, 24.0, 124.0, 91.0, 25.0, 92.0, 87.0, 91.0, 95.0]
2025-09-16 16:33:52,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 2 seconds)
2025-09-16 16:35:49,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:35:50,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 402.12207 ± 233.352
2025-09-16 16:35:50,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [144.01474, 581.89716, 119.50422, 785.9681, 554.5342, 130.78746, 470.8138, 141.09218, 486.72208, 605.8868]
2025-09-16 16:35:50,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 111.0, 23.0, 156.0, 104.0, 25.0, 88.0, 27.0, 92.0, 112.0]
2025-09-16 16:35:50,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 3 seconds)
2025-09-16 16:37:46,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:37:48,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 462.48822 ± 178.363
2025-09-16 16:37:48,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [124.5025, 443.51355, 548.4914, 422.7722, 690.48425, 171.20653, 443.35684, 556.7024, 655.6971, 568.1556]
2025-09-16 16:37:48,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 83.0, 107.0, 77.0, 126.0, 33.0, 82.0, 105.0, 122.0, 107.0]
2025-09-16 16:37:48,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 27 minutes, 52 seconds)
2025-09-16 16:39:45,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:39:46,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 391.39032 ± 202.971
2025-09-16 16:39:46,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [597.958, 419.73022, 713.9605, 108.781746, 540.8865, 108.65822, 391.59787, 119.63278, 455.29443, 457.40317]
2025-09-16 16:39:46,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 77.0, 135.0, 21.0, 118.0, 21.0, 70.0, 23.0, 88.0, 84.0]
2025-09-16 16:39:46,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 37 seconds)
2025-09-16 16:41:43,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:41:44,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 264.79105 ± 298.621
2025-09-16 16:41:44,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [113.26821, 188.72511, 129.35388, 119.734436, 1090.4773, 109.07414, 512.56085, 141.36697, 124.92349, 118.42581]
2025-09-16 16:41:44,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [22.0, 36.0, 25.0, 23.0, 212.0, 21.0, 94.0, 27.0, 24.0, 23.0]
2025-09-16 16:41:44,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 22 minutes, 57 seconds)
2025-09-16 16:43:42,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:43:43,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 288.83575 ± 176.392
2025-09-16 16:43:43,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [652.008, 420.1646, 187.30788, 145.7813, 119.31632, 493.2004, 173.41852, 125.541275, 184.0208, 387.5985]
2025-09-16 16:43:43,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 88.0, 36.0, 28.0, 23.0, 93.0, 33.0, 24.0, 35.0, 69.0]
2025-09-16 16:43:43,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 47 seconds)
2025-09-16 16:45:41,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:45:42,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 307.01370 ± 208.533
2025-09-16 16:45:42,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [606.7663, 146.35521, 129.627, 657.17017, 463.73438, 141.33124, 489.77185, 133.96204, 129.48604, 171.93265]
2025-09-16 16:45:42,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 28.0, 25.0, 121.0, 87.0, 27.0, 91.0, 26.0, 25.0, 33.0]
2025-09-16 16:45:42,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 52 seconds)
2025-09-16 16:47:38,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:47:39,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 461.79938 ± 296.239
2025-09-16 16:47:39,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [505.3255, 703.98175, 141.01505, 377.21494, 515.37805, 950.2743, 176.08966, 208.4374, 915.6538, 124.62295]
2025-09-16 16:47:39,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 129.0, 27.0, 69.0, 95.0, 189.0, 34.0, 40.0, 176.0, 24.0]
2025-09-16 16:47:39,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 54 seconds)
2025-09-16 16:49:37,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:49:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 431.82700 ± 326.758
2025-09-16 16:49:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [627.0416, 145.9331, 502.02502, 160.05476, 130.76723, 495.67337, 150.85463, 145.31216, 1056.7002, 903.908]
2025-09-16 16:49:38,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 28.0, 94.0, 31.0, 25.0, 93.0, 29.0, 28.0, 202.0, 174.0]
2025-09-16 16:49:38,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 57 seconds)
2025-09-16 16:51:36,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:51:37,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 465.55777 ± 356.588
2025-09-16 16:51:37,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [615.12885, 542.93054, 408.96176, 725.32025, 141.35439, 130.2219, 1332.068, 151.27525, 140.996, 467.32098]
2025-09-16 16:51:37,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 97.0, 74.0, 132.0, 27.0, 25.0, 252.0, 29.0, 27.0, 84.0]
2025-09-16 16:51:37,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 7 seconds)
2025-09-16 16:53:35,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:53:36,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 396.70697 ± 228.972
2025-09-16 16:53:36,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [124.20504, 124.67193, 704.5163, 130.88219, 151.55392, 423.72153, 610.91174, 629.4752, 455.28363, 611.8479]
2025-09-16 16:53:36,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 24.0, 130.0, 25.0, 29.0, 78.0, 114.0, 119.0, 83.0, 110.0]
2025-09-16 16:53:36,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 10 seconds)
2025-09-16 16:55:33,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:55:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 356.75726 ± 189.561
2025-09-16 16:55:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [495.87018, 380.3914, 702.8544, 156.94838, 139.56662, 119.59569, 511.24533, 428.97095, 167.90213, 464.22784]
2025-09-16 16:55:34,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 74.0, 132.0, 30.0, 27.0, 23.0, 107.0, 76.0, 32.0, 84.0]
2025-09-16 16:55:34,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 6 seconds)
2025-09-16 16:57:31,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:57:32,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 413.47467 ± 206.342
2025-09-16 16:57:32,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [619.3172, 125.647804, 135.13983, 626.1164, 435.3118, 387.85666, 450.81168, 119.358215, 565.11914, 670.06805]
2025-09-16 16:57:32,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 24.0, 26.0, 131.0, 80.0, 78.0, 80.0, 23.0, 103.0, 130.0]
2025-09-16 16:57:32,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 7 seconds)
2025-09-16 16:59:29,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:59:31,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 462.61407 ± 168.148
2025-09-16 16:59:31,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [534.3013, 152.3256, 150.09052, 553.10333, 577.5635, 476.44037, 543.022, 689.28516, 430.81912, 519.1894]
2025-09-16 16:59:31,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 29.0, 29.0, 106.0, 102.0, 98.0, 100.0, 128.0, 80.0, 95.0]
2025-09-16 16:59:31,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 12 seconds)
2025-09-16 17:01:28,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:01:29,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 337.73917 ± 252.769
2025-09-16 17:01:29,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [130.57872, 168.04501, 140.83745, 108.93215, 763.92267, 686.9772, 580.55853, 521.05615, 125.515884, 150.96785]
2025-09-16 17:01:29,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 32.0, 27.0, 21.0, 139.0, 139.0, 111.0, 97.0, 24.0, 29.0]
2025-09-16 17:01:29,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 7 seconds)
2025-09-16 17:03:25,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:03:27,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 547.41168 ± 229.286
2025-09-16 17:03:27,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [385.67416, 1053.2897, 611.267, 505.005, 550.17334, 511.01547, 386.93298, 663.4994, 682.79504, 124.464874]
2025-09-16 17:03:27,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 199.0, 119.0, 93.0, 100.0, 96.0, 72.0, 123.0, 140.0, 24.0]
2025-09-16 17:03:27,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (547.41) for latency 21
2025-09-16 17:03:27,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 4 seconds)
2025-09-16 17:05:24,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:05:25,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 422.28497 ± 293.534
2025-09-16 17:05:25,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [530.29156, 123.656815, 114.9437, 597.86914, 590.699, 114.440674, 618.36444, 125.11226, 1040.759, 366.713]
2025-09-16 17:05:25,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 24.0, 22.0, 113.0, 105.0, 22.0, 114.0, 24.0, 199.0, 83.0]
2025-09-16 17:05:25,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 7 seconds)
2025-09-16 17:07:23,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:07:24,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 476.82285 ± 287.487
2025-09-16 17:07:24,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [618.37335, 166.38792, 650.24756, 804.45197, 459.0103, 626.5567, 970.03345, 172.39256, 171.24712, 129.52715]
2025-09-16 17:07:24,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 32.0, 136.0, 148.0, 83.0, 119.0, 183.0, 33.0, 33.0, 25.0]
2025-09-16 17:07:24,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 16 seconds)
2025-09-16 17:09:23,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:09:24,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 437.13300 ± 250.691
2025-09-16 17:09:24,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [539.14496, 187.42686, 863.8628, 612.7647, 145.77734, 146.82756, 662.238, 614.8595, 462.44644, 135.98195]
2025-09-16 17:09:24,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 36.0, 180.0, 122.0, 28.0, 28.0, 122.0, 118.0, 85.0, 26.0]
2025-09-16 17:09:24,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 23 seconds)
2025-09-16 17:11:20,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:11:22,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 456.19907 ± 181.101
2025-09-16 17:11:22,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [602.9083, 427.52155, 464.99213, 406.77313, 528.4821, 728.96716, 589.05206, 134.8916, 533.128, 145.27469]
2025-09-16 17:11:22,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 79.0, 85.0, 73.0, 105.0, 130.0, 109.0, 26.0, 101.0, 28.0]
2025-09-16 17:11:22,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 20 seconds)
2025-09-16 17:13:20,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:13:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 462.90225 ± 253.210
2025-09-16 17:13:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [680.607, 129.71568, 335.14273, 715.5203, 428.6087, 148.4239, 738.43097, 489.9335, 145.75323, 816.8864]
2025-09-16 17:13:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 25.0, 60.0, 132.0, 81.0, 29.0, 151.0, 95.0, 28.0, 161.0]
2025-09-16 17:13:21,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 30 seconds)
2025-09-16 17:15:17,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:15:19,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 544.72223 ± 277.411
2025-09-16 17:15:19,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [129.98877, 656.93164, 674.09247, 566.4262, 696.32635, 157.62718, 779.18835, 1007.35156, 586.9254, 192.36444]
2025-09-16 17:15:19,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 135.0, 120.0, 112.0, 137.0, 30.0, 144.0, 198.0, 109.0, 37.0]
2025-09-16 17:15:19,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 27 seconds)
2025-09-16 17:17:18,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:17:19,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 511.84625 ± 221.950
2025-09-16 17:17:19,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [500.44247, 653.57745, 779.49304, 774.6622, 412.73422, 156.55412, 605.7352, 436.4046, 678.5866, 120.27249]
2025-09-16 17:17:19,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 122.0, 153.0, 164.0, 76.0, 30.0, 125.0, 81.0, 124.0, 23.0]
2025-09-16 17:17:19,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 36 seconds)
2025-09-16 17:19:16,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:19:18,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 576.07886 ± 358.924
2025-09-16 17:19:18,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [637.6624, 841.94324, 125.7728, 474.3522, 1330.365, 662.80066, 783.8494, 145.05173, 622.34375, 136.64693]
2025-09-16 17:19:18,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 165.0, 24.0, 87.0, 268.0, 117.0, 144.0, 28.0, 115.0, 26.0]
2025-09-16 17:19:18,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (576.08) for latency 21
2025-09-16 17:19:18,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 31 seconds)
2025-09-16 17:21:15,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:21:17,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 549.92126 ± 169.966
2025-09-16 17:21:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [524.6585, 577.63306, 135.45433, 584.4045, 530.24286, 829.60724, 615.2326, 502.25745, 489.47855, 710.2441]
2025-09-16 17:21:17,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 104.0, 26.0, 110.0, 98.0, 172.0, 120.0, 94.0, 91.0, 131.0]
2025-09-16 17:21:17,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 39 seconds)
2025-09-16 17:23:13,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:23:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 453.69156 ± 317.584
2025-09-16 17:23:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [155.29353, 653.7869, 129.40128, 125.71889, 544.3481, 463.74652, 578.9431, 561.6904, 1183.6893, 140.29758]
2025-09-16 17:23:14,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 119.0, 25.0, 24.0, 102.0, 85.0, 101.0, 106.0, 220.0, 27.0]
2025-09-16 17:23:14,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 31 seconds)
2025-09-16 17:25:13,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:25:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 433.97198 ± 314.380
2025-09-16 17:25:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [418.64453, 653.08344, 119.87891, 140.43979, 123.636475, 665.27026, 477.98657, 1150.8345, 454.67203, 135.27338]
2025-09-16 17:25:14,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 122.0, 23.0, 27.0, 24.0, 120.0, 87.0, 210.0, 83.0, 26.0]
2025-09-16 17:25:14,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 39 seconds)
2025-09-16 17:27:11,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:27:12,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 540.98309 ± 297.322
2025-09-16 17:27:12,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [720.846, 509.529, 176.4358, 145.39352, 1120.1183, 679.4712, 672.4697, 607.8261, 652.1852, 125.55632]
2025-09-16 17:27:12,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 90.0, 34.0, 28.0, 228.0, 128.0, 127.0, 115.0, 122.0, 24.0]
2025-09-16 17:27:12,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 33 seconds)
2025-09-16 17:29:10,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:29:11,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 582.31445 ± 226.837
2025-09-16 17:29:11,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [872.47565, 969.15485, 473.22058, 556.56, 619.02374, 467.22263, 427.19937, 188.85548, 436.1891, 813.24304]
2025-09-16 17:29:11,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 179.0, 104.0, 111.0, 114.0, 96.0, 79.0, 36.0, 80.0, 147.0]
2025-09-16 17:29:11,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (582.31) for latency 21
2025-09-16 17:29:11,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 36 seconds)
2025-09-16 17:31:08,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:31:10,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 449.97812 ± 196.904
2025-09-16 17:31:10,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [458.56003, 474.25824, 170.55336, 163.48407, 559.7473, 767.0249, 518.9399, 192.71143, 595.7387, 598.7631]
2025-09-16 17:31:10,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 86.0, 33.0, 31.0, 110.0, 144.0, 99.0, 37.0, 116.0, 105.0]
2025-09-16 17:31:10,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 35 seconds)
2025-09-16 17:33:07,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:33:09,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 585.70575 ± 214.420
2025-09-16 17:33:09,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [170.97343, 782.44824, 402.19882, 810.3996, 942.3347, 468.38702, 523.7845, 694.1118, 496.47098, 565.949]
2025-09-16 17:33:09,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 146.0, 75.0, 151.0, 180.0, 84.0, 100.0, 137.0, 92.0, 100.0]
2025-09-16 17:33:09,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (585.71) for latency 21
2025-09-16 17:33:09,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 43 seconds)
2025-09-16 17:35:08,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:35:10,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 648.91943 ± 246.689
2025-09-16 17:35:10,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [441.5006, 456.1388, 488.38315, 425.39487, 1175.7184, 670.4072, 501.79156, 940.2564, 873.57355, 516.02954]
2025-09-16 17:35:10,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 82.0, 108.0, 94.0, 238.0, 136.0, 95.0, 177.0, 175.0, 92.0]
2025-09-16 17:35:10,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (648.92) for latency 21
2025-09-16 17:35:10,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 47 seconds)
2025-09-16 17:37:07,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:37:09,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 645.80530 ± 495.875
2025-09-16 17:37:09,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [119.02145, 136.63884, 1621.1288, 999.1161, 650.0968, 123.2488, 144.80159, 723.65576, 1199.8533, 740.4916]
2025-09-16 17:37:09,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [23.0, 26.0, 304.0, 191.0, 114.0, 24.0, 28.0, 134.0, 238.0, 154.0]
2025-09-16 17:37:09,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 49 seconds)
2025-09-16 17:39:06,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:39:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 394.71548 ± 259.625
2025-09-16 17:39:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [586.3786, 150.93823, 136.30925, 639.95886, 566.04694, 145.36317, 617.21936, 820.5074, 165.89398, 118.53872]
2025-09-16 17:39:08,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 29.0, 26.0, 118.0, 103.0, 28.0, 115.0, 153.0, 32.0, 23.0]
2025-09-16 17:39:08,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 49 seconds)
2025-09-16 17:41:05,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:41:06,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 467.79678 ± 296.752
2025-09-16 17:41:06,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [360.67136, 136.42047, 884.9981, 893.27386, 166.31291, 500.4107, 171.59486, 637.9932, 775.82745, 150.46477]
2025-09-16 17:41:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 26.0, 176.0, 176.0, 32.0, 89.0, 33.0, 119.0, 157.0, 29.0]
2025-09-16 17:41:06,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 51 seconds)
2025-09-16 17:43:05,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:43:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 724.89746 ± 349.635
2025-09-16 17:43:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [847.54047, 952.42487, 944.96954, 160.48712, 150.38553, 729.2214, 1143.1338, 346.89923, 945.6666, 1028.2463]
2025-09-16 17:43:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 170.0, 170.0, 31.0, 29.0, 147.0, 212.0, 74.0, 171.0, 176.0]
2025-09-16 17:43:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (724.90) for latency 21
2025-09-16 17:43:07,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 56 seconds)
2025-09-16 17:45:04,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:45:06,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 585.23187 ± 282.489
2025-09-16 17:45:06,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [767.7235, 124.26556, 481.5125, 603.90424, 486.5146, 1058.7098, 124.72695, 608.21814, 755.20984, 841.5339]
2025-09-16 17:45:06,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [135.0, 24.0, 87.0, 107.0, 91.0, 203.0, 24.0, 107.0, 148.0, 149.0]
2025-09-16 17:45:06,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 51 seconds)
2025-09-16 17:47:02,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:47:04,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 538.25110 ± 399.753
2025-09-16 17:47:04,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [146.57878, 699.09973, 870.2243, 961.1853, 606.5818, 129.43213, 1312.2804, 135.92819, 114.48246, 406.71823]
2025-09-16 17:47:04,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 130.0, 162.0, 178.0, 110.0, 25.0, 244.0, 26.0, 22.0, 77.0]
2025-09-16 17:47:04,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 51 seconds)
2025-09-16 17:49:02,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:49:04,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 602.10254 ± 311.045
2025-09-16 17:49:04,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [770.79785, 1028.3884, 124.991135, 162.57635, 150.87325, 725.598, 721.63196, 843.2491, 704.5062, 788.4124]
2025-09-16 17:49:04,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 208.0, 24.0, 31.0, 29.0, 129.0, 134.0, 164.0, 125.0, 152.0]
2025-09-16 17:49:04,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 53 seconds)
2025-09-16 17:51:01,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:51:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 670.18005 ± 288.115
2025-09-16 17:51:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [134.81642, 445.38373, 782.8088, 715.7832, 1067.0872, 715.9613, 380.78766, 1090.7666, 853.7965, 514.6092]
2025-09-16 17:51:03,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 82.0, 162.0, 132.0, 199.0, 135.0, 72.0, 204.0, 152.0, 110.0]
2025-09-16 17:51:03,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 55 seconds)
2025-09-16 17:53:02,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:53:04,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 636.30511 ± 491.624
2025-09-16 17:53:04,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [916.3972, 669.0173, 135.02563, 797.5028, 855.9267, 790.4534, 134.71358, 140.92775, 1761.4203, 161.66667]
2025-09-16 17:53:04,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 141.0, 26.0, 146.0, 160.0, 136.0, 26.0, 27.0, 337.0, 31.0]
2025-09-16 17:53:04,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 55 seconds)
2025-09-16 17:55:00,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:55:02,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 560.58118 ± 292.316
2025-09-16 17:55:02,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [631.41547, 1035.221, 587.96, 161.6178, 140.58496, 807.82697, 587.223, 827.2156, 649.58466, 177.16193]
2025-09-16 17:55:02,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 198.0, 106.0, 31.0, 27.0, 155.0, 105.0, 176.0, 117.0, 34.0]
2025-09-16 17:55:02,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 56 seconds)
2025-09-16 17:56:59,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:57:01,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 591.38531 ± 334.479
2025-09-16 17:57:01,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [982.7933, 863.61865, 184.13483, 512.3346, 114.45553, 103.47639, 532.3439, 887.6852, 941.2461, 791.7647]
2025-09-16 17:57:01,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 157.0, 35.0, 93.0, 22.0, 20.0, 94.0, 169.0, 178.0, 163.0]
2025-09-16 17:57:01,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 57 seconds)
2025-09-16 17:58:58,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:59:00,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 635.78137 ± 317.531
2025-09-16 17:59:00,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [579.48285, 538.27924, 708.5101, 175.6032, 582.7899, 997.88763, 151.28732, 1126.2944, 480.79462, 1016.88416]
2025-09-16 17:59:00,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 95.0, 125.0, 34.0, 109.0, 194.0, 29.0, 207.0, 87.0, 189.0]
2025-09-16 17:59:00,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 58 seconds)
2025-09-16 18:00:57,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:00:59,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 706.68713 ± 362.166
2025-09-16 18:00:59,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [644.2545, 114.07694, 124.37205, 1100.0144, 822.1942, 622.54816, 973.63824, 717.04126, 636.97375, 1311.7576]
2025-09-16 18:00:59,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 22.0, 24.0, 204.0, 148.0, 125.0, 177.0, 125.0, 126.0, 229.0]
2025-09-16 18:00:59,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 58 seconds)
2025-09-16 18:02:57,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:02:59,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 863.92639 ± 490.546
2025-09-16 18:02:59,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [760.7124, 971.3576, 637.52185, 613.42645, 1292.7269, 853.4849, 491.80902, 2053.272, 830.4311, 134.52194]
2025-09-16 18:02:59,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 184.0, 112.0, 111.0, 242.0, 155.0, 88.0, 393.0, 155.0, 26.0]
2025-09-16 18:02:59,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (863.93) for latency 21
2025-09-16 18:02:59,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-09-16 18:04:56,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:04:57,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 608.18542 ± 394.133
2025-09-16 18:04:57,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [751.9994, 124.08353, 659.76996, 141.10312, 1457.8635, 647.1333, 578.7961, 618.66254, 134.92358, 967.5193]
2025-09-16 18:04:57,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 24.0, 117.0, 27.0, 276.0, 127.0, 122.0, 113.0, 26.0, 191.0]
2025-09-16 18:04:57,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1251 [DEBUG]: Training session finished
