2025-09-16 12:10:52,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.200-delay_9
2025-09-16 12:10:52,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.200-delay_9
2025-09-16 12:10:52,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x14999c230790>}
2025-09-16 12:10:52,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:10:52,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:10:52,165 baseline-bpql-noisepromille200-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=529, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:10:52,166 baseline-bpql-noisepromille200-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:10:54,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:10:54,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:12:39,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:12:40,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 344.58902 ± 119.355
2025-09-16 12:12:40,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [378.62793, 405.3826, 271.30743, 652.3069, 225.371, 306.42798, 307.7086, 233.22354, 270.19522, 395.339]
2025-09-16 12:12:40,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 75.0, 50.0, 138.0, 42.0, 57.0, 56.0, 43.0, 51.0, 73.0]
2025-09-16 12:12:40,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (344.59) for latency 9
2025-09-16 12:12:40,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 54 minutes, 56 seconds)
2025-09-16 12:14:35,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:14:36,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 375.39676 ± 71.988
2025-09-16 12:14:36,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [364.72537, 308.95972, 306.6611, 403.76495, 277.0362, 488.52628, 368.90106, 371.49002, 512.0698, 351.83292]
2025-09-16 12:14:36,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [71.0, 57.0, 56.0, 76.0, 51.0, 99.0, 72.0, 70.0, 101.0, 67.0]
2025-09-16 12:14:36,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (375.40) for latency 9
2025-09-16 12:14:36,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 1 minute, 11 seconds)
2025-09-16 12:16:30,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:16:30,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 328.09494 ± 93.183
2025-09-16 12:16:30,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [240.07942, 479.25928, 121.06533, 377.68054, 295.2193, 408.79428, 312.0465, 377.96375, 350.1693, 318.67163]
2025-09-16 12:16:30,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 94.0, 24.0, 69.0, 54.0, 76.0, 57.0, 70.0, 64.0, 59.0]
2025-09-16 12:16:30,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 1 minute, 23 seconds)
2025-09-16 12:18:26,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:18:27,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 374.41785 ± 180.822
2025-09-16 12:18:27,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [327.87253, 282.61642, 377.2419, 267.7292, 313.42307, 299.80066, 306.8774, 438.6503, 892.3789, 237.58823]
2025-09-16 12:18:27,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [60.0, 58.0, 71.0, 50.0, 59.0, 65.0, 56.0, 83.0, 176.0, 51.0]
2025-09-16 12:18:27,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 1 minute, 15 seconds)
2025-09-16 12:20:22,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:20:23,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 379.00391 ± 77.370
2025-09-16 12:20:23,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [459.4579, 375.9974, 242.33862, 317.12476, 347.53455, 333.71277, 542.0113, 380.33847, 377.09537, 414.42795]
2025-09-16 12:20:23,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 67.0, 45.0, 59.0, 76.0, 73.0, 109.0, 72.0, 68.0, 93.0]
2025-09-16 12:20:23,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (379.00) for latency 9
2025-09-16 12:20:23,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 8 seconds)
2025-09-16 12:22:18,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:22:19,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 386.37997 ± 94.755
2025-09-16 12:22:19,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [427.95312, 286.32428, 601.8382, 324.78836, 352.89774, 488.3152, 395.38083, 381.81577, 334.3279, 270.1585]
2025-09-16 12:22:19,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 52.0, 129.0, 72.0, 72.0, 92.0, 77.0, 80.0, 65.0, 50.0]
2025-09-16 12:22:19,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (386.38) for latency 9
2025-09-16 12:22:19,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 1 minute, 23 seconds)
2025-09-16 12:24:14,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:24:15,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 475.68817 ± 195.388
2025-09-16 12:24:15,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [500.64438, 283.05392, 830.0143, 724.95264, 558.31744, 286.84515, 584.0033, 340.1961, 462.29202, 186.56244]
2025-09-16 12:24:15,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 52.0, 177.0, 148.0, 108.0, 55.0, 111.0, 62.0, 87.0, 38.0]
2025-09-16 12:24:15,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (475.69) for latency 9
2025-09-16 12:24:15,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 59 minutes, 44 seconds)
2025-09-16 12:26:11,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:26:12,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 348.02325 ± 54.371
2025-09-16 12:26:12,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [240.91808, 297.748, 341.92233, 381.08298, 401.29626, 395.79102, 358.2826, 428.47952, 338.01056, 296.7012]
2025-09-16 12:26:12,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 56.0, 62.0, 73.0, 90.0, 82.0, 71.0, 76.0, 65.0, 60.0]
2025-09-16 12:26:12,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 58 minutes, 21 seconds)
2025-09-16 12:28:07,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:28:08,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 358.14542 ± 59.015
2025-09-16 12:28:08,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [316.82953, 455.13397, 261.75354, 329.68073, 410.61185, 391.50098, 407.86746, 276.07556, 380.4243, 351.57605]
2025-09-16 12:28:08,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 84.0, 48.0, 61.0, 79.0, 74.0, 76.0, 50.0, 73.0, 64.0]
2025-09-16 12:28:08,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 56 minutes, 15 seconds)
2025-09-16 12:30:04,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:30:05,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 412.52432 ± 92.252
2025-09-16 12:30:05,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [368.46222, 339.20926, 290.7307, 384.70822, 476.88464, 350.1324, 368.28937, 594.4406, 402.07028, 550.3155]
2025-09-16 12:30:05,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 61.0, 54.0, 71.0, 90.0, 64.0, 66.0, 119.0, 73.0, 102.0]
2025-09-16 12:30:05,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 54 minutes, 34 seconds)
2025-09-16 12:32:00,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:32:01,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 389.43207 ± 114.766
2025-09-16 12:32:01,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [311.8107, 316.8637, 316.61545, 354.106, 344.17548, 638.672, 276.6191, 313.92828, 480.4988, 541.0313]
2025-09-16 12:32:01,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 58.0, 57.0, 69.0, 62.0, 134.0, 52.0, 60.0, 103.0, 118.0]
2025-09-16 12:32:01,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 52 minutes, 42 seconds)
2025-09-16 12:33:57,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:33:58,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 388.03833 ± 111.764
2025-09-16 12:33:58,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [400.3719, 233.47589, 490.63232, 638.534, 390.90994, 413.1239, 250.9395, 395.0888, 300.50128, 366.80597]
2025-09-16 12:33:58,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 44.0, 101.0, 128.0, 71.0, 88.0, 47.0, 72.0, 55.0, 68.0]
2025-09-16 12:33:58,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 50 minutes, 52 seconds)
2025-09-16 12:35:54,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:35:55,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 477.75644 ± 176.016
2025-09-16 12:35:55,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [342.28415, 559.38745, 397.86526, 350.21643, 457.69666, 462.4394, 901.9474, 659.8729, 338.18124, 307.67386]
2025-09-16 12:35:55,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [64.0, 110.0, 73.0, 79.0, 84.0, 87.0, 173.0, 123.0, 60.0, 57.0]
2025-09-16 12:35:55,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (477.76) for latency 9
2025-09-16 12:35:55,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 49 minutes, 8 seconds)
2025-09-16 12:37:51,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:37:52,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 421.94693 ± 75.057
2025-09-16 12:37:52,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [478.88715, 442.8247, 490.71957, 351.21927, 555.8644, 356.98145, 343.52567, 493.0595, 360.63416, 345.75323]
2025-09-16 12:37:52,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 81.0, 99.0, 64.0, 103.0, 65.0, 63.0, 91.0, 64.0, 62.0]
2025-09-16 12:37:52,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 47 minutes, 32 seconds)
2025-09-16 12:39:49,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:39:50,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 412.85303 ± 92.095
2025-09-16 12:39:50,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [338.68045, 512.97485, 555.2437, 292.98038, 380.06104, 356.40582, 523.0677, 382.2365, 484.3912, 302.48853]
2025-09-16 12:39:50,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 102.0, 103.0, 53.0, 69.0, 65.0, 97.0, 81.0, 90.0, 54.0]
2025-09-16 12:39:50,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 45 minutes, 55 seconds)
2025-09-16 12:41:46,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:41:48,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 491.88876 ± 120.179
2025-09-16 12:41:48,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [461.18372, 753.38745, 648.7417, 507.92694, 454.16876, 372.9392, 432.93124, 351.5185, 540.00885, 396.08167]
2025-09-16 12:41:48,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 139.0, 136.0, 113.0, 97.0, 73.0, 96.0, 64.0, 103.0, 74.0]
2025-09-16 12:41:48,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (491.89) for latency 9
2025-09-16 12:41:48,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 19 seconds)
2025-09-16 12:43:44,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:43:45,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 466.02768 ± 156.984
2025-09-16 12:43:45,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [286.85748, 862.01526, 543.1391, 462.53815, 439.5843, 511.89728, 302.0025, 474.763, 458.8914, 318.5884]
2025-09-16 12:43:45,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 158.0, 113.0, 83.0, 82.0, 92.0, 56.0, 86.0, 87.0, 58.0]
2025-09-16 12:43:45,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 42 minutes, 28 seconds)
2025-09-16 12:45:42,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:45:43,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 506.00113 ± 89.048
2025-09-16 12:45:43,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [502.64743, 447.63342, 426.92636, 424.33127, 584.5236, 548.138, 581.059, 370.77277, 681.7922, 492.1872]
2025-09-16 12:45:43,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 82.0, 79.0, 84.0, 119.0, 102.0, 125.0, 68.0, 133.0, 96.0]
2025-09-16 12:45:43,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (506.00) for latency 9
2025-09-16 12:45:43,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 40 seconds)
2025-09-16 12:47:40,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:47:41,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 436.84790 ± 100.878
2025-09-16 12:47:41,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [470.90027, 464.34506, 520.5711, 369.24774, 409.0057, 469.4672, 481.16815, 225.93755, 343.93195, 613.9042]
2025-09-16 12:47:41,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [88.0, 94.0, 98.0, 67.0, 77.0, 85.0, 93.0, 42.0, 68.0, 129.0]
2025-09-16 12:47:41,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 58 seconds)
2025-09-16 12:49:36,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:49:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 462.29913 ± 103.660
2025-09-16 12:49:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [422.9548, 385.85718, 270.99252, 683.51416, 490.6001, 504.11932, 407.74243, 419.89706, 539.2847, 498.0288]
2025-09-16 12:49:37,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [87.0, 70.0, 53.0, 130.0, 91.0, 95.0, 72.0, 76.0, 107.0, 109.0]
2025-09-16 12:49:37,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 36 seconds)
2025-09-16 12:51:36,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:51:37,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 593.97180 ± 153.027
2025-09-16 12:51:37,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [436.82587, 680.7958, 840.8568, 442.9248, 422.25577, 384.27374, 684.5284, 657.7878, 616.8488, 772.6203]
2025-09-16 12:51:37,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 133.0, 159.0, 95.0, 75.0, 73.0, 137.0, 139.0, 133.0, 156.0]
2025-09-16 12:51:37,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (593.97) for latency 9
2025-09-16 12:51:37,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 35 minutes, 18 seconds)
2025-09-16 12:53:34,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:53:35,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 579.91644 ± 128.475
2025-09-16 12:53:35,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [673.69476, 769.98517, 479.5627, 553.4821, 286.88144, 581.0124, 540.13696, 556.7122, 708.8664, 648.8304]
2025-09-16 12:53:35,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 144.0, 92.0, 104.0, 54.0, 104.0, 98.0, 119.0, 138.0, 124.0]
2025-09-16 12:53:35,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 33 minutes, 22 seconds)
2025-09-16 12:55:31,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:55:32,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 458.76807 ± 125.027
2025-09-16 12:55:32,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [247.68042, 421.76297, 281.60373, 441.1282, 577.19885, 631.36774, 570.45374, 414.12122, 592.8644, 409.49927]
2025-09-16 12:55:32,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 76.0, 53.0, 87.0, 126.0, 132.0, 119.0, 76.0, 108.0, 75.0]
2025-09-16 12:55:32,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 31 minutes, 13 seconds)
2025-09-16 12:57:28,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:57:29,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 471.15521 ± 103.540
2025-09-16 12:57:29,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [472.47845, 354.23303, 401.12665, 318.88123, 418.79767, 470.24033, 609.0122, 439.6343, 646.2398, 580.9088]
2025-09-16 12:57:29,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [100.0, 65.0, 73.0, 57.0, 77.0, 87.0, 109.0, 78.0, 120.0, 121.0]
2025-09-16 12:57:29,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes, 57 seconds)
2025-09-16 12:59:26,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:59:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 602.46295 ± 180.324
2025-09-16 12:59:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [859.81696, 796.4612, 562.26587, 647.44775, 378.25388, 692.48724, 788.64325, 581.2234, 305.91016, 412.1195]
2025-09-16 12:59:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 148.0, 106.0, 128.0, 83.0, 148.0, 146.0, 107.0, 63.0, 74.0]
2025-09-16 12:59:28,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (602.46) for latency 9
2025-09-16 12:59:28,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 27 minutes, 36 seconds)
2025-09-16 13:01:24,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:01:26,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 563.95789 ± 105.013
2025-09-16 13:01:26,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [678.00745, 568.38715, 636.8185, 579.1757, 705.2408, 427.22495, 614.2246, 348.13736, 503.0398, 579.3229]
2025-09-16 13:01:26,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 106.0, 117.0, 106.0, 128.0, 79.0, 115.0, 66.0, 102.0, 103.0]
2025-09-16 13:01:26,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 25 minutes, 5 seconds)
2025-09-16 13:03:22,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:03:24,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 573.17572 ± 94.411
2025-09-16 13:03:24,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [530.02814, 513.9023, 680.86523, 487.61002, 501.161, 506.3517, 635.3194, 630.00653, 476.48846, 770.0247]
2025-09-16 13:03:24,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 94.0, 140.0, 90.0, 94.0, 92.0, 134.0, 117.0, 99.0, 164.0]
2025-09-16 13:03:24,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 23 minutes, 10 seconds)
2025-09-16 13:05:22,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:05:23,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 590.87250 ± 215.701
2025-09-16 13:05:23,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [740.04315, 701.0606, 653.7811, 618.8113, 389.10962, 330.4569, 351.74683, 1055.766, 396.44135, 671.5081]
2025-09-16 13:05:23,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 143.0, 123.0, 112.0, 71.0, 60.0, 64.0, 197.0, 72.0, 143.0]
2025-09-16 13:05:23,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 21 minutes, 53 seconds)
2025-09-16 13:07:20,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:07:22,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 634.66913 ± 208.867
2025-09-16 13:07:22,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [896.9996, 720.3363, 456.4191, 645.77985, 526.3402, 1063.169, 338.62314, 664.60693, 425.77652, 608.64075]
2025-09-16 13:07:22,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 134.0, 96.0, 121.0, 98.0, 219.0, 74.0, 125.0, 92.0, 125.0]
2025-09-16 13:07:22,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (634.67) for latency 9
2025-09-16 13:07:22,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 20 minutes, 11 seconds)
2025-09-16 13:09:18,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:09:19,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 617.65332 ± 178.835
2025-09-16 13:09:19,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [409.67206, 748.15674, 654.7757, 681.7249, 467.30334, 943.6923, 590.96576, 832.4863, 463.74808, 384.0077]
2025-09-16 13:09:19,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 143.0, 123.0, 145.0, 87.0, 172.0, 123.0, 162.0, 85.0, 84.0]
2025-09-16 13:09:19,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 57 seconds)
2025-09-16 13:11:16,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:11:18,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 601.01483 ± 204.758
2025-09-16 13:11:18,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [452.84753, 603.6845, 494.44687, 980.1506, 383.7685, 493.78494, 376.34998, 949.9253, 571.9728, 703.2177]
2025-09-16 13:11:18,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 112.0, 90.0, 185.0, 69.0, 94.0, 69.0, 181.0, 104.0, 130.0]
2025-09-16 13:11:18,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 16 minutes, 9 seconds)
2025-09-16 13:13:14,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:13:15,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 595.73547 ± 146.727
2025-09-16 13:13:15,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [724.9139, 747.41943, 502.96097, 500.6117, 667.72876, 836.9778, 622.6259, 580.962, 441.90482, 331.2501]
2025-09-16 13:13:15,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 147.0, 94.0, 92.0, 133.0, 160.0, 110.0, 103.0, 96.0, 60.0]
2025-09-16 13:13:16,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 14 minutes, 11 seconds)
2025-09-16 13:15:13,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:15:15,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 558.60486 ± 87.062
2025-09-16 13:15:15,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [580.3745, 383.2291, 501.87262, 486.73276, 483.09283, 593.4692, 596.06354, 637.3008, 674.344, 649.56866]
2025-09-16 13:15:15,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 68.0, 109.0, 86.0, 106.0, 116.0, 107.0, 120.0, 130.0, 120.0]
2025-09-16 13:15:15,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 12 minutes, 2 seconds)
2025-09-16 13:17:11,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:17:13,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 608.01874 ± 134.499
2025-09-16 13:17:13,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [790.42224, 669.40375, 827.029, 421.8542, 586.3824, 482.3079, 580.20337, 418.19897, 698.3306, 606.0553]
2025-09-16 13:17:13,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 127.0, 150.0, 75.0, 111.0, 98.0, 107.0, 73.0, 132.0, 108.0]
2025-09-16 13:17:13,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 10 minutes)
2025-09-16 13:19:09,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:19:11,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 678.44580 ± 179.021
2025-09-16 13:19:11,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [612.3865, 1110.6404, 421.57495, 515.4376, 816.74896, 667.79016, 750.12964, 598.68304, 689.50653, 601.5598]
2025-09-16 13:19:11,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 201.0, 81.0, 95.0, 152.0, 129.0, 141.0, 129.0, 126.0, 112.0]
2025-09-16 13:19:11,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (678.45) for latency 9
2025-09-16 13:19:11,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 8 minutes, 14 seconds)
2025-09-16 13:21:08,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:21:10,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 661.56628 ± 203.738
2025-09-16 13:21:10,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [630.08466, 584.1348, 949.90204, 421.44534, 464.53683, 806.077, 503.6459, 1076.0261, 607.4872, 572.3229]
2025-09-16 13:21:10,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 107.0, 188.0, 89.0, 89.0, 155.0, 99.0, 205.0, 113.0, 110.0]
2025-09-16 13:21:10,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 6 minutes, 22 seconds)
2025-09-16 13:23:07,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:23:09,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 674.62311 ± 180.444
2025-09-16 13:23:09,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [629.91296, 580.75323, 976.7169, 409.82162, 971.68274, 825.2452, 522.97614, 594.4807, 560.3548, 674.2861]
2025-09-16 13:23:09,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 104.0, 185.0, 89.0, 177.0, 149.0, 95.0, 111.0, 100.0, 121.0]
2025-09-16 13:23:09,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 4 minutes, 37 seconds)
2025-09-16 13:25:08,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:25:09,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 667.02441 ± 195.988
2025-09-16 13:25:09,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [385.75314, 461.41675, 802.89124, 590.26404, 795.6748, 656.3938, 676.8654, 848.04675, 1027.9458, 424.99274]
2025-09-16 13:25:09,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 86.0, 146.0, 110.0, 145.0, 140.0, 126.0, 177.0, 203.0, 89.0]
2025-09-16 13:25:09,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 2 minutes, 52 seconds)
2025-09-16 13:27:05,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:27:07,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 920.76807 ± 447.480
2025-09-16 13:27:07,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [823.67206, 339.17883, 464.56696, 702.49567, 1134.889, 1148.5527, 904.6213, 1604.5172, 408.6545, 1676.5326]
2025-09-16 13:27:07,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 70.0, 81.0, 139.0, 209.0, 218.0, 163.0, 314.0, 78.0, 328.0]
2025-09-16 13:27:07,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (920.77) for latency 9
2025-09-16 13:27:07,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 53 seconds)
2025-09-16 13:29:03,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:29:06,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 856.92950 ± 310.968
2025-09-16 13:29:06,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [970.7819, 740.1928, 677.677, 1636.3737, 542.784, 795.46375, 676.9567, 1167.6233, 741.00507, 620.43616]
2025-09-16 13:29:06,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 155.0, 125.0, 325.0, 116.0, 151.0, 129.0, 214.0, 144.0, 110.0]
2025-09-16 13:29:06,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 58 minutes, 54 seconds)
2025-09-16 13:31:06,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:31:08,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 530.84442 ± 141.180
2025-09-16 13:31:08,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [605.7079, 467.06558, 520.94275, 770.569, 629.90375, 407.38953, 419.63828, 740.26495, 348.2304, 398.7325]
2025-09-16 13:31:08,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 85.0, 95.0, 139.0, 119.0, 70.0, 91.0, 150.0, 65.0, 72.0]
2025-09-16 13:31:08,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 57 minutes, 29 seconds)
2025-09-16 13:33:02,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:33:04,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 749.80682 ± 238.815
2025-09-16 13:33:04,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1089.0869, 852.5966, 962.6154, 432.93176, 712.448, 828.8433, 1078.985, 559.7382, 568.6885, 412.135]
2025-09-16 13:33:04,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 161.0, 176.0, 94.0, 132.0, 150.0, 195.0, 102.0, 121.0, 74.0]
2025-09-16 13:33:04,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 55 minutes, 1 second)
2025-09-16 13:35:01,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:35:03,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 928.49512 ± 216.555
2025-09-16 13:35:03,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [566.25336, 764.9876, 1018.86945, 1078.9547, 1394.8488, 864.19934, 708.9078, 968.49805, 901.3487, 1018.08325]
2025-09-16 13:35:03,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 148.0, 189.0, 206.0, 261.0, 161.0, 127.0, 188.0, 165.0, 188.0]
2025-09-16 13:35:03,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (928.50) for latency 9
2025-09-16 13:35:03,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 52 minutes, 47 seconds)
2025-09-16 13:36:59,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:37:01,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 906.34259 ± 376.015
2025-09-16 13:37:01,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1532.5931, 559.13367, 777.1234, 408.9275, 589.63135, 891.1612, 1049.6338, 1369.9999, 556.38916, 1328.8328]
2025-09-16 13:37:01,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [289.0, 100.0, 152.0, 70.0, 108.0, 165.0, 189.0, 250.0, 118.0, 249.0]
2025-09-16 13:37:02,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 50 minutes, 56 seconds)
2025-09-16 13:38:58,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:39:00,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 794.68927 ± 309.974
2025-09-16 13:39:00,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [822.3166, 779.9581, 1215.4862, 353.4303, 471.95828, 1181.7057, 601.0906, 510.63983, 752.77454, 1257.5321]
2025-09-16 13:39:00,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 142.0, 231.0, 64.0, 93.0, 219.0, 108.0, 98.0, 140.0, 250.0]
2025-09-16 13:39:00,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes)
2025-09-16 13:40:58,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:40:59,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 683.40289 ± 259.514
2025-09-16 13:40:59,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [472.73077, 728.6535, 1293.169, 843.65094, 454.01123, 464.44992, 744.0946, 636.27136, 831.5934, 365.4043]
2025-09-16 13:40:59,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 142.0, 239.0, 161.0, 86.0, 85.0, 135.0, 131.0, 153.0, 66.0]
2025-09-16 13:40:59,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 32 seconds)
2025-09-16 13:42:58,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:43:00,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 646.41248 ± 149.007
2025-09-16 13:43:00,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [354.38654, 636.18884, 743.81805, 567.0072, 799.8908, 740.7624, 557.09344, 773.08514, 466.53955, 825.3532]
2025-09-16 13:43:00,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 118.0, 134.0, 102.0, 147.0, 135.0, 104.0, 145.0, 91.0, 152.0]
2025-09-16 13:43:00,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 45 minutes, 18 seconds)
2025-09-16 13:44:56,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:44:58,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 852.18213 ± 352.528
2025-09-16 13:44:58,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1151.1592, 1254.0569, 780.16156, 1491.0492, 438.08975, 730.9252, 335.17032, 939.1515, 890.49255, 511.56458]
2025-09-16 13:44:58,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 235.0, 141.0, 275.0, 97.0, 132.0, 60.0, 173.0, 167.0, 92.0]
2025-09-16 13:44:58,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 12 seconds)
2025-09-16 13:46:57,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:46:59,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 754.64709 ± 224.201
2025-09-16 13:46:59,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [716.1007, 602.5733, 631.3983, 1266.732, 843.53827, 1069.9237, 576.1005, 582.01385, 635.93585, 622.1542]
2025-09-16 13:46:59,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 110.0, 132.0, 234.0, 153.0, 198.0, 102.0, 115.0, 117.0, 111.0]
2025-09-16 13:46:59,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 32 seconds)
2025-09-16 13:48:55,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:48:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 647.47937 ± 202.021
2025-09-16 13:48:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [872.69055, 597.4328, 610.31946, 733.96014, 538.4041, 182.8938, 710.36694, 918.5519, 512.34454, 797.8289]
2025-09-16 13:48:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 125.0, 111.0, 137.0, 103.0, 34.0, 131.0, 170.0, 92.0, 148.0]
2025-09-16 13:48:56,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 21 seconds)
2025-09-16 13:50:53,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:50:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 693.44623 ± 255.017
2025-09-16 13:50:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [869.4789, 593.8767, 659.68866, 444.5537, 792.424, 383.77747, 394.34427, 767.36255, 750.2726, 1278.6835]
2025-09-16 13:50:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [163.0, 116.0, 141.0, 102.0, 155.0, 75.0, 70.0, 149.0, 134.0, 243.0]
2025-09-16 13:50:54,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 11 seconds)
2025-09-16 13:52:52,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:52:54,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 790.42853 ± 286.633
2025-09-16 13:52:54,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [980.72797, 1289.0122, 409.10638, 1009.5189, 466.0386, 1125.2318, 765.51764, 665.3251, 728.96716, 464.83936]
2025-09-16 13:52:54,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [180.0, 263.0, 76.0, 202.0, 83.0, 220.0, 138.0, 127.0, 136.0, 90.0]
2025-09-16 13:52:54,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 6 seconds)
2025-09-16 13:54:52,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:54:54,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 854.74969 ± 404.040
2025-09-16 13:54:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1083.4009, 1632.9287, 1181.393, 803.1959, 507.6191, 120.533165, 880.3661, 502.48834, 1108.8649, 726.70734]
2025-09-16 13:54:54,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [221.0, 316.0, 222.0, 142.0, 112.0, 23.0, 170.0, 97.0, 205.0, 144.0]
2025-09-16 13:54:54,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 18 seconds)
2025-09-16 13:56:51,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:56:53,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 806.08826 ± 168.119
2025-09-16 13:56:53,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1142.9857, 838.3244, 768.1691, 915.8789, 850.18774, 880.1905, 854.48456, 609.45123, 496.3941, 704.81696]
2025-09-16 13:56:53,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [210.0, 169.0, 141.0, 166.0, 157.0, 165.0, 156.0, 110.0, 96.0, 128.0]
2025-09-16 13:56:53,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 31 minutes, 3 seconds)
2025-09-16 13:58:50,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:58:52,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 828.39453 ± 348.258
2025-09-16 13:58:52,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1418.8627, 1379.0557, 834.43915, 877.7319, 814.41174, 488.45413, 860.4177, 340.67316, 888.90765, 380.99173]
2025-09-16 13:58:52,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [280.0, 258.0, 155.0, 163.0, 164.0, 88.0, 178.0, 66.0, 173.0, 80.0]
2025-09-16 13:58:52,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 29 minutes, 21 seconds)
2025-09-16 14:00:52,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:00:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1357.52917 ± 396.971
2025-09-16 14:00:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [727.3867, 919.1032, 1674.1786, 1556.4363, 1101.9618, 1485.4464, 1444.1624, 2134.2583, 1528.4303, 1003.9281]
2025-09-16 14:00:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 170.0, 309.0, 293.0, 220.0, 275.0, 260.0, 422.0, 285.0, 192.0]
2025-09-16 14:00:55,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1357.53) for latency 9
2025-09-16 14:00:55,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 5 seconds)
2025-09-16 14:02:51,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:02:53,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 817.27942 ± 407.031
2025-09-16 14:02:53,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [692.6367, 878.0716, 491.3685, 469.07562, 1918.7993, 642.8307, 550.33374, 828.023, 635.60815, 1066.0471]
2025-09-16 14:02:53,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 176.0, 97.0, 100.0, 351.0, 138.0, 117.0, 151.0, 126.0, 202.0]
2025-09-16 14:02:53,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 47 seconds)
2025-09-16 14:04:53,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:04:56,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1192.84058 ± 490.225
2025-09-16 14:04:56,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1826.4432, 436.58853, 1427.4258, 1517.862, 565.1201, 991.1987, 1650.7504, 1498.5332, 526.38464, 1488.0999]
2025-09-16 14:04:56,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [346.0, 78.0, 265.0, 285.0, 123.0, 185.0, 308.0, 271.0, 108.0, 284.0]
2025-09-16 14:04:56,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 16 seconds)
2025-09-16 14:06:50,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:06:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1007.05334 ± 240.688
2025-09-16 14:06:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [714.8287, 1059.5742, 958.2271, 941.54803, 980.76086, 1516.6042, 1026.8811, 646.38007, 1301.6425, 924.086]
2025-09-16 14:06:53,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [133.0, 194.0, 180.0, 174.0, 181.0, 302.0, 201.0, 114.0, 246.0, 171.0]
2025-09-16 14:06:53,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 1 second)
2025-09-16 14:08:53,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:08:56,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1006.22430 ± 352.197
2025-09-16 14:08:56,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [712.4053, 1401.739, 1068.3142, 242.02393, 983.3983, 967.8044, 1641.4442, 963.98535, 1049.6549, 1031.4739]
2025-09-16 14:08:56,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 270.0, 200.0, 44.0, 184.0, 176.0, 306.0, 177.0, 195.0, 186.0]
2025-09-16 14:08:56,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 32 seconds)
2025-09-16 14:10:50,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:10:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 915.47577 ± 388.783
2025-09-16 14:10:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [310.7553, 589.2093, 1320.2814, 464.9243, 1174.1017, 1085.001, 838.46545, 1114.0636, 659.7454, 1598.2097]
2025-09-16 14:10:53,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 110.0, 243.0, 82.0, 248.0, 209.0, 151.0, 213.0, 132.0, 297.0]
2025-09-16 14:10:53,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 40 seconds)
2025-09-16 14:12:51,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:12:53,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1121.54602 ± 764.014
2025-09-16 14:12:53,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2338.7847, 1107.599, 605.1945, 864.78754, 1051.9753, 126.89483, 794.10986, 536.00616, 1063.823, 2726.2854]
2025-09-16 14:12:53,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [433.0, 206.0, 117.0, 154.0, 192.0, 24.0, 148.0, 111.0, 196.0, 490.0]
2025-09-16 14:12:53,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 3 seconds)
2025-09-16 14:14:54,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:14:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1090.92908 ± 525.957
2025-09-16 14:14:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [791.6796, 2132.0923, 293.38193, 1244.4033, 443.62332, 964.9306, 849.89703, 1083.9009, 1470.1542, 1635.228]
2025-09-16 14:14:57,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 398.0, 52.0, 243.0, 81.0, 186.0, 173.0, 198.0, 276.0, 318.0]
2025-09-16 14:14:57,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 7 seconds)
2025-09-16 14:16:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:16:56,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1050.56860 ± 447.421
2025-09-16 14:16:56,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1143.4589, 1786.3535, 1296.4456, 1273.1316, 546.3544, 594.56573, 895.9031, 1737.4492, 672.04987, 559.9739]
2025-09-16 14:16:56,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 324.0, 238.0, 230.0, 95.0, 108.0, 161.0, 334.0, 139.0, 101.0]
2025-09-16 14:16:56,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 25 seconds)
2025-09-16 14:18:52,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:18:56,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1500.20776 ± 980.452
2025-09-16 14:18:56,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [769.4685, 751.4511, 1124.7179, 1326.8444, 1641.0168, 803.0398, 1890.3809, 1743.0117, 4168.0405, 784.10565]
2025-09-16 14:18:56,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 140.0, 213.0, 263.0, 314.0, 173.0, 360.0, 325.0, 794.0, 153.0]
2025-09-16 14:18:56,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (1500.21) for latency 9
2025-09-16 14:18:56,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 57 seconds)
2025-09-16 14:20:54,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:20:57,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 947.12958 ± 444.095
2025-09-16 14:20:57,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [295.09143, 1080.8727, 1420.1764, 889.4444, 1280.0826, 1181.1538, 802.7655, 561.2907, 1674.6627, 285.75546]
2025-09-16 14:20:57,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [53.0, 203.0, 257.0, 176.0, 241.0, 221.0, 166.0, 121.0, 327.0, 51.0]
2025-09-16 14:20:57,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 29 seconds)
2025-09-16 14:22:58,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:23:00,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 879.31366 ± 213.367
2025-09-16 14:23:00,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1119.7113, 888.3307, 352.8732, 905.78156, 838.86, 1041.4684, 1098.6083, 1013.8089, 758.67346, 775.02057]
2025-09-16 14:23:00,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 170.0, 63.0, 180.0, 178.0, 190.0, 205.0, 183.0, 155.0, 146.0]
2025-09-16 14:23:00,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 43 seconds)
2025-09-16 14:24:58,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:25:01,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1015.05682 ± 471.116
2025-09-16 14:25:01,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1034.1078, 726.9085, 532.8115, 1906.9559, 1395.0201, 585.4869, 919.7543, 1717.0178, 796.63666, 535.86847]
2025-09-16 14:25:01,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 150.0, 115.0, 363.0, 251.0, 100.0, 168.0, 320.0, 146.0, 117.0]
2025-09-16 14:25:01,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 26 seconds)
2025-09-16 14:26:57,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:27:00,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1218.99683 ± 620.294
2025-09-16 14:27:00,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1032.2349, 2159.622, 910.848, 2527.9355, 1267.8147, 912.32324, 1000.82104, 605.35565, 1321.2635, 451.74948]
2025-09-16 14:27:00,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 404.0, 178.0, 494.0, 239.0, 180.0, 201.0, 111.0, 255.0, 80.0]
2025-09-16 14:27:00,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 2 minutes, 24 seconds)
2025-09-16 14:29:03,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:29:06,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1221.27454 ± 575.637
2025-09-16 14:29:06,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [466.88693, 1431.6356, 1333.2886, 1623.7607, 715.79083, 337.2116, 1087.8381, 2314.1763, 1743.266, 1158.8917]
2025-09-16 14:29:06,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 282.0, 241.0, 320.0, 153.0, 74.0, 227.0, 440.0, 328.0, 230.0]
2025-09-16 14:29:06,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 1 minute, 3 seconds)
2025-09-16 14:31:03,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:31:09,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2232.97095 ± 1448.137
2025-09-16 14:31:09,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2374.7551, 2120.0242, 408.67456, 1984.7042, 1932.025, 4296.391, 1912.4269, 816.62933, 5369.8276, 1114.2498]
2025-09-16 14:31:09,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [434.0, 403.0, 72.0, 372.0, 353.0, 796.0, 366.0, 176.0, 1000.0, 216.0]
2025-09-16 14:31:09,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (2232.97) for latency 9
2025-09-16 14:31:09,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 59 minutes, 9 seconds)
2025-09-16 14:33:03,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:33:08,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1691.34692 ± 1308.089
2025-09-16 14:33:08,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1278.8623, 649.7287, 1931.6086, 977.58716, 1544.3461, 2292.6345, 5292.3477, 502.18234, 1017.39685, 1426.7761]
2025-09-16 14:33:08,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [245.0, 128.0, 360.0, 171.0, 293.0, 439.0, 1000.0, 86.0, 201.0, 265.0]
2025-09-16 14:33:08,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 56 minutes, 42 seconds)
2025-09-16 14:35:10,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:35:16,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2145.23730 ± 1637.970
2025-09-16 14:35:16,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [894.01465, 532.74585, 1724.3436, 840.19684, 3535.066, 473.4399, 2845.5498, 5251.831, 1042.2358, 4312.949]
2025-09-16 14:35:16,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 92.0, 322.0, 163.0, 659.0, 84.0, 551.0, 1000.0, 200.0, 815.0]
2025-09-16 14:35:16,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 55 minutes, 18 seconds)
2025-09-16 14:37:17,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:37:21,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1738.31030 ± 951.718
2025-09-16 14:37:21,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1064.6372, 3985.6545, 2150.9526, 1497.3651, 1993.625, 815.8552, 2114.681, 2095.8625, 1330.9337, 333.5374]
2025-09-16 14:37:21,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 756.0, 403.0, 283.0, 385.0, 158.0, 409.0, 406.0, 259.0, 59.0]
2025-09-16 14:37:21,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 53 minutes, 48 seconds)
2025-09-16 14:39:15,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:39:18,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 942.80627 ± 430.008
2025-09-16 14:39:18,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1120.8864, 872.82623, 754.5256, 717.6322, 438.3158, 1319.7008, 658.1907, 1304.2794, 1847.0322, 394.67282]
2025-09-16 14:39:18,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [201.0, 177.0, 153.0, 141.0, 76.0, 249.0, 125.0, 242.0, 370.0, 68.0]
2025-09-16 14:39:18,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 50 minutes, 58 seconds)
2025-09-16 14:41:20,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:41:26,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2310.50366 ± 1558.645
2025-09-16 14:41:26,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [410.33466, 3656.4673, 5435.769, 510.636, 1697.1371, 1065.6785, 3333.716, 2833.84, 959.9162, 3201.5425]
2025-09-16 14:41:26,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 696.0, 1000.0, 105.0, 333.0, 195.0, 623.0, 529.0, 193.0, 611.0]
2025-09-16 14:41:26,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (2310.50) for latency 9
2025-09-16 14:41:26,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 49 minutes, 20 seconds)
2025-09-16 14:43:25,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:43:32,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2394.89111 ± 1676.741
2025-09-16 14:43:32,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1068.2214, 4492.496, 5068.1104, 3245.627, 4422.155, 2027.8759, 847.47504, 554.2349, 493.76794, 1728.9489]
2025-09-16 14:43:32,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 850.0, 956.0, 597.0, 809.0, 368.0, 153.0, 94.0, 87.0, 321.0]
2025-09-16 14:43:32,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (2394.89) for latency 9
2025-09-16 14:43:32,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 47 minutes, 50 seconds)
2025-09-16 14:45:28,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:45:31,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1329.54919 ± 864.880
2025-09-16 14:45:31,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1093.6821, 185.07784, 2097.7969, 3368.6394, 814.1086, 467.8113, 1642.0187, 855.7182, 1507.7601, 1262.8789]
2025-09-16 14:45:31,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 34.0, 393.0, 644.0, 158.0, 80.0, 309.0, 158.0, 283.0, 247.0]
2025-09-16 14:45:31,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 45 minutes, 8 seconds)
2025-09-16 14:47:29,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:47:36,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2829.24756 ± 1524.856
2025-09-16 14:47:36,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2189.21, 3930.1533, 4708.167, 4510.422, 1755.6163, 1144.2174, 5263.7876, 1489.0989, 1025.7809, 2276.0217]
2025-09-16 14:47:36,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [400.0, 755.0, 902.0, 858.0, 324.0, 225.0, 1000.0, 297.0, 200.0, 426.0]
2025-09-16 14:47:36,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (2829.25) for latency 9
2025-09-16 14:47:36,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 43 minutes, 3 seconds)
2025-09-16 14:49:41,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:49:48,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2393.19287 ± 1726.370
2025-09-16 14:49:48,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [647.40424, 677.4509, 5261.6323, 5279.923, 368.07062, 3594.3098, 1728.3271, 1917.7773, 2930.1995, 1526.8331]
2025-09-16 14:49:48,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 124.0, 1000.0, 1000.0, 65.0, 691.0, 330.0, 349.0, 558.0, 286.0]
2025-09-16 14:49:48,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 41 minutes, 58 seconds)
2025-09-16 14:51:42,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:51:48,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2193.55054 ± 1678.185
2025-09-16 14:51:48,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5270.811, 1165.5128, 1477.7251, 2910.8838, 482.23163, 826.34375, 1511.3304, 1063.0957, 1876.7878, 5350.7827]
2025-09-16 14:51:48,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 208.0, 268.0, 540.0, 87.0, 145.0, 284.0, 198.0, 358.0, 1000.0]
2025-09-16 14:51:48,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 24 seconds)
2025-09-16 14:53:51,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:53:56,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1773.34180 ± 1352.448
2025-09-16 14:53:56,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [364.42538, 488.30795, 920.0871, 4139.3223, 932.2249, 1562.2272, 2316.244, 4390.006, 1254.0052, 1366.5685]
2025-09-16 14:53:56,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 109.0, 162.0, 786.0, 182.0, 296.0, 438.0, 827.0, 247.0, 273.0]
2025-09-16 14:53:56,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 37 minutes, 26 seconds)
2025-09-16 14:55:54,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:55:59,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2036.34766 ± 1721.917
2025-09-16 14:55:59,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [410.8624, 2907.1748, 455.36703, 5299.4707, 1644.1044, 616.91003, 635.1529, 646.7771, 3495.4211, 4252.236]
2025-09-16 14:55:59,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 547.0, 77.0, 1000.0, 298.0, 115.0, 116.0, 132.0, 644.0, 799.0]
2025-09-16 14:55:59,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 35 minutes, 35 seconds)
2025-09-16 14:57:56,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:58:03,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2381.65991 ± 1561.211
2025-09-16 14:58:03,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1441.6882, 1435.1952, 545.1721, 4005.72, 1238.1162, 5280.7046, 2513.0237, 4229.3076, 2589.3416, 538.33014]
2025-09-16 14:58:03,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [257.0, 292.0, 103.0, 760.0, 223.0, 1000.0, 474.0, 791.0, 471.0, 92.0]
2025-09-16 14:58:03,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 24 seconds)
2025-09-16 15:00:01,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:00:07,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2161.41138 ± 1821.857
2025-09-16 15:00:07,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [245.5068, 4780.6855, 1600.7424, 324.96466, 5301.0537, 3110.372, 3574.7004, 359.75797, 1850.709, 465.61853]
2025-09-16 15:00:07,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 893.0, 290.0, 58.0, 1000.0, 602.0, 657.0, 63.0, 348.0, 84.0]
2025-09-16 15:00:07,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 30 minutes, 58 seconds)
2025-09-16 15:02:08,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:02:13,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2046.39685 ± 1775.150
2025-09-16 15:02:13,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5455.3154, 454.97028, 1056.5378, 383.87592, 1777.9314, 5434.456, 1735.2708, 1300.4929, 2005.5443, 859.573]
2025-09-16 15:02:13,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 81.0, 202.0, 70.0, 334.0, 1000.0, 316.0, 241.0, 376.0, 189.0]
2025-09-16 15:02:13,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 29 minutes, 10 seconds)
2025-09-16 15:04:11,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:04:18,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2525.43799 ± 1275.100
2025-09-16 15:04:18,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [3322.1648, 2934.8848, 1866.5117, 1672.8423, 3425.614, 2291.3367, 326.5977, 2511.4358, 1596.7847, 5306.207]
2025-09-16 15:04:18,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [622.0, 544.0, 340.0, 308.0, 638.0, 445.0, 57.0, 486.0, 306.0, 1000.0]
2025-09-16 15:04:18,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 26 minutes, 58 seconds)
2025-09-16 15:06:22,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:06:29,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2442.74023 ± 1927.278
2025-09-16 15:06:29,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [600.72754, 2051.7058, 5410.8286, 963.95886, 1531.9314, 2457.1577, 334.37964, 4911.553, 5364.3506, 800.8096]
2025-09-16 15:06:29,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [121.0, 371.0, 1000.0, 185.0, 313.0, 454.0, 62.0, 905.0, 1000.0, 151.0]
2025-09-16 15:06:29,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 25 minutes, 11 seconds)
2025-09-16 15:08:21,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:08:27,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2155.27466 ± 1340.594
2025-09-16 15:08:27,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [695.08673, 747.161, 707.7357, 2550.148, 4059.9907, 1936.5878, 3655.898, 2857.7905, 3781.1494, 561.1997]
2025-09-16 15:08:27,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 154.0, 148.0, 465.0, 782.0, 360.0, 709.0, 531.0, 715.0, 98.0]
2025-09-16 15:08:27,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 54 seconds)
2025-09-16 15:10:36,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:10:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2560.88379 ± 1951.849
2025-09-16 15:10:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1080.6862, 5335.86, 4530.746, 2741.7986, 3870.5776, 5351.9844, 461.4763, 468.64886, 1316.5591, 450.50116]
2025-09-16 15:10:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 1000.0, 854.0, 521.0, 748.0, 1000.0, 99.0, 95.0, 257.0, 82.0]
2025-09-16 15:10:43,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 12 seconds)
2025-09-16 15:12:42,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:12:50,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2778.51147 ± 2062.447
2025-09-16 15:12:50,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5241.835, 609.0759, 4985.6196, 2387.0906, 5187.3716, 434.4613, 462.9302, 2556.8145, 5191.9785, 727.93677]
2025-09-16 15:12:50,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 124.0, 949.0, 445.0, 1000.0, 78.0, 101.0, 493.0, 1000.0, 148.0]
2025-09-16 15:12:50,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 6 seconds)
2025-09-16 15:14:40,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:14:44,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 1532.19019 ± 1161.328
2025-09-16 15:14:44,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [491.00162, 317.74374, 2334.352, 345.9875, 691.17316, 3491.4612, 2916.8564, 2458.2524, 1926.4907, 348.5842]
2025-09-16 15:14:44,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 58.0, 423.0, 71.0, 121.0, 640.0, 549.0, 456.0, 361.0, 62.0]
2025-09-16 15:14:44,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 42 seconds)
2025-09-16 15:16:44,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:16:57,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4566.50977 ± 1432.426
2025-09-16 15:16:57,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5274.24, 5236.3457, 5339.7817, 2020.1465, 5297.096, 1410.3167, 5306.8013, 5264.3535, 5246.144, 5269.8745]
2025-09-16 15:16:57,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 385.0, 1000.0, 268.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:16:57,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1226 [INFO]: New best (4566.51) for latency 9
2025-09-16 15:16:57,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 38 seconds)
2025-09-16 15:19:02,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:19:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2610.63721 ± 1792.654
2025-09-16 15:19:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1091.1598, 780.4823, 1082.8956, 2114.9912, 3820.1467, 4373.8955, 602.1673, 5314.2363, 5269.9976, 1656.4003]
2025-09-16 15:19:09,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 137.0, 216.0, 427.0, 723.0, 836.0, 112.0, 1000.0, 1000.0, 328.0]
2025-09-16 15:19:09,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 50 seconds)
2025-09-16 15:21:11,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:21:17,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2472.98975 ± 2075.159
2025-09-16 15:21:17,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [733.0882, 1036.333, 1048.9392, 5375.6978, 5419.297, 934.802, 3569.9304, 5360.021, 859.8872, 391.90424]
2025-09-16 15:21:17,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 200.0, 206.0, 1000.0, 1000.0, 179.0, 678.0, 1000.0, 180.0, 69.0]
2025-09-16 15:21:17,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 33 seconds)
2025-09-16 15:23:10,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:23:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2828.38086 ± 2047.984
2025-09-16 15:23:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [5087.3457, 5256.4116, 5213.02, 2688.0522, 2137.4475, 1006.8672, 447.2753, 385.44022, 5235.845, 826.1035]
2025-09-16 15:23:19,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 514.0, 434.0, 196.0, 78.0, 87.0, 1000.0, 158.0]
2025-09-16 15:23:19,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 22 seconds)
2025-09-16 15:25:20,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:25:30,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3837.63037 ± 1850.923
2025-09-16 15:25:30,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [1886.7537, 5379.1343, 5362.4043, 5350.3477, 5329.5356, 1529.6023, 4964.386, 5357.017, 514.17596, 2702.9482]
2025-09-16 15:25:30,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [345.0, 1000.0, 1000.0, 1000.0, 1000.0, 284.0, 943.0, 1000.0, 90.0, 521.0]
2025-09-16 15:25:30,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 27 seconds)
2025-09-16 15:27:34,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:27:46,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 4510.31250 ± 1553.539
2025-09-16 15:27:46,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2788.6904, 5360.9673, 4529.7334, 5314.3623, 5328.557, 5321.262, 5289.796, 5413.609, 457.19952, 5298.9497]
2025-09-16 15:27:46,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [539.0, 1000.0, 860.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 83.0, 1000.0]
2025-09-16 15:27:46,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 19 seconds)
2025-09-16 15:29:48,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:29:58,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 3769.25537 ± 1831.312
2025-09-16 15:29:58,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [896.53094, 3119.8955, 514.19104, 5368.132, 4394.7456, 5321.018, 5240.8403, 2284.6614, 5275.7695, 5276.7695]
2025-09-16 15:29:58,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [177.0, 572.0, 89.0, 1000.0, 812.0, 1000.0, 1000.0, 416.0, 1000.0, 1000.0]
2025-09-16 15:29:58,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 9 seconds)
2025-09-16 15:31:52,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:32:00,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1221 [DEBUG]: Total Reward: 2961.31201 ± 2084.418
2025-09-16 15:32:00,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1222 [DEBUG]: All rewards: [2702.1587, 1633.3092, 505.05426, 746.3755, 5324.247, 5421.0674, 1836.4395, 5416.0195, 5437.8267, 590.6206]
2025-09-16 15:32:00,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1223 [DEBUG]: All trajectory lengths: [509.0, 313.0, 105.0, 149.0, 1000.0, 1000.0, 345.0, 1000.0, 1000.0, 109.0]
2025-09-16 15:32:00,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-humanoid):1251 [DEBUG]: Training session finished
