2025-09-16 11:26:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.050-delay_6
2025-09-16 11:26:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.050-delay_6
2025-09-16 11:26:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x1538763ec590>}
2025-09-16 11:26:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:26:19,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:26:19,124 baseline-bpql-noisepromille50-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=478, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:26:19,124 baseline-bpql-noisepromille50-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:26:20,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:26:20,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:28:04,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:28:04,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 222.91080 ± 65.569
2025-09-16 11:28:04,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [186.04758, 129.99506, 153.06761, 267.37537, 316.72705, 232.1861, 213.17885, 142.58533, 276.47998, 311.4651]
2025-09-16 11:28:04,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [40.0, 28.0, 32.0, 57.0, 63.0, 49.0, 45.0, 32.0, 54.0, 60.0]
2025-09-16 11:28:04,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (222.91) for latency 6
2025-09-16 11:28:04,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 51 minutes, 56 seconds)
2025-09-16 11:29:57,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:29:57,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 376.91028 ± 37.934
2025-09-16 11:29:57,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [396.74066, 433.05307, 318.32858, 371.22076, 331.55194, 425.63422, 331.33832, 370.17706, 403.70587, 387.35257]
2025-09-16 11:29:57,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 80.0, 58.0, 69.0, 61.0, 79.0, 61.0, 68.0, 75.0, 72.0]
2025-09-16 11:29:57,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (376.91) for latency 6
2025-09-16 11:29:57,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 57 minutes, 22 seconds)
2025-09-16 11:31:49,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:31:50,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 441.80048 ± 63.536
2025-09-16 11:31:50,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [541.8634, 435.5384, 379.29318, 422.4385, 317.76505, 507.29437, 424.6606, 484.097, 497.5148, 407.5395]
2025-09-16 11:31:50,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 84.0, 74.0, 84.0, 59.0, 100.0, 81.0, 94.0, 96.0, 89.0]
2025-09-16 11:31:50,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (441.80) for latency 6
2025-09-16 11:31:50,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 57 minutes, 49 seconds)
2025-09-16 11:33:42,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:33:43,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 387.47559 ± 62.493
2025-09-16 11:33:43,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [461.83334, 326.09164, 368.84903, 442.3927, 313.37482, 267.33765, 397.99548, 421.399, 428.94424, 446.53775]
2025-09-16 11:33:43,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 65.0, 78.0, 90.0, 60.0, 55.0, 82.0, 89.0, 83.0, 85.0]
2025-09-16 11:33:43,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 57 minutes, 10 seconds)
2025-09-16 11:35:36,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:35:37,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 460.05145 ± 67.686
2025-09-16 11:35:37,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [514.90875, 340.76025, 395.27094, 431.28705, 535.4924, 581.34186, 487.09158, 422.43427, 463.34152, 428.58618]
2025-09-16 11:35:37,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 65.0, 75.0, 80.0, 101.0, 121.0, 93.0, 80.0, 88.0, 90.0]
2025-09-16 11:35:37,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (460.05) for latency 6
2025-09-16 11:35:37,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 56 minutes, 23 seconds)
2025-09-16 11:37:30,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:37:31,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 388.24408 ± 102.113
2025-09-16 11:37:31,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [473.24118, 300.86063, 544.9391, 216.55333, 347.17682, 546.57635, 395.1155, 392.03616, 367.9846, 297.95718]
2025-09-16 11:37:31,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [89.0, 57.0, 107.0, 45.0, 64.0, 104.0, 75.0, 78.0, 83.0, 57.0]
2025-09-16 11:37:31,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 57 minutes, 31 seconds)
2025-09-16 11:39:23,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:39:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 507.77377 ± 124.947
2025-09-16 11:39:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [576.80914, 443.8284, 509.3006, 465.41605, 718.2199, 563.3332, 310.7929, 405.45212, 694.6689, 389.9163]
2025-09-16 11:39:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 82.0, 93.0, 86.0, 145.0, 103.0, 57.0, 75.0, 133.0, 72.0]
2025-09-16 11:39:24,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (507.77) for latency 6
2025-09-16 11:39:24,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 55 minutes, 39 seconds)
2025-09-16 11:41:17,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:41:18,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 473.16534 ± 121.123
2025-09-16 11:41:18,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [406.2176, 725.8135, 588.0776, 331.01047, 519.5008, 550.29205, 341.75043, 413.20255, 505.10495, 350.68362]
2025-09-16 11:41:18,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [73.0, 153.0, 110.0, 61.0, 95.0, 104.0, 64.0, 86.0, 93.0, 68.0]
2025-09-16 11:41:18,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 54 minutes, 3 seconds)
2025-09-16 11:43:10,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:43:11,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 518.75842 ± 111.178
2025-09-16 11:43:11,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [440.55304, 453.8241, 651.5055, 469.3552, 528.8576, 650.53564, 405.70898, 544.42633, 344.9283, 697.8895]
2025-09-16 11:43:11,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 93.0, 140.0, 104.0, 99.0, 125.0, 77.0, 116.0, 64.0, 125.0]
2025-09-16 11:43:11,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (518.76) for latency 6
2025-09-16 11:43:11,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 52 minutes, 20 seconds)
2025-09-16 11:45:04,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:45:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 536.09979 ± 132.564
2025-09-16 11:45:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [464.75122, 395.34253, 437.28387, 759.66125, 429.5153, 480.3006, 477.50867, 688.93774, 751.1938, 476.5032]
2025-09-16 11:45:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 86.0, 82.0, 142.0, 89.0, 105.0, 87.0, 143.0, 154.0, 88.0]
2025-09-16 11:45:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (536.10) for latency 6
2025-09-16 11:45:06,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 50 minutes, 36 seconds)
2025-09-16 11:46:59,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:47:01,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 475.82953 ± 104.880
2025-09-16 11:47:01,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [523.50836, 391.9134, 647.84296, 316.46466, 424.90533, 478.677, 393.00934, 426.26804, 497.34866, 658.35785]
2025-09-16 11:47:01,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 73.0, 123.0, 70.0, 80.0, 101.0, 78.0, 86.0, 96.0, 140.0]
2025-09-16 11:47:01,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 48 minutes, 59 seconds)
2025-09-16 11:48:52,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:48:54,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 635.84070 ± 153.108
2025-09-16 11:48:54,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [676.4466, 456.85187, 586.1929, 592.5448, 613.66815, 991.1685, 560.55426, 422.54935, 752.1034, 706.32666]
2025-09-16 11:48:54,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 85.0, 109.0, 111.0, 116.0, 196.0, 106.0, 90.0, 154.0, 134.0]
2025-09-16 11:48:54,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (635.84) for latency 6
2025-09-16 11:48:54,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 47 minutes, 11 seconds)
2025-09-16 11:50:47,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:50:48,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 608.04749 ± 69.475
2025-09-16 11:50:48,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [672.22205, 496.89014, 691.1957, 692.1955, 550.49915, 658.6621, 539.0588, 650.8797, 536.7982, 592.0734]
2025-09-16 11:50:48,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 102.0, 135.0, 149.0, 106.0, 133.0, 102.0, 142.0, 112.0, 124.0]
2025-09-16 11:50:48,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 45 minutes, 30 seconds)
2025-09-16 11:52:41,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:52:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 502.44614 ± 72.238
2025-09-16 11:52:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [443.5698, 529.66656, 549.9241, 525.9736, 514.23145, 544.7018, 332.08026, 566.1761, 580.0248, 438.11288]
2025-09-16 11:52:42,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 97.0, 102.0, 97.0, 96.0, 100.0, 63.0, 105.0, 109.0, 83.0]
2025-09-16 11:52:42,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 43 minutes, 44 seconds)
2025-09-16 11:54:36,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:54:37,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 564.27661 ± 94.216
2025-09-16 11:54:37,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [727.6993, 539.8189, 565.3803, 473.44638, 486.74887, 578.3419, 751.1423, 544.09344, 493.196, 482.89896]
2025-09-16 11:54:37,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 108.0, 107.0, 92.0, 92.0, 113.0, 159.0, 104.0, 95.0, 92.0]
2025-09-16 11:54:37,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 41 minutes, 52 seconds)
2025-09-16 11:56:31,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:56:32,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 607.96680 ± 112.033
2025-09-16 11:56:32,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [721.8198, 569.3902, 468.3808, 578.04663, 825.4553, 538.7838, 435.73758, 598.68176, 690.7576, 652.614]
2025-09-16 11:56:32,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 105.0, 100.0, 105.0, 159.0, 108.0, 93.0, 127.0, 134.0, 122.0]
2025-09-16 11:56:32,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 40 minutes, 5 seconds)
2025-09-16 11:58:25,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 11:58:27,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 597.00574 ± 96.832
2025-09-16 11:58:27,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [596.3666, 485.04437, 555.47363, 729.69244, 518.284, 665.4835, 486.59204, 652.81, 513.6974, 766.6132]
2025-09-16 11:58:27,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 93.0, 105.0, 153.0, 111.0, 140.0, 88.0, 142.0, 112.0, 161.0]
2025-09-16 11:58:27,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 38 minutes, 32 seconds)
2025-09-16 12:00:20,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:00:22,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 696.47797 ± 161.411
2025-09-16 12:00:22,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [844.24817, 641.60364, 982.65216, 793.74866, 513.0791, 547.9658, 490.76212, 576.5869, 698.9341, 875.1991]
2025-09-16 12:00:22,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [161.0, 133.0, 192.0, 157.0, 108.0, 104.0, 103.0, 117.0, 147.0, 181.0]
2025-09-16 12:00:22,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (696.48) for latency 6
2025-09-16 12:00:22,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 36 minutes, 42 seconds)
2025-09-16 12:02:15,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:02:17,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 748.75403 ± 143.639
2025-09-16 12:02:17,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [641.67175, 690.70355, 766.9872, 761.4204, 768.0515, 835.0521, 1054.0978, 831.0193, 666.069, 472.46756]
2025-09-16 12:02:17,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 128.0, 142.0, 147.0, 155.0, 156.0, 205.0, 173.0, 135.0, 95.0]
2025-09-16 12:02:17,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (748.75) for latency 6
2025-09-16 12:02:17,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 35 minutes, 9 seconds)
2025-09-16 12:04:10,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:04:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 958.19891 ± 216.291
2025-09-16 12:04:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1187.2814, 878.1802, 758.3435, 922.67865, 645.155, 1145.8677, 1169.1273, 994.13727, 1247.2987, 633.9199]
2025-09-16 12:04:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [230.0, 166.0, 161.0, 176.0, 141.0, 221.0, 220.0, 190.0, 257.0, 116.0]
2025-09-16 12:04:12,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (958.20) for latency 6
2025-09-16 12:04:12,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 33 minutes, 19 seconds)
2025-09-16 12:06:06,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:06:08,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 716.29858 ± 172.923
2025-09-16 12:06:08,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [616.6102, 920.9017, 696.3142, 499.397, 646.369, 1083.1885, 525.37573, 604.0706, 756.3196, 814.4389]
2025-09-16 12:06:08,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 177.0, 130.0, 93.0, 137.0, 208.0, 97.0, 128.0, 147.0, 153.0]
2025-09-16 12:06:08,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 31 minutes, 33 seconds)
2025-09-16 12:08:02,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:08:04,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 835.38232 ± 183.792
2025-09-16 12:08:04,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1001.88806, 764.74945, 661.0651, 528.4894, 1066.1705, 1157.163, 790.6673, 813.5439, 695.1504, 874.93616]
2025-09-16 12:08:04,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [191.0, 147.0, 139.0, 115.0, 222.0, 221.0, 156.0, 155.0, 133.0, 175.0]
2025-09-16 12:08:04,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes)
2025-09-16 12:09:58,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:10:00,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 703.58411 ± 209.280
2025-09-16 12:10:00,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1207.6212, 718.76825, 717.91315, 598.3976, 679.33875, 691.37415, 847.6292, 418.72546, 439.6989, 716.37463]
2025-09-16 12:10:00,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [233.0, 137.0, 143.0, 119.0, 136.0, 138.0, 170.0, 93.0, 98.0, 146.0]
2025-09-16 12:10:00,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 20 seconds)
2025-09-16 12:11:54,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:11:56,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 694.11389 ± 103.442
2025-09-16 12:11:56,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [644.9296, 754.9243, 738.67487, 752.1367, 464.54572, 719.60443, 860.63245, 712.9475, 715.4347, 577.3088]
2025-09-16 12:11:56,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 148.0, 146.0, 145.0, 97.0, 143.0, 169.0, 150.0, 138.0, 119.0]
2025-09-16 12:11:56,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 37 seconds)
2025-09-16 12:13:50,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:13:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 760.05847 ± 209.865
2025-09-16 12:13:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [577.78455, 1121.0807, 675.7061, 534.5956, 870.6322, 776.8454, 1079.1945, 870.5443, 518.31866, 575.883]
2025-09-16 12:13:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 218.0, 130.0, 101.0, 165.0, 151.0, 213.0, 165.0, 97.0, 110.0]
2025-09-16 12:13:51,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 49 seconds)
2025-09-16 12:15:46,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:15:48,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 858.75848 ± 217.276
2025-09-16 12:15:48,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [617.65405, 564.7215, 908.65204, 834.755, 1261.7526, 1064.0547, 890.7709, 1062.852, 793.7584, 588.61334]
2025-09-16 12:15:48,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 112.0, 192.0, 166.0, 250.0, 209.0, 171.0, 213.0, 162.0, 125.0]
2025-09-16 12:15:48,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 23 minutes, 9 seconds)
2025-09-16 12:17:43,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:17:46,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 877.70764 ± 171.334
2025-09-16 12:17:46,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [861.61676, 690.698, 807.5234, 1087.5487, 1228.2506, 843.039, 950.55927, 618.8662, 917.85736, 771.11755]
2025-09-16 12:17:46,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 136.0, 150.0, 207.0, 242.0, 156.0, 182.0, 114.0, 177.0, 145.0]
2025-09-16 12:17:46,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 21 minutes, 30 seconds)
2025-09-16 12:19:38,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:19:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 958.27112 ± 231.608
2025-09-16 12:19:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [660.6765, 652.3767, 983.773, 870.19617, 1129.0576, 848.0718, 1034.035, 1111.4078, 1466.3029, 826.81305]
2025-09-16 12:19:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 120.0, 184.0, 165.0, 221.0, 163.0, 197.0, 231.0, 281.0, 154.0]
2025-09-16 12:19:41,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (958.27) for latency 6
2025-09-16 12:19:41,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 30 seconds)
2025-09-16 12:21:36,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:21:38,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 753.70544 ± 154.764
2025-09-16 12:21:38,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [862.25977, 678.9685, 648.6612, 947.663, 698.91956, 1098.0613, 594.6403, 637.49994, 642.0613, 728.3189]
2025-09-16 12:21:38,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 133.0, 121.0, 182.0, 139.0, 207.0, 120.0, 120.0, 123.0, 136.0]
2025-09-16 12:21:38,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 46 seconds)
2025-09-16 12:23:32,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:23:35,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1122.41589 ± 307.628
2025-09-16 12:23:35,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1037.6342, 805.53766, 1877.8623, 1222.4214, 1406.7275, 1138.8519, 909.3975, 853.2663, 1091.197, 881.26276]
2025-09-16 12:23:35,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 173.0, 373.0, 234.0, 274.0, 220.0, 177.0, 163.0, 212.0, 176.0]
2025-09-16 12:23:35,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1122.42) for latency 6
2025-09-16 12:23:35,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 16 minutes, 14 seconds)
2025-09-16 12:25:29,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:25:32,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1097.90173 ± 307.568
2025-09-16 12:25:32,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1378.4473, 1653.5939, 742.3734, 1140.6066, 1215.5177, 1038.1636, 766.29694, 671.78015, 1416.0604, 956.178]
2025-09-16 12:25:32,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [267.0, 323.0, 140.0, 219.0, 237.0, 197.0, 154.0, 134.0, 282.0, 186.0]
2025-09-16 12:25:32,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 14 minutes, 17 seconds)
2025-09-16 12:27:28,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:27:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 970.48163 ± 220.414
2025-09-16 12:27:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [996.91614, 698.70355, 868.24115, 651.1662, 1081.9563, 933.2441, 1271.2234, 1118.7081, 757.4695, 1327.1888]
2025-09-16 12:27:31,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 130.0, 167.0, 126.0, 203.0, 180.0, 256.0, 218.0, 143.0, 254.0]
2025-09-16 12:27:31,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 36 seconds)
2025-09-16 12:29:24,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:29:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1391.07593 ± 603.455
2025-09-16 12:29:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1135.4196, 1067.6095, 1036.4698, 1512.11, 533.1275, 2020.2101, 856.04364, 2579.9836, 1124.2294, 2045.5549]
2025-09-16 12:29:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [220.0, 208.0, 200.0, 303.0, 105.0, 392.0, 171.0, 502.0, 224.0, 424.0]
2025-09-16 12:29:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1391.08) for latency 6
2025-09-16 12:29:27,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 10 minutes, 58 seconds)
2025-09-16 12:31:23,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:31:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 990.40967 ± 189.365
2025-09-16 12:31:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [959.10785, 1434.9231, 1031.1871, 642.0061, 950.2415, 979.31903, 834.9314, 991.0477, 1095.4851, 985.84766]
2025-09-16 12:31:25,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 286.0, 214.0, 127.0, 181.0, 189.0, 166.0, 188.0, 219.0, 193.0]
2025-09-16 12:31:25,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2025-09-16 12:33:19,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:33:22,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1235.92371 ± 600.956
2025-09-16 12:33:22,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1776.4862, 1091.8147, 1133.6088, 805.5578, 643.60004, 1054.4126, 2793.3608, 1107.3159, 709.7866, 1243.2941]
2025-09-16 12:33:22,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [363.0, 215.0, 230.0, 159.0, 121.0, 201.0, 552.0, 209.0, 137.0, 250.0]
2025-09-16 12:33:23,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 7 minutes, 13 seconds)
2025-09-16 12:35:17,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:35:20,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 898.60583 ± 357.789
2025-09-16 12:35:20,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1664.2269, 1126.7394, 534.89154, 1062.14, 573.8526, 591.0523, 1107.6891, 842.72516, 435.9725, 1046.7684]
2025-09-16 12:35:20,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [324.0, 207.0, 97.0, 208.0, 120.0, 116.0, 214.0, 161.0, 81.0, 200.0]
2025-09-16 12:35:20,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 20 seconds)
2025-09-16 12:37:15,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:37:18,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1256.14319 ± 489.540
2025-09-16 12:37:18,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [917.41626, 979.6889, 1074.9928, 2236.6604, 975.1645, 984.9376, 1244.9939, 1411.7416, 657.53784, 2078.2983]
2025-09-16 12:37:18,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 205.0, 213.0, 438.0, 184.0, 193.0, 253.0, 274.0, 131.0, 405.0]
2025-09-16 12:37:18,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 19 seconds)
2025-09-16 12:39:13,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:39:16,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1235.90784 ± 265.399
2025-09-16 12:39:16,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1099.5901, 1350.5114, 1542.236, 852.1205, 1793.4247, 1178.3606, 1153.0465, 1160.4243, 1308.8833, 920.4801]
2025-09-16 12:39:16,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [215.0, 255.0, 298.0, 158.0, 345.0, 219.0, 216.0, 221.0, 251.0, 182.0]
2025-09-16 12:39:16,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 35 seconds)
2025-09-16 12:41:13,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:41:16,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1149.41431 ± 652.426
2025-09-16 12:41:16,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [687.10504, 1605.9097, 1133.2858, 612.04114, 2861.0752, 1143.4823, 748.15344, 576.9864, 1299.7328, 826.3709]
2025-09-16 12:41:16,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 318.0, 215.0, 114.0, 564.0, 219.0, 140.0, 107.0, 248.0, 154.0]
2025-09-16 12:41:16,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 7 seconds)
2025-09-16 12:43:09,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:43:13,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1520.63306 ± 390.695
2025-09-16 12:43:13,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1036.4359, 1921.0297, 1479.2328, 1472.8298, 1408.7223, 1214.8427, 1556.3362, 2416.9656, 1631.6477, 1068.2881]
2025-09-16 12:43:13,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [208.0, 382.0, 310.0, 303.0, 283.0, 255.0, 305.0, 478.0, 330.0, 209.0]
2025-09-16 12:43:13,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1520.63) for latency 6
2025-09-16 12:43:13,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 58 minutes, 3 seconds)
2025-09-16 12:45:09,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:45:12,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1164.92322 ± 394.669
2025-09-16 12:45:12,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1137.9819, 773.9612, 984.8216, 1159.1694, 1309.9858, 631.82007, 846.39087, 1106.8483, 1736.7802, 1961.4739]
2025-09-16 12:45:12,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 153.0, 197.0, 220.0, 253.0, 123.0, 162.0, 211.0, 337.0, 378.0]
2025-09-16 12:45:12,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 56 minutes, 25 seconds)
2025-09-16 12:47:06,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:47:10,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1405.02075 ± 444.743
2025-09-16 12:47:10,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1181.7255, 1225.5752, 615.4151, 1211.7106, 2016.6987, 1518.2894, 2085.1155, 1364.5453, 977.56665, 1853.5658]
2025-09-16 12:47:10,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [227.0, 237.0, 115.0, 235.0, 395.0, 288.0, 419.0, 261.0, 186.0, 354.0]
2025-09-16 12:47:10,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 30 seconds)
2025-09-16 12:49:09,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:49:13,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1489.81189 ± 210.220
2025-09-16 12:49:13,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1150.4247, 1491.773, 1864.6132, 1376.2985, 1703.4047, 1643.0381, 1196.1133, 1381.3711, 1550.2339, 1540.8479]
2025-09-16 12:49:13,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [217.0, 290.0, 376.0, 266.0, 339.0, 323.0, 229.0, 267.0, 300.0, 297.0]
2025-09-16 12:49:13,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 53 minutes, 31 seconds)
2025-09-16 12:51:05,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:51:09,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1273.45935 ± 188.940
2025-09-16 12:51:09,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1363.0922, 1407.68, 1221.7455, 1245.9307, 888.4101, 1271.882, 1455.6908, 1368.9332, 994.5821, 1516.6475]
2025-09-16 12:51:09,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [262.0, 266.0, 243.0, 234.0, 166.0, 252.0, 266.0, 257.0, 187.0, 285.0]
2025-09-16 12:51:09,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 50 minutes, 39 seconds)
2025-09-16 12:53:03,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:53:08,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1741.46313 ± 525.498
2025-09-16 12:53:08,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1558.202, 1985.3236, 3055.6724, 1856.2682, 1138.1438, 1255.6466, 1640.6615, 1472.3215, 1376.1802, 2076.211]
2025-09-16 12:53:08,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [316.0, 395.0, 627.0, 358.0, 230.0, 239.0, 336.0, 296.0, 277.0, 400.0]
2025-09-16 12:53:08,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1741.46) for latency 6
2025-09-16 12:53:08,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 49 minutes, 8 seconds)
2025-09-16 12:55:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:55:07,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1424.34961 ± 315.507
2025-09-16 12:55:07,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1168.1157, 1647.3481, 1482.1588, 1544.9248, 861.9201, 1273.1276, 1700.3591, 1590.1608, 1034.2085, 1941.1722]
2025-09-16 12:55:07,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [233.0, 321.0, 280.0, 295.0, 175.0, 240.0, 327.0, 316.0, 203.0, 369.0]
2025-09-16 12:55:07,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 47 minutes, 9 seconds)
2025-09-16 12:57:08,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:57:14,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1940.52893 ± 944.637
2025-09-16 12:57:14,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1584.582, 1367.8336, 2207.0432, 765.6751, 1684.8191, 2138.0574, 2066.982, 4503.3975, 1461.7167, 1625.183]
2025-09-16 12:57:14,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [307.0, 289.0, 427.0, 150.0, 339.0, 428.0, 412.0, 909.0, 295.0, 330.0]
2025-09-16 12:57:14,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1940.53) for latency 6
2025-09-16 12:57:14,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 39 seconds)
2025-09-16 12:59:05,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 12:59:08,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1239.02637 ± 245.063
2025-09-16 12:59:08,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1538.0187, 615.7133, 1019.0253, 1390.8844, 1321.9382, 1212.7461, 1244.9915, 1294.4415, 1414.0076, 1338.4966]
2025-09-16 12:59:08,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [294.0, 121.0, 193.0, 269.0, 261.0, 233.0, 242.0, 241.0, 263.0, 269.0]
2025-09-16 12:59:08,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 3 seconds)
2025-09-16 13:01:02,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:01:07,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1734.68518 ± 663.498
2025-09-16 13:01:07,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1807.1123, 1465.9297, 1336.022, 1134.8679, 1077.2323, 2358.0134, 2121.7996, 1092.0131, 1664.3477, 3289.5134]
2025-09-16 13:01:07,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [347.0, 288.0, 251.0, 217.0, 203.0, 461.0, 412.0, 209.0, 320.0, 667.0]
2025-09-16 13:01:07,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 40 seconds)
2025-09-16 13:03:02,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:03:04,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 989.25848 ± 129.709
2025-09-16 13:03:04,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [726.9375, 1019.80975, 1216.4893, 956.1621, 962.08246, 910.7541, 1096.6554, 877.23663, 1107.0615, 1019.3965]
2025-09-16 13:03:04,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [137.0, 194.0, 224.0, 176.0, 186.0, 174.0, 204.0, 163.0, 212.0, 200.0]
2025-09-16 13:03:04,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 19 seconds)
2025-09-16 13:05:02,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:05:06,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1368.34595 ± 270.261
2025-09-16 13:05:06,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [950.9968, 882.0323, 1345.0779, 1264.5421, 1320.9166, 1703.5597, 1625.8385, 1633.1249, 1607.2067, 1350.1628]
2025-09-16 13:05:06,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [187.0, 171.0, 253.0, 236.0, 253.0, 323.0, 313.0, 322.0, 312.0, 254.0]
2025-09-16 13:05:06,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 37 minutes, 50 seconds)
2025-09-16 13:07:01,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:07:08,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2322.35156 ± 1323.621
2025-09-16 13:07:08,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2172.0527, 2553.115, 1725.5958, 4672.932, 732.6771, 4859.176, 2164.875, 1068.8219, 1817.1476, 1457.1204]
2025-09-16 13:07:08,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [458.0, 526.0, 330.0, 985.0, 155.0, 1000.0, 411.0, 219.0, 379.0, 315.0]
2025-09-16 13:07:08,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2322.35) for latency 6
2025-09-16 13:07:08,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 34 minutes, 59 seconds)
2025-09-16 13:09:01,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:09:05,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1600.05481 ± 231.011
2025-09-16 13:09:05,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1683.6725, 1624.6053, 1693.5696, 2114.5986, 1415.1978, 1710.728, 1545.9125, 1505.9824, 1162.8359, 1543.4447]
2025-09-16 13:09:05,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [323.0, 311.0, 324.0, 398.0, 277.0, 336.0, 298.0, 279.0, 221.0, 293.0]
2025-09-16 13:09:05,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 33 minutes, 37 seconds)
2025-09-16 13:11:00,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:11:09,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2962.17505 ± 1417.979
2025-09-16 13:11:09,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3956.3643, 4848.1323, 4710.3286, 2750.4988, 1987.9011, 1164.1635, 4687.3267, 2827.334, 1182.5239, 1507.1804]
2025-09-16 13:11:09,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [803.0, 1000.0, 1000.0, 543.0, 389.0, 237.0, 945.0, 560.0, 227.0, 302.0]
2025-09-16 13:11:09,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2962.18) for latency 6
2025-09-16 13:11:09,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 22 seconds)
2025-09-16 13:13:10,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:13:16,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2090.40356 ± 813.970
2025-09-16 13:13:16,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1996.1729, 2342.332, 4017.8796, 2771.1843, 1041.4337, 1692.9973, 2245.2466, 1103.9727, 1854.5295, 1838.2874]
2025-09-16 13:13:16,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [382.0, 479.0, 844.0, 555.0, 195.0, 325.0, 472.0, 210.0, 365.0, 357.0]
2025-09-16 13:13:16,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 31 minutes, 48 seconds)
2025-09-16 13:15:13,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:15:18,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1881.85120 ± 435.794
2025-09-16 13:15:18,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2791.2744, 1314.8372, 1659.818, 2413.072, 1865.2416, 1468.5189, 1897.4558, 1710.1176, 2177.8066, 1520.3694]
2025-09-16 13:15:18,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [556.0, 250.0, 323.0, 481.0, 352.0, 277.0, 374.0, 328.0, 432.0, 306.0]
2025-09-16 13:15:18,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 29 minutes, 48 seconds)
2025-09-16 13:17:10,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:17:16,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2210.45020 ± 883.698
2025-09-16 13:17:16,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2624.0376, 1858.9889, 2118.2068, 4022.3423, 3402.356, 1898.8275, 1471.0143, 2021.7242, 1900.2141, 786.79156]
2025-09-16 13:17:16,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [503.0, 373.0, 410.0, 776.0, 636.0, 364.0, 302.0, 390.0, 359.0, 157.0]
2025-09-16 13:17:16,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 8 seconds)
2025-09-16 13:19:15,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:19:22,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2568.31836 ± 807.404
2025-09-16 13:19:22,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1391.2684, 1822.7975, 2749.6086, 1839.368, 2745.628, 1779.2682, 3185.5432, 2643.7874, 3519.5317, 4006.3818]
2025-09-16 13:19:22,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [273.0, 354.0, 550.0, 351.0, 543.0, 345.0, 649.0, 499.0, 698.0, 824.0]
2025-09-16 13:19:22,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 26 minutes, 19 seconds)
2025-09-16 13:21:16,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:21:21,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1740.59888 ± 540.410
2025-09-16 13:21:21,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1433.3386, 1238.9355, 2762.9353, 1058.001, 1186.0527, 1388.8599, 2299.493, 2147.1028, 1758.1395, 2133.1306]
2025-09-16 13:21:21,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [279.0, 245.0, 557.0, 207.0, 237.0, 275.0, 458.0, 437.0, 346.0, 417.0]
2025-09-16 13:21:21,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 23 minutes, 32 seconds)
2025-09-16 13:23:15,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:23:22,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2508.12939 ± 1219.249
2025-09-16 13:23:22,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1674.5042, 2740.6191, 1516.3273, 4903.3276, 1887.7303, 4790.4355, 2194.962, 2117.1287, 1592.9714, 1663.2875]
2025-09-16 13:23:22,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [326.0, 553.0, 288.0, 1000.0, 368.0, 952.0, 426.0, 408.0, 307.0, 326.0]
2025-09-16 13:23:22,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 47 seconds)
2025-09-16 13:25:20,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:25:26,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2254.47412 ± 633.981
2025-09-16 13:25:26,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1919.3789, 2049.1882, 2249.4236, 2119.2544, 1338.965, 2720.9485, 1755.2504, 3847.9297, 2241.6711, 2302.7302]
2025-09-16 13:25:26,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [361.0, 388.0, 444.0, 398.0, 256.0, 518.0, 340.0, 741.0, 431.0, 444.0]
2025-09-16 13:25:27,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 19 minutes, 3 seconds)
2025-09-16 13:27:20,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:27:27,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2365.78882 ± 1012.003
2025-09-16 13:27:27,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1698.032, 5108.0396, 2258.7178, 2243.2854, 2276.3103, 2842.6301, 1936.6044, 1834.518, 1122.1815, 2337.5698]
2025-09-16 13:27:27,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [323.0, 1000.0, 442.0, 435.0, 437.0, 558.0, 376.0, 353.0, 214.0, 437.0]
2025-09-16 13:27:27,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 17 minutes, 26 seconds)
2025-09-16 13:29:24,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:29:32,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3134.98730 ± 1215.617
2025-09-16 13:29:32,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4587.6304, 4543.7617, 5204.166, 1665.8978, 1927.3274, 1846.4906, 3175.888, 2683.6318, 2239.2026, 3475.875]
2025-09-16 13:29:32,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [878.0, 901.0, 1000.0, 318.0, 365.0, 354.0, 595.0, 504.0, 423.0, 659.0]
2025-09-16 13:29:32,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (3134.99) for latency 6
2025-09-16 13:29:32,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes, 14 seconds)
2025-09-16 13:31:31,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:31:40,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3074.88452 ± 1297.847
2025-09-16 13:31:40,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3467.329, 1797.2533, 1749.6855, 4888.6636, 4617.4897, 2735.6494, 2135.7578, 5214.052, 2222.2014, 1920.7633]
2025-09-16 13:31:40,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [665.0, 359.0, 337.0, 1000.0, 926.0, 540.0, 411.0, 1000.0, 432.0, 378.0]
2025-09-16 13:31:40,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 14 minutes, 19 seconds)
2025-09-16 13:33:35,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:33:44,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2912.88940 ± 1385.212
2025-09-16 13:33:44,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1678.2556, 4826.843, 4526.875, 2747.0837, 1716.2064, 3910.023, 1294.2203, 2109.0967, 1493.8054, 4826.4844]
2025-09-16 13:33:44,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [327.0, 1000.0, 886.0, 548.0, 347.0, 806.0, 255.0, 426.0, 291.0, 1000.0]
2025-09-16 13:33:44,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 12 minutes, 33 seconds)
2025-09-16 13:35:43,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:35:53,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3332.60425 ± 1449.766
2025-09-16 13:35:53,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2403.905, 4780.433, 1683.6946, 1319.8884, 3627.841, 2614.7808, 1889.8297, 5153.684, 5251.138, 4600.849]
2025-09-16 13:35:53,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [494.0, 989.0, 338.0, 271.0, 706.0, 501.0, 362.0, 1000.0, 1000.0, 973.0]
2025-09-16 13:35:53,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (3332.60) for latency 6
2025-09-16 13:35:53,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 10 minutes, 58 seconds)
2025-09-16 13:37:46,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:37:55,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3194.58545 ± 1241.890
2025-09-16 13:37:55,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5150.6216, 2154.0437, 2097.201, 3590.7263, 1843.9799, 4753.1978, 3158.2573, 2484.3655, 1883.435, 4830.0254]
2025-09-16 13:37:55,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [982.0, 406.0, 403.0, 682.0, 342.0, 887.0, 611.0, 466.0, 367.0, 910.0]
2025-09-16 13:37:55,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 9 minutes, 2 seconds)
2025-09-16 13:39:49,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:39:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3051.34937 ± 963.992
2025-09-16 13:39:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4428.638, 1993.816, 3310.017, 4309.4946, 3134.0547, 1494.8634, 3520.4014, 1703.7253, 3429.7803, 3188.7012]
2025-09-16 13:39:57,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [841.0, 384.0, 630.0, 862.0, 600.0, 292.0, 680.0, 329.0, 647.0, 610.0]
2025-09-16 13:39:58,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 6 minutes, 42 seconds)
2025-09-16 13:41:58,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:42:07,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3181.36157 ± 1575.758
2025-09-16 13:42:07,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4916.526, 4926.6313, 1381.6147, 2857.8123, 1447.0608, 1642.9592, 4906.3374, 1585.505, 2862.743, 5286.4277]
2025-09-16 13:42:07,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [935.0, 1000.0, 262.0, 557.0, 288.0, 311.0, 1000.0, 302.0, 559.0, 1000.0]
2025-09-16 13:42:07,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 4 minutes, 45 seconds)
2025-09-16 13:44:09,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:44:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4788.29590 ± 945.848
2025-09-16 13:44:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5208.261, 5173.4243, 5297.7974, 5276.836, 5235.0376, 2472.7058, 3419.4128, 5294.7524, 5233.301, 5271.4336]
2025-09-16 13:44:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 480.0, 692.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:44:23,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4788.30) for latency 6
2025-09-16 13:44:23,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 3 minutes, 51 seconds)
2025-09-16 13:46:08,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:46:20,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4463.44434 ± 1278.465
2025-09-16 13:46:20,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4940.692, 5341.006, 5422.4604, 5297.0195, 2219.7756, 5345.631, 5454.282, 2198.7068, 3291.4368, 5123.431]
2025-09-16 13:46:20,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [909.0, 1000.0, 1000.0, 976.0, 423.0, 1000.0, 1000.0, 417.0, 633.0, 966.0]
2025-09-16 13:46:20,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 40 seconds)
2025-09-16 13:48:19,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:48:32,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4850.07861 ± 867.018
2025-09-16 13:48:32,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5288.3145, 4392.256, 5349.7334, 5325.5933, 5324.927, 5301.3096, 5254.8174, 4283.7695, 5447.2275, 2532.8413]
2025-09-16 13:48:32,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 821.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 820.0, 1000.0, 463.0]
2025-09-16 13:48:32,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4850.08) for latency 6
2025-09-16 13:48:32,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 59 minutes, 31 seconds)
2025-09-16 13:50:35,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:50:47,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4683.64355 ± 991.210
2025-09-16 13:50:47,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5315.614, 5308.268, 5156.823, 5211.7847, 5327.7837, 5260.8076, 2930.0854, 2627.7393, 4385.9917, 5311.54]
2025-09-16 13:50:47,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 991.0, 1000.0, 1000.0, 1000.0, 560.0, 495.0, 842.0, 1000.0]
2025-09-16 13:50:47,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 58 minutes, 28 seconds)
2025-09-16 13:52:46,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:52:57,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3913.96631 ± 1521.467
2025-09-16 13:52:57,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4261.202, 5237.879, 5348.5317, 5245.1963, 2219.5505, 5323.8306, 2162.938, 1593.7786, 5311.036, 2435.7239]
2025-09-16 13:52:57,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [821.0, 1000.0, 1000.0, 1000.0, 432.0, 1000.0, 418.0, 311.0, 1000.0, 469.0]
2025-09-16 13:52:57,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 56 minutes, 21 seconds)
2025-09-16 13:54:46,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:54:57,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3948.17261 ± 1077.424
2025-09-16 13:54:57,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2985.7441, 3220.3835, 4074.3882, 4359.418, 5154.252, 3140.1167, 4480.5044, 5081.6265, 1778.478, 5206.8164]
2025-09-16 13:54:57,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [582.0, 618.0, 770.0, 840.0, 1000.0, 609.0, 878.0, 1000.0, 348.0, 1000.0]
2025-09-16 13:54:57,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 52 minutes, 53 seconds)
2025-09-16 13:56:52,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:57:02,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3791.89844 ± 1437.695
2025-09-16 13:57:02,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5347.4263, 2277.3, 2492.705, 1949.66, 4464.6016, 5266.8135, 3113.709, 2190.9768, 5391.29, 5424.5034]
2025-09-16 13:57:02,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 423.0, 446.0, 375.0, 816.0, 1000.0, 589.0, 396.0, 1000.0, 1000.0]
2025-09-16 13:57:02,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 51 minutes, 19 seconds)
2025-09-16 13:58:57,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 13:59:09,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4291.08105 ± 1291.337
2025-09-16 13:59:09,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2477.543, 5258.735, 5301.3804, 2041.5565, 5219.257, 2535.484, 5216.522, 5152.638, 4542.863, 5164.8306]
2025-09-16 13:59:09,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [504.0, 1000.0, 1000.0, 409.0, 1000.0, 492.0, 1000.0, 1000.0, 915.0, 1000.0]
2025-09-16 13:59:09,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 48 minutes, 47 seconds)
2025-09-16 14:01:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:01:17,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4135.91113 ± 1309.692
2025-09-16 14:01:17,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5484.515, 3944.6917, 2054.6377, 4055.4133, 5501.0225, 5445.2144, 2199.0872, 4126.9326, 5613.9966, 2933.6008]
2025-09-16 14:01:17,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 734.0, 382.0, 736.0, 1000.0, 1000.0, 406.0, 750.0, 1000.0, 528.0]
2025-09-16 14:01:17,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 46 minutes, 11 seconds)
2025-09-16 14:03:16,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:03:27,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3896.35620 ± 1203.925
2025-09-16 14:03:27,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5231.8647, 1880.8876, 4065.276, 5259.397, 2046.7743, 4226.052, 4943.156, 3012.7498, 4932.5005, 3364.9043]
2025-09-16 14:03:27,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 394.0, 766.0, 1000.0, 409.0, 806.0, 1000.0, 552.0, 1000.0, 677.0]
2025-09-16 14:03:27,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 44 minutes, 6 seconds)
2025-09-16 14:05:28,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:05:41,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5041.52637 ± 984.769
2025-09-16 14:05:41,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5285.0474, 5374.3594, 5356.1797, 5455.0513, 5371.542, 2089.6948, 5345.355, 5396.5337, 5359.003, 5382.5015]
2025-09-16 14:05:41,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 412.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:05:41,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5041.53) for latency 6
2025-09-16 14:05:41,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 42 minutes, 56 seconds)
2025-09-16 14:07:34,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:07:47,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5040.90723 ± 577.577
2025-09-16 14:07:47,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5401.401, 5436.821, 3701.4656, 5429.953, 4411.1685, 5373.3447, 4533.1543, 5404.233, 5309.7134, 5407.818]
2025-09-16 14:07:47,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 689.0, 1000.0, 811.0, 1000.0, 840.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:07:47,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 40 minutes, 53 seconds)
2025-09-16 14:09:44,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:09:52,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2760.96973 ± 1711.885
2025-09-16 14:09:52,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5378.4575, 885.1892, 2151.2656, 3268.6218, 5268.315, 888.6715, 2821.854, 1576.5994, 801.7359, 4568.9883]
2025-09-16 14:09:52,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 178.0, 432.0, 632.0, 1000.0, 176.0, 562.0, 319.0, 162.0, 901.0]
2025-09-16 14:09:52,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 38 minutes, 34 seconds)
2025-09-16 14:11:46,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:12:00,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5275.62158 ± 113.479
2025-09-16 14:12:00,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5176.1533, 5325.4756, 4964.8677, 5321.2476, 5334.289, 5339.0693, 5300.095, 5344.1606, 5328.6245, 5322.231]
2025-09-16 14:12:00,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:12:00,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5275.62) for latency 6
2025-09-16 14:12:00,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 25 seconds)
2025-09-16 14:14:00,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:14:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3689.24414 ± 1234.017
2025-09-16 14:14:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5327.0664, 2754.3792, 3561.0361, 2523.3943, 5483.5146, 1999.9493, 4513.274, 2255.7234, 4903.606, 3570.5015]
2025-09-16 14:14:09,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 510.0, 657.0, 468.0, 1000.0, 365.0, 824.0, 414.0, 911.0, 685.0]
2025-09-16 14:14:09,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 15 seconds)
2025-09-16 14:16:03,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:16:16,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4923.81104 ± 1167.922
2025-09-16 14:16:16,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5343.3457, 5344.487, 1424.4218, 5236.9175, 5269.5054, 5364.9214, 5342.4453, 5359.7437, 5186.703, 5365.616]
2025-09-16 14:16:16,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 273.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:16:16,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 31 minutes, 44 seconds)
2025-09-16 14:18:20,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:18:33,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4677.64600 ± 1270.042
2025-09-16 14:18:33,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5035.5435, 1647.7959, 5335.519, 5322.9116, 5325.5483, 5335.494, 2733.9792, 5341.4004, 5343.0034, 5355.2666]
2025-09-16 14:18:33,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 321.0, 1000.0, 1000.0, 1000.0, 1000.0, 554.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:18:33,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 6 seconds)
2025-09-16 14:20:33,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:20:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5380.12793 ± 41.447
2025-09-16 14:20:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5414.855, 5303.007, 5406.982, 5400.8867, 5298.959, 5377.2017, 5387.6426, 5406.223, 5386.735, 5418.788]
2025-09-16 14:20:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 986.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:20:47,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5380.13) for latency 6
2025-09-16 14:20:47,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 23 seconds)
2025-09-16 14:22:43,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:22:58,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5344.05078 ± 12.494
2025-09-16 14:22:58,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5340.5903, 5332.964, 5342.701, 5372.451, 5335.69, 5331.4307, 5350.0273, 5332.159, 5358.5864, 5343.908]
2025-09-16 14:22:58,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:22:58,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 17 seconds)
2025-09-16 14:24:54,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:25:07,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5076.46289 ± 634.425
2025-09-16 14:25:07,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5510.6333, 3924.818, 5462.2417, 5471.8374, 4544.44, 3944.6912, 5483.7705, 5459.5034, 5464.2637, 5498.427]
2025-09-16 14:25:07,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 723.0, 1000.0, 1000.0, 829.0, 725.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:25:07,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 6 seconds)
2025-09-16 14:27:04,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:27:18,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5396.58203 ± 22.608
2025-09-16 14:27:18,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5446.09, 5421.714, 5395.8296, 5376.2646, 5370.851, 5383.502, 5399.5957, 5405.7637, 5395.869, 5370.3457]
2025-09-16 14:27:18,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:27:18,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5396.58) for latency 6
2025-09-16 14:27:18,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 4 seconds)
2025-09-16 14:29:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:29:20,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5377.59375 ± 21.541
2025-09-16 14:29:20,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5381.6187, 5400.5444, 5371.965, 5421.598, 5363.1816, 5368.8267, 5344.1396, 5354.802, 5391.9355, 5377.329]
2025-09-16 14:29:20,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:29:20,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 25 seconds)
2025-09-16 14:31:17,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:31:32,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5434.44873 ± 18.816
2025-09-16 14:31:32,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5397.6743, 5462.547, 5430.3765, 5424.398, 5454.627, 5418.503, 5452.1855, 5441.871, 5441.829, 5420.475]
2025-09-16 14:31:32,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:31:32,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5434.45) for latency 6
2025-09-16 14:31:32,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 11 seconds)
2025-09-16 14:33:28,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:33:43,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5407.47949 ± 16.241
2025-09-16 14:33:43,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5394.599, 5402.5557, 5417.416, 5401.4624, 5395.2715, 5425.864, 5444.5713, 5401.5107, 5404.1436, 5387.4]
2025-09-16 14:33:43,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:33:43,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 3 seconds)
2025-09-16 14:35:45,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:36:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5504.51074 ± 64.117
2025-09-16 14:36:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5432.0205, 5479.7544, 5515.082, 5462.5317, 5639.7344, 5559.8887, 5442.562, 5576.3164, 5453.697, 5483.5234]
2025-09-16 14:36:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:36:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5504.51) for latency 6
2025-09-16 14:36:00,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 3 seconds)
2025-09-16 14:37:56,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:38:10,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5184.74463 ± 984.083
2025-09-16 14:38:10,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5529.591, 5476.6704, 5549.7686, 5535.5117, 5486.2974, 5508.24, 2233.7332, 5511.098, 5464.212, 5552.3257]
2025-09-16 14:38:10,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 420.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:38:10,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 51 seconds)
2025-09-16 14:40:06,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:40:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4765.02881 ± 1298.487
2025-09-16 14:40:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4328.2925, 5544.9697, 1966.0021, 5521.0312, 2603.8408, 5508.542, 5575.3003, 5538.9956, 5509.1865, 5554.126]
2025-09-16 14:40:19,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [794.0, 1000.0, 363.0, 1000.0, 475.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:40:19,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 46 seconds)
2025-09-16 14:42:15,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:42:29,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5456.36230 ± 25.218
2025-09-16 14:42:29,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5485.434, 5455.1323, 5436.9473, 5432.525, 5473.439, 5483.3823, 5413.3237, 5437.73, 5452.8247, 5492.8823]
2025-09-16 14:42:29,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:42:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 34 seconds)
2025-09-16 14:44:26,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:44:40,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5510.62109 ± 17.397
2025-09-16 14:44:40,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5502.9155, 5524.1187, 5509.1055, 5501.292, 5507.0776, 5551.0864, 5513.3296, 5481.7524, 5517.478, 5498.0557]
2025-09-16 14:44:40,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:44:40,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5510.62) for latency 6
2025-09-16 14:44:40,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 22 seconds)
2025-09-16 14:46:37,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:46:51,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5414.03320 ± 29.553
2025-09-16 14:46:51,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5419.932, 5418.534, 5444.5654, 5372.1626, 5443.154, 5408.786, 5405.647, 5374.122, 5466.293, 5387.1333]
2025-09-16 14:46:51,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:46:51,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 10 seconds)
2025-09-16 14:48:46,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 6...
2025-09-16 14:49:00,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5545.78369 ± 46.431
2025-09-16 14:49:00,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5558.8677, 5412.964, 5564.2046, 5551.2695, 5561.7915, 5563.9033, 5541.5957, 5575.6646, 5588.6216, 5538.951]
2025-09-16 14:49:00,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:49:00,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5545.78) for latency 6
2025-09-16 14:49:00,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1251 [DEBUG]: Training session finished
