2025-09-16 12:06:38,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.100-delay_9
2025-09-16 12:06:38,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.100-delay_9
2025-09-16 12:06:38,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x14f8d9c94550>}
2025-09-16 12:06:38,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:06:38,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:06:38,966 baseline-bpql-noisepromille100-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=529, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:06:38,966 baseline-bpql-noisepromille100-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:06:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:06:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:08:32,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:08:33,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 310.62128 ± 31.216
2025-09-16 12:08:33,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [367.11618, 279.40198, 366.73093, 308.34183, 281.6964, 293.7749, 327.0921, 298.76865, 300.92508, 282.3647]
2025-09-16 12:08:33,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 56.0, 68.0, 57.0, 57.0, 54.0, 60.0, 61.0, 55.0, 58.0]
2025-09-16 12:08:33,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (310.62) for latency 9
2025-09-16 12:08:33,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 5 minutes, 47 seconds)
2025-09-16 12:10:35,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:10:36,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 356.53601 ± 83.201
2025-09-16 12:10:36,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [241.01566, 449.13367, 396.15915, 484.95282, 415.69662, 238.3734, 412.23343, 341.6242, 278.85266, 307.3185]
2025-09-16 12:10:36,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 99.0, 83.0, 95.0, 92.0, 50.0, 77.0, 75.0, 58.0, 68.0]
2025-09-16 12:10:36,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (356.54) for latency 9
2025-09-16 12:10:36,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 13 minutes, 3 seconds)
2025-09-16 12:12:39,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:12:40,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 355.69577 ± 73.952
2025-09-16 12:12:40,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [399.93326, 481.71973, 437.23862, 310.21283, 351.33826, 358.864, 402.21518, 322.82446, 224.78471, 267.82684]
2025-09-16 12:12:40,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 91.0, 85.0, 59.0, 69.0, 68.0, 79.0, 64.0, 44.0, 51.0]
2025-09-16 12:12:40,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 14 minutes, 7 seconds)
2025-09-16 12:14:42,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:14:44,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 436.16180 ± 108.190
2025-09-16 12:14:44,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [336.47137, 685.3371, 331.44684, 433.98322, 494.37744, 456.78925, 537.71814, 332.92075, 408.66916, 343.90472]
2025-09-16 12:14:44,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [65.0, 150.0, 62.0, 94.0, 94.0, 92.0, 118.0, 67.0, 75.0, 66.0]
2025-09-16 12:14:44,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (436.16) for latency 9
2025-09-16 12:14:44,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 13 minutes, 23 seconds)
2025-09-16 12:16:48,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:16:49,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 425.40118 ± 83.305
2025-09-16 12:16:49,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [409.69952, 394.6145, 400.4426, 511.34036, 519.99524, 341.2068, 324.95535, 592.91315, 409.24814, 349.5963]
2025-09-16 12:16:49,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 86.0, 78.0, 100.0, 112.0, 75.0, 60.0, 127.0, 87.0, 74.0]
2025-09-16 12:16:49,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 12 minutes, 46 seconds)
2025-09-16 12:18:50,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:18:51,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 445.72818 ± 103.697
2025-09-16 12:18:51,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [347.65424, 402.4641, 396.72018, 631.96875, 416.60876, 580.9486, 584.3893, 343.67383, 393.3614, 359.4926]
2025-09-16 12:18:51,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 76.0, 74.0, 121.0, 79.0, 110.0, 109.0, 64.0, 76.0, 69.0]
2025-09-16 12:18:51,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (445.73) for latency 9
2025-09-16 12:18:51,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 13 minutes, 53 seconds)
2025-09-16 12:20:55,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:20:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 520.21814 ± 103.908
2025-09-16 12:20:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [702.78656, 464.36694, 649.93823, 628.87537, 526.62787, 550.0259, 416.4346, 434.8952, 387.7647, 440.46576]
2025-09-16 12:20:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 99.0, 127.0, 124.0, 103.0, 117.0, 77.0, 95.0, 81.0, 80.0]
2025-09-16 12:20:56,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (520.22) for latency 9
2025-09-16 12:20:56,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 12 minutes, 9 seconds)
2025-09-16 12:22:59,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:23:00,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 445.57846 ± 64.756
2025-09-16 12:23:00,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [390.06628, 344.1727, 341.72888, 460.90082, 453.92038, 496.97583, 530.11127, 429.49557, 487.4877, 520.92535]
2025-09-16 12:23:00,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 63.0, 63.0, 96.0, 91.0, 106.0, 116.0, 83.0, 93.0, 108.0]
2025-09-16 12:23:00,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 9 minutes, 59 seconds)
2025-09-16 12:25:04,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:25:05,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 481.84967 ± 87.077
2025-09-16 12:25:05,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [602.9879, 484.7639, 335.58536, 440.46503, 524.7359, 465.83417, 563.83514, 365.96378, 435.63507, 598.69006]
2025-09-16 12:25:05,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 89.0, 62.0, 93.0, 97.0, 85.0, 107.0, 80.0, 98.0, 131.0]
2025-09-16 12:25:05,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 8 minutes, 27 seconds)
2025-09-16 12:27:07,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:27:08,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 522.51257 ± 87.399
2025-09-16 12:27:08,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [508.9218, 453.0142, 554.59265, 486.5118, 608.746, 674.261, 562.1274, 561.1494, 339.89743, 475.9043]
2025-09-16 12:27:08,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 90.0, 106.0, 91.0, 113.0, 138.0, 104.0, 110.0, 61.0, 89.0]
2025-09-16 12:27:08,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (522.51) for latency 9
2025-09-16 12:27:08,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 5 minutes, 52 seconds)
2025-09-16 12:29:12,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:29:13,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 450.43433 ± 86.941
2025-09-16 12:29:13,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [564.73126, 458.21182, 460.19046, 423.59042, 357.08646, 350.01212, 641.2981, 382.75363, 408.25867, 458.2098]
2025-09-16 12:29:13,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 90.0, 87.0, 80.0, 74.0, 76.0, 120.0, 77.0, 74.0, 85.0]
2025-09-16 12:29:13,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 4 minutes, 21 seconds)
2025-09-16 12:31:17,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:31:18,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 517.04456 ± 97.957
2025-09-16 12:31:18,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [698.00366, 417.9801, 637.2682, 547.7763, 483.13992, 452.9308, 526.9454, 587.9088, 362.47012, 456.0223]
2025-09-16 12:31:18,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 78.0, 126.0, 102.0, 91.0, 85.0, 112.0, 110.0, 77.0, 87.0]
2025-09-16 12:31:18,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 2 minutes, 29 seconds)
2025-09-16 12:33:21,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:33:22,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 492.52277 ± 153.980
2025-09-16 12:33:22,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [332.94562, 487.365, 489.44006, 469.61243, 454.9517, 909.5799, 559.20917, 476.4604, 407.22995, 338.43295]
2025-09-16 12:33:22,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [62.0, 95.0, 106.0, 93.0, 85.0, 178.0, 107.0, 97.0, 75.0, 72.0]
2025-09-16 12:33:22,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 32 seconds)
2025-09-16 12:35:25,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:35:26,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 565.51624 ± 166.495
2025-09-16 12:35:26,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [988.5044, 468.81857, 417.7233, 559.49225, 474.71158, 462.84076, 571.7425, 749.4319, 488.08215, 473.81555]
2025-09-16 12:35:26,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 88.0, 77.0, 105.0, 87.0, 85.0, 108.0, 146.0, 92.0, 88.0]
2025-09-16 12:35:26,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (565.52) for latency 9
2025-09-16 12:35:26,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 58 minutes, 12 seconds)
2025-09-16 12:37:30,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:37:31,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 483.45020 ± 105.878
2025-09-16 12:37:31,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [599.8631, 360.44366, 451.16858, 405.96085, 516.2139, 379.30496, 437.28244, 717.9023, 422.54504, 543.81757]
2025-09-16 12:37:31,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 66.0, 84.0, 76.0, 100.0, 69.0, 86.0, 134.0, 78.0, 101.0]
2025-09-16 12:37:31,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 56 minutes, 21 seconds)
2025-09-16 12:39:33,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:39:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 609.61536 ± 148.902
2025-09-16 12:39:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [606.2142, 465.63132, 417.054, 928.60486, 621.12463, 683.42, 511.26236, 787.2238, 487.6526, 587.96643]
2025-09-16 12:39:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 88.0, 89.0, 185.0, 132.0, 136.0, 111.0, 148.0, 92.0, 125.0]
2025-09-16 12:39:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (609.62) for latency 9
2025-09-16 12:39:35,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 54 minutes, 15 seconds)
2025-09-16 12:41:38,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:41:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 570.48419 ± 111.853
2025-09-16 12:41:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [513.7851, 468.1933, 398.3369, 532.0292, 594.8838, 741.06415, 589.4147, 645.9679, 754.37787, 466.7892]
2025-09-16 12:41:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 101.0, 74.0, 115.0, 112.0, 150.0, 112.0, 137.0, 152.0, 99.0]
2025-09-16 12:41:39,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 51 minutes, 45 seconds)
2025-09-16 12:43:44,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:43:46,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 631.92456 ± 150.717
2025-09-16 12:43:46,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [715.8564, 900.9946, 468.17975, 564.8552, 829.30194, 495.80975, 716.1789, 460.2515, 489.73666, 678.0809]
2025-09-16 12:43:46,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 172.0, 101.0, 106.0, 164.0, 98.0, 146.0, 102.0, 97.0, 132.0]
2025-09-16 12:43:46,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (631.92) for latency 9
2025-09-16 12:43:46,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 50 minutes, 21 seconds)
2025-09-16 12:45:50,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:45:52,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 607.06238 ± 67.191
2025-09-16 12:45:52,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [644.29364, 576.84045, 625.6692, 632.54816, 779.90393, 547.4555, 533.2874, 587.3639, 565.21375, 578.0477]
2025-09-16 12:45:52,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 108.0, 118.0, 129.0, 153.0, 116.0, 110.0, 120.0, 113.0, 122.0]
2025-09-16 12:45:52,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 48 minutes, 52 seconds)
2025-09-16 12:47:55,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:47:57,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 605.16479 ± 153.216
2025-09-16 12:47:57,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [374.1289, 777.0226, 560.87555, 582.90546, 620.50134, 629.97235, 636.28125, 419.50165, 519.9725, 930.4857]
2025-09-16 12:47:57,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 154.0, 103.0, 119.0, 120.0, 124.0, 134.0, 78.0, 103.0, 186.0]
2025-09-16 12:47:57,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 46 minutes, 54 seconds)
2025-09-16 12:50:01,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:50:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 575.39587 ± 119.657
2025-09-16 12:50:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [713.789, 599.3445, 453.2403, 548.9836, 845.94946, 451.9346, 590.541, 466.091, 492.10104, 591.98425]
2025-09-16 12:50:02,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 128.0, 97.0, 118.0, 160.0, 82.0, 124.0, 93.0, 102.0, 123.0]
2025-09-16 12:50:02,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 45 minutes, 4 seconds)
2025-09-16 12:52:06,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:52:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 632.00995 ± 85.313
2025-09-16 12:52:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [653.1292, 788.2027, 537.16675, 586.04254, 732.9415, 592.3853, 637.23315, 717.35724, 541.0212, 534.62085]
2025-09-16 12:52:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 151.0, 112.0, 110.0, 137.0, 126.0, 139.0, 136.0, 113.0, 103.0]
2025-09-16 12:52:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (632.01) for latency 9
2025-09-16 12:52:08,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 43 minutes, 24 seconds)
2025-09-16 12:54:13,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:54:14,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 660.65002 ± 187.145
2025-09-16 12:54:14,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [682.18396, 677.8282, 591.6675, 1016.0371, 498.95587, 883.6157, 527.2367, 744.23395, 317.53342, 667.20807]
2025-09-16 12:54:14,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 135.0, 118.0, 196.0, 95.0, 169.0, 106.0, 147.0, 60.0, 146.0]
2025-09-16 12:54:14,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (660.65) for latency 9
2025-09-16 12:54:14,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 41 minutes, 21 seconds)
2025-09-16 12:56:17,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:56:19,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 653.34265 ± 157.038
2025-09-16 12:56:19,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [763.2346, 771.0604, 868.258, 676.5226, 500.64102, 443.86713, 413.5414, 705.3527, 548.85944, 842.0889]
2025-09-16 12:56:19,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 142.0, 166.0, 126.0, 92.0, 87.0, 84.0, 134.0, 102.0, 160.0]
2025-09-16 12:56:19,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 38 minutes, 53 seconds)
2025-09-16 12:58:23,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:58:25,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 668.30164 ± 190.885
2025-09-16 12:58:25,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [558.0886, 375.7932, 667.4766, 847.44824, 524.6085, 622.42523, 646.2266, 632.15234, 1126.3649, 682.4328]
2025-09-16 12:58:25,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [117.0, 80.0, 120.0, 179.0, 101.0, 121.0, 123.0, 135.0, 220.0, 128.0]
2025-09-16 12:58:25,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (668.30) for latency 9
2025-09-16 12:58:25,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 36 minutes, 59 seconds)
2025-09-16 13:00:29,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:00:31,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 524.22705 ± 198.007
2025-09-16 13:00:31,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [375.8253, 383.71936, 570.09393, 742.41626, 1005.1208, 315.97247, 424.541, 485.53116, 534.88605, 404.16367]
2025-09-16 13:00:31,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 83.0, 108.0, 145.0, 192.0, 59.0, 79.0, 92.0, 117.0, 78.0]
2025-09-16 13:00:31,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 35 minutes, 7 seconds)
2025-09-16 13:02:36,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:02:38,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 775.85779 ± 258.961
2025-09-16 13:02:38,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [751.2458, 730.45465, 518.30774, 817.23694, 570.73083, 790.3887, 1236.572, 1213.9204, 744.3237, 385.39722]
2025-09-16 13:02:38,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [162.0, 153.0, 109.0, 161.0, 123.0, 164.0, 245.0, 235.0, 152.0, 72.0]
2025-09-16 13:02:38,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (775.86) for latency 9
2025-09-16 13:02:38,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 33 minutes, 21 seconds)
2025-09-16 13:04:43,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:04:45,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 805.35657 ± 243.405
2025-09-16 13:04:45,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [792.5404, 683.41797, 1447.0162, 798.5579, 880.2754, 527.20483, 769.60535, 845.8791, 785.40985, 523.6587]
2025-09-16 13:04:45,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 142.0, 298.0, 165.0, 170.0, 98.0, 146.0, 173.0, 146.0, 95.0]
2025-09-16 13:04:45,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (805.36) for latency 9
2025-09-16 13:04:45,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 31 minutes, 18 seconds)
2025-09-16 13:06:49,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:06:51,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 887.63702 ± 395.356
2025-09-16 13:06:51,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [817.17957, 1251.1594, 941.6711, 604.45746, 696.50433, 547.79553, 342.0256, 1278.2106, 675.6751, 1721.6913]
2025-09-16 13:06:51,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 239.0, 176.0, 126.0, 148.0, 104.0, 64.0, 243.0, 124.0, 312.0]
2025-09-16 13:06:51,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (887.64) for latency 9
2025-09-16 13:06:51,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 29 minutes, 36 seconds)
2025-09-16 13:08:54,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:08:56,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 811.36597 ± 324.796
2025-09-16 13:08:56,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1341.2694, 611.41736, 660.35767, 992.36127, 770.22034, 668.7522, 689.8097, 1422.9984, 629.4484, 327.02478]
2025-09-16 13:08:56,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [278.0, 115.0, 139.0, 193.0, 147.0, 127.0, 148.0, 272.0, 118.0, 62.0]
2025-09-16 13:08:56,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 27 minutes, 24 seconds)
2025-09-16 13:10:59,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:11:01,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 698.31244 ± 207.198
2025-09-16 13:11:01,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [951.5862, 639.55597, 646.1419, 670.8912, 1181.3143, 512.26276, 581.26685, 763.0623, 456.11456, 580.9284]
2025-09-16 13:11:01,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 135.0, 120.0, 128.0, 228.0, 94.0, 108.0, 144.0, 85.0, 108.0]
2025-09-16 13:11:01,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 25 minutes)
2025-09-16 13:13:08,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:13:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 940.96112 ± 296.603
2025-09-16 13:13:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [640.187, 1095.7567, 786.4984, 1679.1985, 573.7875, 988.41364, 867.78735, 860.8778, 1111.8466, 805.2577]
2025-09-16 13:13:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 207.0, 143.0, 315.0, 111.0, 189.0, 170.0, 175.0, 213.0, 149.0]
2025-09-16 13:13:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (940.96) for latency 9
2025-09-16 13:13:10,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 23 minutes, 18 seconds)
2025-09-16 13:15:14,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:15:17,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 880.07971 ± 417.719
2025-09-16 13:15:17,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [737.5733, 722.51324, 474.3167, 907.2681, 628.8549, 709.18176, 1892.2692, 746.65594, 1424.1428, 558.0213]
2025-09-16 13:15:17,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 131.0, 101.0, 175.0, 123.0, 125.0, 346.0, 140.0, 265.0, 106.0]
2025-09-16 13:15:17,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 21 minutes, 5 seconds)
2025-09-16 13:17:22,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:17:24,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1040.38855 ± 426.189
2025-09-16 13:17:24,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1318.2906, 603.0902, 833.3218, 1944.934, 726.0969, 893.2262, 1102.0032, 631.915, 751.6895, 1599.3181]
2025-09-16 13:17:24,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [243.0, 111.0, 180.0, 394.0, 133.0, 183.0, 231.0, 137.0, 138.0, 305.0]
2025-09-16 13:17:24,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1040.39) for latency 9
2025-09-16 13:17:24,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 19 minutes, 17 seconds)
2025-09-16 13:19:30,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:19:33,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1121.00110 ± 265.868
2025-09-16 13:19:33,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [765.4273, 1321.2286, 926.6532, 1162.8755, 1411.4032, 1520.2673, 904.7488, 1347.3425, 1127.837, 722.2275]
2025-09-16 13:19:33,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 249.0, 174.0, 216.0, 268.0, 292.0, 180.0, 257.0, 211.0, 146.0]
2025-09-16 13:19:33,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1121.00) for latency 9
2025-09-16 13:19:33,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 17 minutes, 52 seconds)
2025-09-16 13:21:36,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:21:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 948.03455 ± 325.389
2025-09-16 13:21:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [270.30655, 603.9763, 1332.6195, 1382.6278, 1072.6411, 1248.9178, 985.521, 919.00775, 764.3632, 900.3645]
2025-09-16 13:21:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [48.0, 128.0, 250.0, 274.0, 211.0, 230.0, 189.0, 191.0, 144.0, 192.0]
2025-09-16 13:21:39,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 15 minutes, 56 seconds)
2025-09-16 13:23:44,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:23:47,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 983.44104 ± 232.121
2025-09-16 13:23:47,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [848.4037, 1438.5485, 1325.3787, 677.7806, 1007.2453, 764.08185, 771.51605, 948.4135, 978.77576, 1074.2667]
2025-09-16 13:23:47,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [169.0, 268.0, 262.0, 128.0, 206.0, 143.0, 154.0, 201.0, 198.0, 205.0]
2025-09-16 13:23:47,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 13 minutes, 41 seconds)
2025-09-16 13:25:51,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:25:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 780.88165 ± 212.326
2025-09-16 13:25:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [509.32407, 1221.8107, 955.261, 743.27405, 675.7811, 604.5196, 773.0414, 740.20386, 561.83765, 1023.76294]
2025-09-16 13:25:53,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 243.0, 179.0, 139.0, 120.0, 109.0, 136.0, 144.0, 106.0, 192.0]
2025-09-16 13:25:53,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 11 minutes, 33 seconds)
2025-09-16 13:27:57,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:28:00,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1120.38843 ± 599.531
2025-09-16 13:28:00,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [948.32446, 1327.5511, 1006.2951, 634.8763, 923.9361, 2684.1213, 400.71082, 1486.3433, 760.63934, 1031.0856]
2025-09-16 13:28:00,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 241.0, 191.0, 120.0, 187.0, 535.0, 87.0, 275.0, 142.0, 187.0]
2025-09-16 13:28:00,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2025-09-16 13:30:04,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:30:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1007.52039 ± 249.987
2025-09-16 13:30:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1068.8053, 608.47784, 625.91693, 924.43835, 1237.6229, 1228.1572, 1166.3386, 766.54645, 1347.7056, 1101.1941]
2025-09-16 13:30:06,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 113.0, 128.0, 177.0, 248.0, 238.0, 243.0, 164.0, 278.0, 214.0]
2025-09-16 13:30:06,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 6 minutes, 42 seconds)
2025-09-16 13:32:12,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:32:14,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1030.24609 ± 462.001
2025-09-16 13:32:14,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [775.0834, 787.3201, 716.48804, 839.97144, 2174.2788, 1322.5404, 968.35315, 428.53946, 933.7034, 1356.1823]
2025-09-16 13:32:14,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 145.0, 155.0, 166.0, 396.0, 250.0, 200.0, 92.0, 172.0, 248.0]
2025-09-16 13:32:14,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 5 minutes, 1 second)
2025-09-16 13:34:19,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:34:21,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1144.23596 ± 375.296
2025-09-16 13:34:21,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1789.7448, 755.36847, 1181.5846, 1081.586, 1465.4618, 1552.721, 801.96, 1196.5267, 1140.4819, 476.9236]
2025-09-16 13:34:21,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [358.0, 166.0, 230.0, 195.0, 284.0, 292.0, 150.0, 232.0, 214.0, 97.0]
2025-09-16 13:34:21,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1144.24) for latency 9
2025-09-16 13:34:22,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 2 minutes, 42 seconds)
2025-09-16 13:36:27,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:36:31,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1327.03235 ± 375.948
2025-09-16 13:36:31,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2031.7867, 1244.9519, 699.25146, 898.4276, 1658.2021, 1640.628, 1341.3635, 1217.5619, 1498.4414, 1039.7083]
2025-09-16 13:36:31,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [399.0, 255.0, 142.0, 181.0, 318.0, 306.0, 251.0, 225.0, 283.0, 220.0]
2025-09-16 13:36:31,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1327.03) for latency 9
2025-09-16 13:36:31,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 1 minute, 7 seconds)
2025-09-16 13:38:35,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:38:38,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1269.03455 ± 565.226
2025-09-16 13:38:38,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [960.05237, 676.2144, 2430.5657, 938.0094, 855.932, 657.7271, 1227.1863, 1738.6719, 1989.9342, 1216.0513]
2025-09-16 13:38:38,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [199.0, 134.0, 467.0, 183.0, 169.0, 127.0, 227.0, 347.0, 372.0, 239.0]
2025-09-16 13:38:38,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 59 minutes, 14 seconds)
2025-09-16 13:40:42,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:40:44,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1001.70959 ± 306.378
2025-09-16 13:40:44,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [800.6338, 640.2401, 872.7681, 1604.5663, 788.91064, 887.7279, 1092.6443, 675.87054, 1285.9515, 1367.7822]
2025-09-16 13:40:44,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 121.0, 162.0, 295.0, 155.0, 176.0, 208.0, 132.0, 245.0, 262.0]
2025-09-16 13:40:44,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 56 minutes, 55 seconds)
2025-09-16 13:42:50,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:42:54,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1381.80908 ± 504.510
2025-09-16 13:42:54,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [533.6005, 1910.9885, 971.22784, 2095.203, 1008.83887, 1552.7235, 1595.6357, 765.7986, 1894.9095, 1489.1652]
2025-09-16 13:42:54,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 371.0, 190.0, 401.0, 192.0, 294.0, 316.0, 149.0, 370.0, 284.0]
2025-09-16 13:42:54,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1381.81) for latency 9
2025-09-16 13:42:54,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 55 minutes, 6 seconds)
2025-09-16 13:44:59,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:45:02,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1286.43201 ± 413.697
2025-09-16 13:45:02,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1142.7827, 1251.5287, 1266.3933, 700.6867, 1383.116, 1463.1095, 938.9099, 1014.5481, 1365.8428, 2337.402]
2025-09-16 13:45:02,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [226.0, 232.0, 237.0, 136.0, 263.0, 280.0, 174.0, 193.0, 264.0, 454.0]
2025-09-16 13:45:02,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 53 minutes, 11 seconds)
2025-09-16 13:47:06,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:47:10,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1223.37695 ± 275.044
2025-09-16 13:47:10,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [942.47675, 743.84265, 1345.0784, 1486.9829, 1477.0989, 1024.6234, 1469.3516, 893.129, 1489.2689, 1361.9169]
2025-09-16 13:47:10,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [175.0, 142.0, 260.0, 277.0, 294.0, 208.0, 293.0, 175.0, 283.0, 260.0]
2025-09-16 13:47:10,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 50 minutes, 48 seconds)
2025-09-16 13:49:13,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:49:17,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1480.68176 ± 651.628
2025-09-16 13:49:17,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1189.8256, 2840.098, 882.7944, 1521.6198, 1712.5214, 1514.9355, 1327.5242, 2325.276, 541.74146, 950.4824]
2025-09-16 13:49:17,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 559.0, 165.0, 285.0, 324.0, 297.0, 271.0, 451.0, 106.0, 176.0]
2025-09-16 13:49:17,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1480.68) for latency 9
2025-09-16 13:49:17,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 48 minutes, 34 seconds)
2025-09-16 13:51:21,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:51:24,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1261.33423 ± 407.820
2025-09-16 13:51:24,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [841.4526, 1638.805, 1218.0055, 670.87134, 731.4864, 1870.7839, 1022.46826, 1530.8878, 1705.3073, 1383.2745]
2025-09-16 13:51:24,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [150.0, 304.0, 216.0, 124.0, 141.0, 340.0, 193.0, 291.0, 315.0, 265.0]
2025-09-16 13:51:24,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 46 minutes, 41 seconds)
2025-09-16 13:53:30,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:53:34,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1600.55542 ± 1123.318
2025-09-16 13:53:34,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1513.6555, 936.8188, 1021.01984, 4905.8486, 1582.0117, 1171.2783, 1303.9349, 1416.7661, 911.0797, 1243.1406]
2025-09-16 13:53:34,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [293.0, 174.0, 191.0, 942.0, 299.0, 220.0, 246.0, 276.0, 170.0, 227.0]
2025-09-16 13:53:34,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1600.56) for latency 9
2025-09-16 13:53:34,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 44 minutes, 32 seconds)
2025-09-16 13:55:37,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:55:41,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1500.38647 ± 544.330
2025-09-16 13:55:41,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1013.6477, 1747.8894, 2372.5166, 1837.8782, 858.0792, 1382.6182, 745.68195, 2071.2979, 1970.8469, 1003.4089]
2025-09-16 13:55:41,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 327.0, 443.0, 337.0, 174.0, 275.0, 138.0, 396.0, 356.0, 211.0]
2025-09-16 13:55:41,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 42 minutes, 13 seconds)
2025-09-16 13:57:50,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:57:55,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1962.63342 ± 949.750
2025-09-16 13:57:55,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3479.296, 2488.5344, 1872.0918, 913.0124, 3832.3833, 1469.26, 1769.1929, 1530.8937, 1175.3815, 1096.2894]
2025-09-16 13:57:55,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [682.0, 491.0, 365.0, 176.0, 745.0, 274.0, 334.0, 295.0, 220.0, 195.0]
2025-09-16 13:57:55,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (1962.63) for latency 9
2025-09-16 13:57:55,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 41 minutes, 5 seconds)
2025-09-16 13:59:58,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:00:02,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1770.75098 ± 732.851
2025-09-16 14:00:02,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2450.0754, 2235.7578, 582.0381, 2876.3684, 1453.3326, 836.93396, 1698.5956, 2696.579, 1405.7222, 1472.1078]
2025-09-16 14:00:02,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [454.0, 417.0, 105.0, 540.0, 289.0, 165.0, 324.0, 510.0, 271.0, 281.0]
2025-09-16 14:00:02,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 38 minutes, 55 seconds)
2025-09-16 14:02:09,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:02:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1910.06384 ± 1165.182
2025-09-16 14:02:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1091.2563, 2051.0725, 2063.3435, 1987.3153, 1589.8695, 705.8563, 1088.4633, 1914.1433, 1469.0094, 5140.309]
2025-09-16 14:02:14,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 377.0, 379.0, 368.0, 310.0, 123.0, 201.0, 361.0, 289.0, 1000.0]
2025-09-16 14:02:14,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 37 minutes, 23 seconds)
2025-09-16 14:04:17,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:04:24,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2628.41162 ± 1426.161
2025-09-16 14:04:24,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3656.9539, 2392.0476, 1850.2244, 1958.209, 4296.3345, 5003.8247, 774.29675, 592.0915, 1871.2858, 3888.8489]
2025-09-16 14:04:24,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [693.0, 454.0, 342.0, 367.0, 819.0, 940.0, 154.0, 122.0, 349.0, 748.0]
2025-09-16 14:04:24,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2628.41) for latency 9
2025-09-16 14:04:24,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 35 minutes, 16 seconds)
2025-09-16 14:06:31,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:06:35,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1593.83862 ± 460.421
2025-09-16 14:06:35,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1595.6906, 1934.1857, 2218.918, 1354.7164, 716.7659, 1609.099, 1656.7961, 2213.766, 967.4126, 1671.036]
2025-09-16 14:06:35,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [298.0, 356.0, 419.0, 242.0, 135.0, 319.0, 324.0, 416.0, 175.0, 310.0]
2025-09-16 14:06:35,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 33 minutes, 45 seconds)
2025-09-16 14:08:41,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:08:46,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1934.75610 ± 822.551
2025-09-16 14:08:46,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1364.3431, 2307.3115, 1006.3859, 2630.3687, 1828.947, 1974.5349, 1723.338, 1321.0852, 3946.2583, 1244.9877]
2025-09-16 14:08:46,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [259.0, 445.0, 190.0, 501.0, 346.0, 378.0, 325.0, 251.0, 756.0, 227.0]
2025-09-16 14:08:46,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 31 minutes, 7 seconds)
2025-09-16 14:10:55,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:11:02,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2564.19800 ± 1306.673
2025-09-16 14:11:02,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1417.1193, 5195.2017, 3159.9597, 3654.8682, 1328.1603, 1102.63, 2877.0017, 1157.348, 3698.6877, 2051.0044]
2025-09-16 14:11:02,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [276.0, 1000.0, 641.0, 736.0, 236.0, 208.0, 574.0, 221.0, 712.0, 390.0]
2025-09-16 14:11:02,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 30 minutes, 12 seconds)
2025-09-16 14:13:03,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:13:09,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2063.64526 ± 1344.500
2025-09-16 14:13:09,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1121.3185, 3080.4766, 2479.8088, 925.2869, 978.93884, 1399.8472, 1401.2273, 3221.5547, 5191.462, 836.53046]
2025-09-16 14:13:09,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [232.0, 596.0, 472.0, 176.0, 186.0, 274.0, 278.0, 609.0, 1000.0, 161.0]
2025-09-16 14:13:09,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 27 minutes, 25 seconds)
2025-09-16 14:15:17,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:15:23,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2112.16064 ± 1178.363
2025-09-16 14:15:23,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1821.1436, 703.18414, 1485.9061, 4157.9727, 781.6242, 1959.7003, 2239.7935, 3074.8923, 3910.6663, 986.72314]
2025-09-16 14:15:23,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [348.0, 144.0, 269.0, 778.0, 148.0, 390.0, 409.0, 581.0, 740.0, 178.0]
2025-09-16 14:15:23,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 25 minutes, 42 seconds)
2025-09-16 14:17:27,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:17:36,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2977.92725 ± 1239.602
2025-09-16 14:17:36,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1527.3346, 1669.4048, 2701.2502, 4332.4985, 2648.2302, 4676.533, 2131.6516, 1610.6862, 4899.9585, 3581.7231]
2025-09-16 14:17:36,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [282.0, 312.0, 500.0, 819.0, 518.0, 907.0, 381.0, 289.0, 889.0, 664.0]
2025-09-16 14:17:36,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (2977.93) for latency 9
2025-09-16 14:17:36,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 23 minutes, 43 seconds)
2025-09-16 14:19:41,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:19:49,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2657.08130 ± 1206.373
2025-09-16 14:19:49,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4016.2217, 2416.2688, 5105.8896, 3665.6733, 1436.3359, 1337.4435, 1639.8662, 3053.6167, 1625.784, 2273.7144]
2025-09-16 14:19:49,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [794.0, 452.0, 1000.0, 713.0, 288.0, 260.0, 322.0, 599.0, 312.0, 429.0]
2025-09-16 14:19:49,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 21 minutes, 44 seconds)
2025-09-16 14:21:57,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:22:07,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3558.64380 ± 1839.234
2025-09-16 14:22:07,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5272.2583, 5281.915, 5221.776, 1648.9565, 5274.3955, 5242.109, 1228.4425, 879.5092, 3736.7275, 1800.3464]
2025-09-16 14:22:07,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 317.0, 1000.0, 1000.0, 234.0, 185.0, 719.0, 347.0]
2025-09-16 14:22:07,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3558.64) for latency 9
2025-09-16 14:22:07,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 19 minutes, 43 seconds)
2025-09-16 14:24:12,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:24:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3328.85498 ± 1378.611
2025-09-16 14:24:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1571.1763, 4957.507, 5038.2075, 2947.3545, 5280.2524, 2294.7354, 3724.7654, 3874.9812, 2106.253, 1493.3182]
2025-09-16 14:24:21,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [279.0, 933.0, 936.0, 564.0, 996.0, 441.0, 686.0, 716.0, 386.0, 287.0]
2025-09-16 14:24:21,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 18 minutes, 23 seconds)
2025-09-16 14:26:24,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:26:31,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2531.18701 ± 1264.540
2025-09-16 14:26:31,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2516.7021, 1161.4504, 1373.585, 4633.265, 1645.0033, 3434.947, 4808.9316, 1404.6652, 2289.9832, 2043.3367]
2025-09-16 14:26:31,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [477.0, 219.0, 255.0, 902.0, 302.0, 647.0, 917.0, 296.0, 446.0, 385.0]
2025-09-16 14:26:31,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 15 minutes, 42 seconds)
2025-09-16 14:28:42,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:28:50,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2702.30127 ± 1027.230
2025-09-16 14:28:50,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1466.323, 2762.6875, 5194.0396, 3201.4868, 1588.6057, 3373.1611, 2730.2063, 2377.4324, 2470.8994, 1858.1743]
2025-09-16 14:28:50,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [281.0, 530.0, 1000.0, 590.0, 298.0, 655.0, 511.0, 462.0, 465.0, 355.0]
2025-09-16 14:28:50,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 14 minutes, 6 seconds)
2025-09-16 14:30:50,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:30:56,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1984.40234 ± 967.082
2025-09-16 14:30:56,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3399.993, 1820.5338, 1220.3134, 4095.367, 1301.2773, 1419.1814, 1116.2814, 2354.8, 1210.7457, 1905.5315]
2025-09-16 14:30:56,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [626.0, 352.0, 232.0, 781.0, 241.0, 272.0, 200.0, 434.0, 234.0, 345.0]
2025-09-16 14:30:56,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 11 minutes, 6 seconds)
2025-09-16 14:33:01,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:33:10,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3340.16064 ± 1604.977
2025-09-16 14:33:10,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5312.3784, 4063.5186, 4209.141, 5261.6177, 5213.1924, 1969.7716, 2842.661, 1839.5964, 2070.0093, 619.71857]
2025-09-16 14:33:10,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [978.0, 759.0, 797.0, 1000.0, 1000.0, 384.0, 518.0, 354.0, 392.0, 120.0]
2025-09-16 14:33:10,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 8 minutes, 33 seconds)
2025-09-16 14:35:15,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:35:24,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3245.61377 ± 1727.167
2025-09-16 14:35:24,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5335.6943, 735.4268, 4651.662, 905.0764, 3180.755, 5224.1265, 3014.532, 5206.976, 2896.8381, 1305.051]
2025-09-16 14:35:24,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [999.0, 150.0, 874.0, 188.0, 591.0, 1000.0, 597.0, 1000.0, 548.0, 250.0]
2025-09-16 14:35:24,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 6 minutes, 17 seconds)
2025-09-16 14:37:37,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:37:42,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 1982.68774 ± 1173.669
2025-09-16 14:37:42,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1390.2205, 5161.27, 2219.6492, 1725.631, 763.0154, 1277.5643, 997.9489, 1734.5914, 2338.8901, 2218.096]
2025-09-16 14:37:42,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [269.0, 1000.0, 425.0, 333.0, 170.0, 261.0, 215.0, 327.0, 450.0, 421.0]
2025-09-16 14:37:42,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 4 minutes, 52 seconds)
2025-09-16 14:39:48,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:39:59,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3616.89917 ± 1698.570
2025-09-16 14:39:59,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5207.916, 957.7278, 5197.5586, 5092.6626, 5235.9463, 1249.3315, 4913.1304, 2888.5388, 3831.4275, 1594.7524]
2025-09-16 14:39:59,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 200.0, 1000.0, 1000.0, 1000.0, 235.0, 956.0, 568.0, 739.0, 322.0]
2025-09-16 14:39:59,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3616.90) for latency 9
2025-09-16 14:39:59,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 2 minutes, 24 seconds)
2025-09-16 14:42:06,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:42:16,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3413.78833 ± 1668.631
2025-09-16 14:42:16,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [1141.0813, 5257.177, 5289.694, 5252.8247, 1483.2109, 1788.616, 3179.636, 3597.824, 5287.189, 1860.6287]
2025-09-16 14:42:16,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [225.0, 1000.0, 1000.0, 1000.0, 279.0, 348.0, 612.0, 697.0, 1000.0, 338.0]
2025-09-16 14:42:16,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 1 minute, 15 seconds)
2025-09-16 14:44:25,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:44:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3354.11279 ± 1400.962
2025-09-16 14:44:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5154.8604, 2944.624, 4805.077, 3466.766, 4336.9463, 1836.7356, 1163.933, 5260.8765, 2046.539, 2524.7688]
2025-09-16 14:44:34,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 550.0, 919.0, 648.0, 865.0, 366.0, 207.0, 1000.0, 404.0, 489.0]
2025-09-16 14:44:34,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 59 minutes, 18 seconds)
2025-09-16 14:46:34,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:46:44,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3686.50000 ± 1289.501
2025-09-16 14:46:44,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4392.9243, 3312.2664, 2131.747, 3016.1448, 5292.7295, 5293.1455, 5349.823, 2697.6187, 1633.8912, 3744.7095]
2025-09-16 14:46:44,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [829.0, 629.0, 400.0, 570.0, 1000.0, 1000.0, 1000.0, 495.0, 304.0, 705.0]
2025-09-16 14:46:44,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (3686.50) for latency 9
2025-09-16 14:46:44,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 56 minutes, 37 seconds)
2025-09-16 14:48:49,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:48:55,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 2425.13672 ± 1590.789
2025-09-16 14:48:55,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2243.562, 2171.3713, 5379.7007, 680.15717, 1917.5486, 771.07025, 5461.8354, 2255.288, 1904.4817, 1466.3508]
2025-09-16 14:48:55,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [419.0, 400.0, 1000.0, 135.0, 366.0, 139.0, 1000.0, 414.0, 349.0, 281.0]
2025-09-16 14:48:55,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 53 minutes, 51 seconds)
2025-09-16 14:51:01,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:51:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3013.29370 ± 1512.248
2025-09-16 14:51:10,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2094.2244, 2012.2764, 5196.752, 3084.2783, 2554.8699, 2303.6006, 1440.7025, 5135.27, 1091.5537, 5219.4097]
2025-09-16 14:51:10,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [398.0, 396.0, 1000.0, 589.0, 497.0, 455.0, 283.0, 1000.0, 201.0, 1000.0]
2025-09-16 14:51:10,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 51 minutes, 26 seconds)
2025-09-16 14:53:25,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:53:39,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5322.27100 ± 45.699
2025-09-16 14:53:39,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5334.8525, 5338.461, 5366.191, 5346.9917, 5385.2217, 5326.2163, 5301.6997, 5332.352, 5273.5693, 5217.1577]
2025-09-16 14:53:39,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:53:39,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1226 [INFO]: New best (5322.27) for latency 9
2025-09-16 14:53:39,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 50 minutes, 5 seconds)
2025-09-16 14:55:41,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:55:54,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4033.05078 ± 1372.605
2025-09-16 14:55:54,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2697.4297, 2274.7046, 5317.3843, 3502.24, 2072.2156, 5362.4897, 5409.6797, 5355.413, 2998.9539, 5339.9995]
2025-09-16 14:55:54,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [499.0, 424.0, 1000.0, 659.0, 379.0, 998.0, 1000.0, 1000.0, 560.0, 1000.0]
2025-09-16 14:55:54,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 47 minutes, 32 seconds)
2025-09-16 14:58:07,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:58:18,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 3951.39966 ± 1764.734
2025-09-16 14:58:18,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5302.3823, 5376.598, 2371.1145, 5328.598, 2848.726, 1211.4742, 5328.6543, 5262.6885, 1057.127, 5426.6377]
2025-09-16 14:58:18,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 434.0, 1000.0, 541.0, 211.0, 1000.0, 1000.0, 192.0, 1000.0]
2025-09-16 14:58:18,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 46 minutes, 15 seconds)
2025-09-16 15:00:15,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:00:28,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4378.16260 ± 1319.032
2025-09-16 15:00:28,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5241.7974, 4144.4067, 4388.9214, 5162.1514, 4926.349, 5280.413, 852.1624, 3358.6353, 5238.8438, 5187.9443]
2025-09-16 15:00:28,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 777.0, 825.0, 992.0, 944.0, 1000.0, 159.0, 648.0, 1000.0, 1000.0]
2025-09-16 15:00:28,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 43 minutes, 51 seconds)
2025-09-16 15:02:33,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:02:49,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5050.23193 ± 519.549
2025-09-16 15:02:49,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5299.9917, 5307.9106, 4083.3086, 5354.2183, 5313.9536, 5128.406, 3961.0442, 5338.66, 5425.526, 5289.301]
2025-09-16 15:02:49,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 747.0, 1000.0, 1000.0, 1000.0, 745.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:02:49,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 41 minutes, 56 seconds)
2025-09-16 15:04:54,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:05:09,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4950.29590 ± 686.654
2025-09-16 15:05:09,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5285.62, 3721.8093, 3447.1387, 5382.2446, 5279.0977, 5278.9043, 5254.885, 5236.5684, 5304.1475, 5312.5474]
2025-09-16 15:05:09,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 712.0, 655.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:05:09,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 39 minutes, 6 seconds)
2025-09-16 15:07:24,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:07:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4891.19775 ± 980.645
2025-09-16 15:07:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5253.9326, 5256.4634, 5281.3105, 5153.668, 5179.5576, 5150.499, 5179.607, 5268.283, 1952.5203, 5236.135]
2025-09-16 15:07:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 380.0, 1000.0]
2025-09-16 15:07:38,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 37 minutes, 34 seconds)
2025-09-16 15:09:40,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:09:51,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4055.90698 ± 1589.501
2025-09-16 15:09:51,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5210.9746, 5333.096, 5220.662, 1309.2614, 1298.5745, 5376.7944, 4031.4854, 5366.848, 4727.182, 2684.1895]
2025-09-16 15:09:51,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 261.0, 251.0, 1000.0, 749.0, 1000.0, 881.0, 506.0]
2025-09-16 15:09:51,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 34 minutes, 40 seconds)
2025-09-16 15:11:56,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:12:09,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4505.42773 ± 1247.495
2025-09-16 15:12:09,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5100.7803, 5136.372, 5118.852, 5115.946, 5123.307, 5169.5547, 5032.4585, 2559.8962, 5151.121, 1545.992]
2025-09-16 15:12:09,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 499.0, 1000.0, 303.0]
2025-09-16 15:12:09,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 32 minutes, 43 seconds)
2025-09-16 15:14:15,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:14:28,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4787.12988 ± 1072.511
2025-09-16 15:14:28,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5310.673, 5186.8374, 3849.7622, 5273.674, 5247.364, 5370.7515, 1836.3472, 5337.6006, 5218.3477, 5239.9434]
2025-09-16 15:14:28,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 731.0, 1000.0, 1000.0, 1000.0, 374.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:14:28,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 30 minutes, 18 seconds)
2025-09-16 15:16:39,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:16:51,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4600.06104 ± 1008.862
2025-09-16 15:16:51,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5215.1675, 5228.6587, 3070.7068, 5267.385, 2476.0852, 5229.128, 5333.737, 5292.091, 3874.9275, 5012.7207]
2025-09-16 15:16:51,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 583.0, 1000.0, 475.0, 1000.0, 1000.0, 1000.0, 719.0, 957.0]
2025-09-16 15:16:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 28 minutes, 4 seconds)
2025-09-16 15:19:01,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:19:15,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4966.49072 ± 663.161
2025-09-16 15:19:15,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5168.7764, 5193.402, 5135.422, 2981.53, 5281.9824, 5118.184, 5185.6143, 5196.735, 5238.076, 5165.1895]
2025-09-16 15:19:15,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 577.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:19:15,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 32 seconds)
2025-09-16 15:21:15,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:21:27,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4306.70410 ± 1148.994
2025-09-16 15:21:27,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5204.933, 5269.95, 5243.289, 2560.9902, 5241.745, 5048.9575, 3264.5757, 5360.0137, 2648.4463, 3224.1409]
2025-09-16 15:21:27,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 491.0, 1000.0, 1000.0, 623.0, 1000.0, 512.0, 605.0]
2025-09-16 15:21:27,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 12 seconds)
2025-09-16 15:23:33,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:23:49,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5213.31445 ± 55.104
2025-09-16 15:23:49,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5287.8896, 5194.0103, 5189.7505, 5210.21, 5167.911, 5212.3545, 5259.0356, 5264.8916, 5256.8804, 5090.211]
2025-09-16 15:23:49,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:23:49,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 20 minutes, 59 seconds)
2025-09-16 15:25:50,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:26:04,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4663.92822 ± 1465.943
2025-09-16 15:26:04,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5291.776, 3552.8394, 5345.999, 5284.329, 5313.8027, 559.79266, 5289.633, 5301.348, 5295.435, 5404.3267]
2025-09-16 15:26:04,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 672.0, 1000.0, 1000.0, 1000.0, 101.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:26:04,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 32 seconds)
2025-09-16 15:28:11,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:28:24,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4611.88818 ± 870.505
2025-09-16 15:28:24,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [3582.971, 5255.923, 5212.279, 5127.3975, 5121.285, 2595.756, 5244.245, 3982.5403, 4825.9883, 5170.4966]
2025-09-16 15:28:24,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [687.0, 1000.0, 1000.0, 1000.0, 1000.0, 493.0, 1000.0, 789.0, 924.0, 1000.0]
2025-09-16 15:28:24,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 10 seconds)
2025-09-16 15:30:26,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:30:42,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5160.97119 ± 343.207
2025-09-16 15:30:42,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5235.3506, 4138.3896, 5231.915, 5270.884, 5308.3555, 5297.7925, 5253.6343, 5338.898, 5211.0615, 5323.4297]
2025-09-16 15:30:42,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 774.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:30:42,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 44 seconds)
2025-09-16 15:32:48,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:33:00,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4248.70459 ± 1312.442
2025-09-16 15:33:00,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [2378.3804, 5130.064, 5138.7886, 5048.614, 3134.1826, 4747.1753, 1486.5159, 5074.204, 5204.9014, 5144.2183]
2025-09-16 15:33:00,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [453.0, 1000.0, 1000.0, 1000.0, 626.0, 912.0, 290.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:33:00,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 32 seconds)
2025-09-16 15:35:11,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:35:27,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5196.19043 ± 110.970
2025-09-16 15:35:27,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5203.4595, 4877.7, 5215.916, 5273.073, 5198.8364, 5243.7534, 5206.1963, 5233.9585, 5303.2734, 5205.743]
2025-09-16 15:35:27,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 908.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:35:27,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 18 seconds)
2025-09-16 15:37:30,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:37:45,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4639.58301 ± 1347.367
2025-09-16 15:37:45,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5318.5654, 5315.683, 1328.1829, 5362.3584, 2705.7087, 5248.3833, 5304.094, 5279.678, 5265.2056, 5267.973]
2025-09-16 15:37:45,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 241.0, 1000.0, 504.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:37:45,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes)
2025-09-16 15:39:45,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:40:00,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 5110.73633 ± 390.454
2025-09-16 15:40:00,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5271.185, 5258.6777, 5301.8105, 3977.1077, 5286.868, 5138.742, 5340.482, 5294.4785, 4983.361, 5254.652]
2025-09-16 15:40:00,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 753.0, 1000.0, 1000.0, 1000.0, 1000.0, 962.0, 1000.0]
2025-09-16 15:40:00,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 38 seconds)
2025-09-16 15:42:07,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:42:19,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4371.71875 ± 1411.655
2025-09-16 15:42:19,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [4100.505, 5379.0513, 5403.9795, 5285.296, 5331.701, 4362.6807, 5235.426, 1705.01, 1655.937, 5257.6]
2025-09-16 15:42:19,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [753.0, 1000.0, 1000.0, 1000.0, 1000.0, 817.0, 1000.0, 303.0, 307.0, 1000.0]
2025-09-16 15:42:19,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 19 seconds)
2025-09-16 15:44:16,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:44:29,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1221 [DEBUG]: Total Reward: 4706.04980 ± 1180.744
2025-09-16 15:44:29,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1222 [DEBUG]: All rewards: [5291.4956, 4014.6074, 5226.521, 5124.93, 5273.668, 5208.8813, 5267.649, 1339.141, 5300.901, 5012.704]
2025-09-16 15:44:29,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 784.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 269.0, 1000.0, 1000.0]
2025-09-16 15:44:29,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille100-humanoid):1251 [DEBUG]: Training session finished
