2025-09-16 12:05:16,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.050-delay_9
2025-09-16 12:05:16,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.050-delay_9
2025-09-16 12:05:16,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x152487ea0710>}
2025-09-16 12:05:16,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:05:16,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:05:16,903 baseline-bpql-noisepromille50-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=529, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:05:16,903 baseline-bpql-noisepromille50-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:05:20,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:05:20,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:07:03,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:07:04,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 240.24028 ± 38.755
2025-09-16 12:07:04,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [211.36684, 196.73032, 264.2042, 201.55194, 260.6072, 245.96887, 220.35811, 208.25366, 265.1802, 328.1814]
2025-09-16 12:07:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [42.0, 38.0, 52.0, 40.0, 50.0, 49.0, 43.0, 41.0, 54.0, 69.0]
2025-09-16 12:07:04,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (240.24) for latency 9
2025-09-16 12:07:04,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 52 minutes, 14 seconds)
2025-09-16 12:08:57,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:08:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 358.52106 ± 35.588
2025-09-16 12:08:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [385.86395, 381.23096, 334.01053, 399.95517, 401.28043, 292.75156, 318.45386, 330.6126, 359.17834, 381.8732]
2025-09-16 12:08:57,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 72.0, 66.0, 76.0, 80.0, 56.0, 60.0, 62.0, 68.0, 73.0]
2025-09-16 12:08:57,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (358.52) for latency 9
2025-09-16 12:08:57,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 57 minutes, 56 seconds)
2025-09-16 12:10:49,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:10:50,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 401.12262 ± 70.170
2025-09-16 12:10:50,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [348.63727, 396.71033, 441.83063, 317.09277, 337.90805, 543.3682, 401.62665, 394.882, 495.5631, 333.60718]
2025-09-16 12:10:50,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 82.0, 85.0, 64.0, 73.0, 112.0, 85.0, 73.0, 101.0, 69.0]
2025-09-16 12:10:50,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (401.12) for latency 9
2025-09-16 12:10:50,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 58 minutes, 17 seconds)
2025-09-16 12:12:46,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:12:47,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 442.46597 ± 75.457
2025-09-16 12:12:47,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [417.05032, 577.2566, 465.92465, 366.4597, 470.3323, 452.79242, 293.57767, 507.84183, 483.63962, 389.78455]
2025-09-16 12:12:47,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 112.0, 94.0, 71.0, 90.0, 85.0, 62.0, 94.0, 100.0, 75.0]
2025-09-16 12:12:47,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (442.47) for latency 9
2025-09-16 12:12:47,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 58 minutes, 54 seconds)
2025-09-16 12:14:42,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:14:43,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 453.79752 ± 82.355
2025-09-16 12:14:43,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [609.61676, 532.1042, 473.8107, 332.47815, 458.24463, 531.83136, 347.39175, 393.12845, 426.7361, 432.63263]
2025-09-16 12:14:43,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 112.0, 94.0, 64.0, 100.0, 116.0, 64.0, 73.0, 83.0, 94.0]
2025-09-16 12:14:43,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (453.80) for latency 9
2025-09-16 12:14:43,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 58 minutes, 28 seconds)
2025-09-16 12:16:37,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:16:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 447.06836 ± 126.501
2025-09-16 12:16:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [494.78275, 350.26318, 342.8361, 498.23682, 452.37585, 328.2624, 446.65503, 306.99768, 487.8161, 762.45764]
2025-09-16 12:16:38,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 74.0, 63.0, 109.0, 98.0, 73.0, 84.0, 61.0, 109.0, 163.0]
2025-09-16 12:16:38,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 59 minutes, 57 seconds)
2025-09-16 12:18:33,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:18:35,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 462.67822 ± 88.338
2025-09-16 12:18:35,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [368.78647, 432.00766, 618.4248, 544.47314, 370.04782, 392.34888, 500.5605, 471.4064, 569.93054, 358.7957]
2025-09-16 12:18:35,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 83.0, 132.0, 115.0, 70.0, 88.0, 93.0, 104.0, 108.0, 79.0]
2025-09-16 12:18:35,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (462.68) for latency 9
2025-09-16 12:18:35,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 58 minutes, 54 seconds)
2025-09-16 12:20:30,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:20:31,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 489.08530 ± 64.945
2025-09-16 12:20:31,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [596.7595, 411.77737, 446.09625, 524.9634, 513.22095, 589.3159, 391.08752, 478.76257, 482.8845, 455.98547]
2025-09-16 12:20:31,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 77.0, 82.0, 111.0, 110.0, 116.0, 81.0, 89.0, 95.0, 84.0]
2025-09-16 12:20:31,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (489.09) for latency 9
2025-09-16 12:20:31,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 58 minutes)
2025-09-16 12:22:26,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:22:27,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 513.97083 ± 60.925
2025-09-16 12:22:27,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [392.87488, 460.05743, 586.89453, 596.9743, 566.3595, 518.1045, 463.52176, 552.99615, 491.68625, 510.239]
2025-09-16 12:22:27,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 85.0, 111.0, 110.0, 106.0, 106.0, 92.0, 102.0, 102.0, 95.0]
2025-09-16 12:22:27,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (513.97) for latency 9
2025-09-16 12:22:27,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 56 minutes, 5 seconds)
2025-09-16 12:24:23,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:24:25,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 617.86975 ± 220.903
2025-09-16 12:24:25,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [477.55722, 1150.8337, 854.63965, 413.0603, 474.65668, 702.7654, 580.9161, 480.465, 422.26535, 621.5387]
2025-09-16 12:24:25,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 242.0, 170.0, 89.0, 91.0, 135.0, 110.0, 91.0, 78.0, 120.0]
2025-09-16 12:24:25,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (617.87) for latency 9
2025-09-16 12:24:25,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 54 minutes, 34 seconds)
2025-09-16 12:26:20,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:26:21,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 490.23154 ± 135.297
2025-09-16 12:26:21,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [490.0749, 384.33218, 375.06668, 334.6779, 489.7116, 430.93228, 524.5027, 399.94788, 726.3847, 746.68445]
2025-09-16 12:26:21,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 72.0, 71.0, 64.0, 91.0, 81.0, 99.0, 77.0, 142.0, 146.0]
2025-09-16 12:26:21,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 52 minutes, 59 seconds)
2025-09-16 12:28:17,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:28:19,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 539.44354 ± 70.722
2025-09-16 12:28:19,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [474.0346, 407.19852, 494.38174, 640.0313, 555.21564, 607.09686, 540.588, 508.2812, 527.38354, 640.2237]
2025-09-16 12:28:19,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 76.0, 92.0, 124.0, 120.0, 117.0, 103.0, 95.0, 114.0, 123.0]
2025-09-16 12:28:19,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 51 minutes, 19 seconds)
2025-09-16 12:30:13,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:30:15,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 556.87915 ± 157.514
2025-09-16 12:30:15,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [561.86096, 592.80273, 363.25226, 772.04785, 444.41672, 352.8006, 594.9867, 431.00504, 856.21326, 599.4051]
2025-09-16 12:30:15,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 112.0, 81.0, 153.0, 85.0, 65.0, 113.0, 80.0, 162.0, 113.0]
2025-09-16 12:30:15,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 49 minutes, 18 seconds)
2025-09-16 12:32:09,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:32:11,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 581.36340 ± 163.028
2025-09-16 12:32:11,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [964.9443, 694.473, 400.56906, 429.7153, 567.205, 620.45544, 496.4646, 697.6791, 512.11816, 430.01007]
2025-09-16 12:32:11,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [202.0, 147.0, 74.0, 81.0, 108.0, 123.0, 112.0, 142.0, 102.0, 84.0]
2025-09-16 12:32:11,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 47 minutes, 18 seconds)
2025-09-16 12:34:06,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:34:07,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 509.93082 ± 94.699
2025-09-16 12:34:07,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [410.2597, 611.83124, 539.3306, 522.08765, 652.03125, 368.66455, 450.26508, 610.1338, 393.2797, 541.4242]
2025-09-16 12:34:07,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 117.0, 104.0, 97.0, 141.0, 81.0, 85.0, 119.0, 83.0, 106.0]
2025-09-16 12:34:07,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 44 minutes, 56 seconds)
2025-09-16 12:36:02,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:36:03,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 536.88757 ± 72.268
2025-09-16 12:36:03,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [442.67255, 557.5935, 492.9771, 425.35876, 623.6739, 617.1325, 514.2198, 580.64813, 482.0605, 632.539]
2025-09-16 12:36:03,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 105.0, 91.0, 80.0, 123.0, 117.0, 100.0, 108.0, 90.0, 122.0]
2025-09-16 12:36:03,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 51 seconds)
2025-09-16 12:37:57,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:37:58,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 506.13681 ± 98.322
2025-09-16 12:37:58,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [629.2073, 539.2876, 669.156, 553.7853, 510.59903, 358.32394, 465.71198, 535.7453, 447.61377, 351.93817]
2025-09-16 12:37:58,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 120.0, 126.0, 103.0, 102.0, 68.0, 87.0, 104.0, 83.0, 80.0]
2025-09-16 12:37:58,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 40 minutes, 21 seconds)
2025-09-16 12:39:52,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:39:54,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 552.70306 ± 99.346
2025-09-16 12:39:54,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [512.11127, 447.15448, 585.78864, 591.1738, 655.19354, 369.7933, 627.7453, 723.9784, 496.1348, 517.95764]
2025-09-16 12:39:54,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 84.0, 123.0, 120.0, 139.0, 75.0, 124.0, 138.0, 94.0, 97.0]
2025-09-16 12:39:54,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 38 minutes, 19 seconds)
2025-09-16 12:41:48,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:41:50,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 590.60162 ± 133.021
2025-09-16 12:41:50,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [311.0253, 426.5987, 697.135, 692.7191, 554.54346, 683.19727, 595.7025, 691.7402, 743.67694, 509.67813]
2025-09-16 12:41:50,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 79.0, 137.0, 150.0, 104.0, 128.0, 110.0, 131.0, 143.0, 96.0]
2025-09-16 12:41:50,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 20 seconds)
2025-09-16 12:43:44,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:43:45,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 541.98926 ± 75.888
2025-09-16 12:43:45,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [500.6845, 581.6818, 528.9256, 577.8522, 641.1491, 403.0246, 515.4405, 435.2545, 611.8525, 624.0272]
2025-09-16 12:43:45,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 111.0, 101.0, 109.0, 133.0, 74.0, 94.0, 81.0, 118.0, 120.0]
2025-09-16 12:43:45,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 34 minutes, 10 seconds)
2025-09-16 12:45:40,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:45:41,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 530.63147 ± 131.883
2025-09-16 12:45:41,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [472.50296, 677.9568, 324.11072, 569.6888, 685.7137, 516.1193, 440.5135, 582.17, 332.55515, 704.98364]
2025-09-16 12:45:41,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 149.0, 66.0, 107.0, 147.0, 113.0, 88.0, 120.0, 65.0, 136.0]
2025-09-16 12:45:41,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 32 minutes, 13 seconds)
2025-09-16 12:47:35,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:47:37,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 642.99823 ± 83.909
2025-09-16 12:47:37,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [625.02374, 611.57587, 587.59375, 593.8126, 828.8944, 532.83954, 764.3157, 635.2228, 652.87604, 597.82825]
2025-09-16 12:47:37,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 127.0, 127.0, 116.0, 161.0, 114.0, 146.0, 131.0, 122.0, 114.0]
2025-09-16 12:47:37,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (643.00) for latency 9
2025-09-16 12:47:37,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 23 seconds)
2025-09-16 12:49:31,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:49:33,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 692.02533 ± 168.819
2025-09-16 12:49:33,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [673.1624, 482.09613, 1155.4467, 698.7045, 667.59216, 736.1468, 666.0961, 651.443, 626.3645, 563.20184]
2025-09-16 12:49:33,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [136.0, 102.0, 227.0, 132.0, 128.0, 163.0, 130.0, 123.0, 135.0, 122.0]
2025-09-16 12:49:33,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (692.03) for latency 9
2025-09-16 12:49:33,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 38 seconds)
2025-09-16 12:51:26,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:51:28,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 568.22913 ± 81.699
2025-09-16 12:51:28,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [558.8571, 531.0318, 471.59973, 445.90134, 723.43066, 650.71454, 559.6903, 512.81256, 577.20465, 651.04865]
2025-09-16 12:51:28,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 105.0, 106.0, 81.0, 137.0, 121.0, 104.0, 97.0, 127.0, 142.0]
2025-09-16 12:51:28,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 26 minutes, 23 seconds)
2025-09-16 12:53:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:53:24,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 680.99939 ± 187.653
2025-09-16 12:53:24,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [816.7285, 595.22296, 1152.2656, 449.96902, 659.9075, 562.38513, 784.00885, 605.5028, 546.04065, 637.9629]
2025-09-16 12:53:24,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 113.0, 244.0, 84.0, 128.0, 121.0, 154.0, 118.0, 109.0, 120.0]
2025-09-16 12:53:24,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 24 minutes, 38 seconds)
2025-09-16 12:55:19,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:55:20,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 681.84290 ± 187.595
2025-09-16 12:55:20,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [470.22253, 752.34674, 767.8127, 793.8352, 477.214, 482.8482, 656.50214, 568.9099, 737.65405, 1111.0834]
2025-09-16 12:55:20,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 145.0, 144.0, 157.0, 89.0, 92.0, 140.0, 109.0, 144.0, 231.0]
2025-09-16 12:55:20,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 22 minutes, 53 seconds)
2025-09-16 12:57:15,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:57:17,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 726.82349 ± 171.506
2025-09-16 12:57:17,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [634.0853, 406.76303, 847.76855, 540.5279, 974.61273, 684.7454, 816.4296, 665.8147, 728.0811, 969.4068]
2025-09-16 12:57:17,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [119.0, 76.0, 167.0, 102.0, 186.0, 139.0, 156.0, 128.0, 137.0, 186.0]
2025-09-16 12:57:17,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (726.82) for latency 9
2025-09-16 12:57:17,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 21 minutes, 11 seconds)
2025-09-16 12:59:12,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 12:59:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 733.34973 ± 173.938
2025-09-16 12:59:14,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [894.3058, 659.85974, 814.2427, 537.1794, 734.6091, 754.8576, 612.0087, 1145.5933, 600.9259, 579.9148]
2025-09-16 12:59:14,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [189.0, 143.0, 154.0, 101.0, 137.0, 147.0, 117.0, 220.0, 115.0, 110.0]
2025-09-16 12:59:14,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (733.35) for latency 9
2025-09-16 12:59:14,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 25 seconds)
2025-09-16 13:01:08,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:01:10,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 720.98840 ± 182.847
2025-09-16 13:01:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [692.30365, 1026.2611, 435.1408, 861.2975, 955.7256, 503.6908, 635.9378, 811.20605, 712.804, 575.51697]
2025-09-16 13:01:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 210.0, 82.0, 186.0, 184.0, 93.0, 118.0, 155.0, 136.0, 121.0]
2025-09-16 13:01:10,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 49 seconds)
2025-09-16 13:03:04,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:03:06,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 809.35663 ± 209.300
2025-09-16 13:03:06,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1061.0817, 486.52698, 784.9807, 482.91684, 652.39233, 938.28656, 821.1514, 1044.4387, 759.1696, 1062.6215]
2025-09-16 13:03:06,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [203.0, 89.0, 147.0, 95.0, 122.0, 189.0, 159.0, 202.0, 141.0, 209.0]
2025-09-16 13:03:06,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (809.36) for latency 9
2025-09-16 13:03:06,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 15 minutes, 51 seconds)
2025-09-16 13:05:01,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:05:03,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 756.41138 ± 158.891
2025-09-16 13:05:03,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [734.10754, 608.8629, 619.5698, 856.904, 1062.4786, 751.5297, 584.06067, 917.0324, 558.73975, 870.8283]
2025-09-16 13:05:03,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 131.0, 129.0, 167.0, 200.0, 139.0, 113.0, 178.0, 114.0, 166.0]
2025-09-16 13:05:03,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 55 seconds)
2025-09-16 13:06:58,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:07:00,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 775.83667 ± 175.931
2025-09-16 13:07:00,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [569.28204, 881.5975, 837.19214, 963.86163, 931.27185, 1076.9302, 641.25433, 642.1198, 674.57684, 540.2809]
2025-09-16 13:07:00,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 183.0, 160.0, 184.0, 183.0, 212.0, 119.0, 137.0, 125.0, 99.0]
2025-09-16 13:07:00,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 10 seconds)
2025-09-16 13:08:53,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:08:55,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 693.64575 ± 160.922
2025-09-16 13:08:55,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [649.21796, 446.48062, 637.7356, 832.5873, 951.8645, 500.3614, 832.66907, 564.1276, 650.40894, 871.0045]
2025-09-16 13:08:55,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 84.0, 120.0, 159.0, 187.0, 91.0, 161.0, 107.0, 124.0, 169.0]
2025-09-16 13:08:55,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 47 seconds)
2025-09-16 13:10:50,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:10:52,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 831.72864 ± 190.137
2025-09-16 13:10:52,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1149.7772, 1150.6338, 875.8243, 505.4041, 686.24225, 688.38434, 803.2055, 871.8039, 753.6312, 832.3794]
2025-09-16 13:10:52,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 228.0, 168.0, 108.0, 127.0, 130.0, 154.0, 185.0, 147.0, 158.0]
2025-09-16 13:10:52,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (831.73) for latency 9
2025-09-16 13:10:52,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 2 seconds)
2025-09-16 13:12:48,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:12:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 843.15076 ± 336.804
2025-09-16 13:12:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [810.74866, 749.72516, 1721.8003, 791.51733, 948.8876, 404.11066, 848.27637, 666.65295, 526.40027, 963.38806]
2025-09-16 13:12:50,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 149.0, 345.0, 166.0, 185.0, 75.0, 163.0, 123.0, 97.0, 183.0]
2025-09-16 13:12:50,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (843.15) for latency 9
2025-09-16 13:12:50,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 6 minutes, 26 seconds)
2025-09-16 13:14:43,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:14:46,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 857.99542 ± 246.368
2025-09-16 13:14:46,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [609.4147, 1363.5775, 1147.8944, 861.03705, 643.15607, 1093.744, 588.7703, 738.48157, 752.16614, 781.7123]
2025-09-16 13:14:46,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 279.0, 218.0, 177.0, 131.0, 219.0, 111.0, 140.0, 140.0, 150.0]
2025-09-16 13:14:46,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (858.00) for latency 9
2025-09-16 13:14:46,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 4 minutes, 23 seconds)
2025-09-16 13:16:41,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:16:43,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 729.52313 ± 152.621
2025-09-16 13:16:43,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [751.7663, 534.11774, 961.9243, 774.95593, 571.5852, 819.38275, 821.2061, 888.0035, 709.682, 462.60748]
2025-09-16 13:16:43,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 102.0, 196.0, 154.0, 112.0, 180.0, 169.0, 164.0, 136.0, 90.0]
2025-09-16 13:16:43,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 21 seconds)
2025-09-16 13:18:38,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:18:40,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 818.90259 ± 258.536
2025-09-16 13:18:40,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [794.49054, 1146.2589, 700.2662, 695.10516, 1362.9098, 547.92584, 590.8535, 675.82983, 1048.2833, 627.10284]
2025-09-16 13:18:40,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 219.0, 131.0, 132.0, 264.0, 99.0, 111.0, 145.0, 198.0, 122.0]
2025-09-16 13:18:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 58 seconds)
2025-09-16 13:20:34,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:20:36,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 762.86633 ± 122.972
2025-09-16 13:20:36,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [906.0363, 904.1579, 691.3801, 946.72577, 783.3528, 627.3716, 651.0966, 627.2591, 641.73895, 849.5446]
2025-09-16 13:20:36,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [174.0, 168.0, 128.0, 179.0, 145.0, 120.0, 137.0, 116.0, 124.0, 160.0]
2025-09-16 13:20:36,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 58 minutes, 39 seconds)
2025-09-16 13:22:30,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:22:32,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 854.03418 ± 227.942
2025-09-16 13:22:32,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [873.836, 683.0998, 985.8269, 517.2483, 645.70764, 785.07635, 818.1463, 762.50006, 1157.2655, 1311.635]
2025-09-16 13:22:32,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [170.0, 144.0, 196.0, 97.0, 123.0, 154.0, 154.0, 143.0, 218.0, 248.0]
2025-09-16 13:22:32,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 56 minutes, 24 seconds)
2025-09-16 13:24:27,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:24:30,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 949.04755 ± 401.173
2025-09-16 13:24:30,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [834.70325, 1307.709, 762.29266, 991.3751, 714.50824, 797.09344, 490.42377, 1890.0853, 506.28143, 1196.0038]
2025-09-16 13:24:30,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [167.0, 259.0, 147.0, 210.0, 133.0, 151.0, 107.0, 377.0, 113.0, 235.0]
2025-09-16 13:24:30,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (949.05) for latency 9
2025-09-16 13:24:30,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 54 minutes, 50 seconds)
2025-09-16 13:26:25,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:26:28,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 903.79053 ± 421.970
2025-09-16 13:26:28,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [788.20233, 650.00433, 471.52707, 697.7956, 854.02374, 812.4655, 932.6878, 786.45465, 2106.679, 938.0657]
2025-09-16 13:26:28,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 122.0, 88.0, 127.0, 164.0, 158.0, 189.0, 152.0, 424.0, 179.0]
2025-09-16 13:26:28,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 7 seconds)
2025-09-16 13:28:22,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:28:24,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1004.07471 ± 305.148
2025-09-16 13:28:24,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1285.2681, 1062.4668, 590.00684, 1620.9163, 702.627, 944.99896, 964.1338, 940.00574, 654.00323, 1276.3202]
2025-09-16 13:28:24,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [247.0, 210.0, 122.0, 319.0, 148.0, 178.0, 189.0, 182.0, 122.0, 245.0]
2025-09-16 13:28:24,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1004.07) for latency 9
2025-09-16 13:28:24,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 58 seconds)
2025-09-16 13:30:23,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:30:25,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 760.57990 ± 260.022
2025-09-16 13:30:25,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [524.10175, 611.14325, 835.278, 534.255, 587.6616, 908.4977, 1330.8931, 582.561, 602.97186, 1088.4363]
2025-09-16 13:30:25,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 131.0, 170.0, 110.0, 107.0, 192.0, 261.0, 123.0, 114.0, 207.0]
2025-09-16 13:30:25,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 56 seconds)
2025-09-16 13:32:19,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:32:23,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1567.27148 ± 715.103
2025-09-16 13:32:23,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [928.58734, 2200.9568, 2424.8367, 1510.2821, 974.8738, 2311.9734, 587.37354, 1026.6472, 1065.0398, 2642.144]
2025-09-16 13:32:23,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [197.0, 437.0, 479.0, 304.0, 198.0, 460.0, 110.0, 203.0, 234.0, 513.0]
2025-09-16 13:32:23,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1567.27) for latency 9
2025-09-16 13:32:24,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 48 minutes, 29 seconds)
2025-09-16 13:34:16,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:34:19,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 919.40125 ± 251.100
2025-09-16 13:34:19,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1325.974, 649.7545, 1372.7969, 993.221, 816.6238, 882.37854, 1053.7302, 764.0367, 648.875, 686.6223]
2025-09-16 13:34:19,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [265.0, 126.0, 287.0, 185.0, 158.0, 175.0, 221.0, 147.0, 124.0, 145.0]
2025-09-16 13:34:19,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 2 seconds)
2025-09-16 13:36:14,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:36:18,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1522.11829 ± 659.400
2025-09-16 13:36:18,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1020.14374, 1253.3654, 860.41705, 2214.2764, 1089.6863, 2572.9114, 1262.8431, 1149.2236, 2713.1575, 1085.158]
2025-09-16 13:36:18,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [190.0, 233.0, 164.0, 438.0, 209.0, 511.0, 237.0, 218.0, 563.0, 207.0]
2025-09-16 13:36:18,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 19 seconds)
2025-09-16 13:38:16,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:38:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1178.75378 ± 206.022
2025-09-16 13:38:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1384.4088, 1055.4816, 969.9353, 930.94995, 1255.2578, 1292.6249, 1581.1737, 1076.894, 1299.298, 941.51373]
2025-09-16 13:38:19,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [265.0, 198.0, 188.0, 176.0, 254.0, 241.0, 297.0, 203.0, 243.0, 180.0]
2025-09-16 13:38:19,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 6 seconds)
2025-09-16 13:40:13,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:40:17,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1526.61450 ± 668.459
2025-09-16 13:40:17,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1913.6027, 1770.4863, 3067.3882, 810.03864, 989.8231, 1619.1099, 1129.1439, 2024.5779, 1131.5791, 810.3957]
2025-09-16 13:40:17,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [395.0, 367.0, 591.0, 174.0, 188.0, 313.0, 219.0, 396.0, 212.0, 173.0]
2025-09-16 13:40:17,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 40 minutes, 43 seconds)
2025-09-16 13:42:13,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:42:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1459.19885 ± 374.700
2025-09-16 13:42:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [848.9856, 1447.2837, 1124.218, 1445.1067, 1074.3594, 1669.3896, 1950.7448, 1960.0372, 1869.5297, 1202.3345]
2025-09-16 13:42:17,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 282.0, 246.0, 285.0, 235.0, 347.0, 377.0, 381.0, 370.0, 235.0]
2025-09-16 13:42:17,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 38 minutes, 55 seconds)
2025-09-16 13:44:17,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:44:22,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1826.29102 ± 680.418
2025-09-16 13:44:22,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [998.5609, 1112.148, 2001.7975, 2196.143, 2473.0146, 1593.774, 3296.8206, 1065.7968, 1611.3137, 1913.5409]
2025-09-16 13:44:22,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [223.0, 204.0, 404.0, 451.0, 485.0, 308.0, 670.0, 233.0, 301.0, 383.0]
2025-09-16 13:44:22,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1826.29) for latency 9
2025-09-16 13:44:22,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 35 seconds)
2025-09-16 13:46:11,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:46:17,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2137.06470 ± 1216.093
2025-09-16 13:46:17,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1882.9225, 3513.394, 2440.4094, 1772.2338, 1756.9739, 788.87085, 4965.8755, 777.718, 1250.7186, 2221.53]
2025-09-16 13:46:17,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [390.0, 705.0, 494.0, 341.0, 352.0, 151.0, 1000.0, 164.0, 231.0, 454.0]
2025-09-16 13:46:17,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2137.06) for latency 9
2025-09-16 13:46:17,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 35 minutes, 49 seconds)
2025-09-16 13:48:16,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:48:24,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3098.45532 ± 1008.066
2025-09-16 13:48:24,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3821.5217, 4188.605, 2566.6934, 1276.8689, 2677.42, 3763.3103, 1435.9938, 3842.3037, 3409.8115, 4002.0261]
2025-09-16 13:48:24,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [741.0, 825.0, 485.0, 259.0, 513.0, 729.0, 272.0, 745.0, 666.0, 764.0]
2025-09-16 13:48:24,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (3098.46) for latency 9
2025-09-16 13:48:24,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 48 seconds)
2025-09-16 13:50:17,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:50:24,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2436.39722 ± 1045.848
2025-09-16 13:50:24,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2659.6248, 2063.579, 718.89636, 2942.7944, 1537.1127, 2295.1577, 1264.187, 3915.5771, 2790.9644, 4176.077]
2025-09-16 13:50:24,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [524.0, 405.0, 144.0, 590.0, 292.0, 433.0, 243.0, 745.0, 551.0, 798.0]
2025-09-16 13:50:24,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 33 minutes, 1 second)
2025-09-16 13:52:30,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:52:37,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2317.21680 ± 1303.419
2025-09-16 13:52:37,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2202.6099, 942.7116, 1884.5359, 2576.7253, 789.0351, 4124.411, 2024.2731, 2264.2832, 5137.1885, 1226.3939]
2025-09-16 13:52:37,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [446.0, 184.0, 366.0, 507.0, 157.0, 807.0, 395.0, 461.0, 1000.0, 236.0]
2025-09-16 13:52:37,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 32 minutes, 55 seconds)
2025-09-16 13:54:24,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:54:29,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1809.85059 ± 1329.791
2025-09-16 13:54:29,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1353.5829, 781.5604, 848.339, 610.8349, 983.21454, 1134.2926, 4754.5737, 3821.4048, 2205.3865, 1605.3173]
2025-09-16 13:54:29,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [258.0, 155.0, 163.0, 126.0, 213.0, 228.0, 1000.0, 783.0, 469.0, 345.0]
2025-09-16 13:54:29,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 59 seconds)
2025-09-16 13:56:34,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:56:43,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3163.44775 ± 1821.598
2025-09-16 13:56:43,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5064.5605, 5070.567, 5068.5522, 4978.671, 2300.3906, 1621.0842, 695.29474, 4411.967, 657.88794, 1765.503]
2025-09-16 13:56:43,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 452.0, 329.0, 151.0, 872.0, 145.0, 359.0]
2025-09-16 13:56:43,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (3163.45) for latency 9
2025-09-16 13:56:43,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 43 seconds)
2025-09-16 13:58:32,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 13:58:42,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3652.10864 ± 1660.100
2025-09-16 13:58:42,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5192.0273, 2055.5525, 5224.8677, 2856.0286, 5202.8794, 1035.1138, 5329.451, 5169.2563, 2971.7559, 1484.1581]
2025-09-16 13:58:42,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 420.0, 1000.0, 549.0, 1000.0, 204.0, 1000.0, 1000.0, 592.0, 316.0]
2025-09-16 13:58:42,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (3652.11) for latency 9
2025-09-16 13:58:42,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 26 minutes, 26 seconds)
2025-09-16 14:00:40,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:00:47,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2464.94189 ± 1464.573
2025-09-16 14:00:47,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4867.308, 2987.9053, 4901.62, 3551.8218, 2218.8762, 979.0197, 1029.3167, 1795.442, 938.01715, 1380.0914]
2025-09-16 14:00:47,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 589.0, 1000.0, 672.0, 439.0, 202.0, 222.0, 342.0, 183.0, 267.0]
2025-09-16 14:00:47,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 25 minutes, 12 seconds)
2025-09-16 14:02:43,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:02:52,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3031.66284 ± 1580.378
2025-09-16 14:02:52,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1387.6945, 2848.4548, 1358.1827, 2081.753, 1146.84, 5007.0605, 1974.4852, 5153.3555, 4551.2593, 4807.543]
2025-09-16 14:02:52,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [264.0, 544.0, 286.0, 395.0, 221.0, 1000.0, 390.0, 1000.0, 886.0, 1000.0]
2025-09-16 14:02:52,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 22 minutes, 1 second)
2025-09-16 14:04:47,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:04:53,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2441.26978 ± 921.431
2025-09-16 14:04:53,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3394.1357, 1494.4895, 3033.9917, 3435.6292, 707.26715, 2222.069, 1622.7141, 3634.2485, 2119.603, 2748.5483]
2025-09-16 14:04:53,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [647.0, 276.0, 612.0, 679.0, 155.0, 422.0, 325.0, 699.0, 429.0, 549.0]
2025-09-16 14:04:53,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 21 minutes, 7 seconds)
2025-09-16 14:06:52,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:07:05,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4273.93311 ± 1601.085
2025-09-16 14:07:05,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5144.296, 5049.133, 5082.1914, 5010.804, 4990.8706, 791.45886, 5104.398, 5104.982, 5085.4585, 1375.7434]
2025-09-16 14:07:05,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 173.0, 1000.0, 1000.0, 1000.0, 280.0]
2025-09-16 14:07:05,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4273.93) for latency 9
2025-09-16 14:07:05,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 18 minutes, 44 seconds)
2025-09-16 14:08:59,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:09:10,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4190.37793 ± 1522.907
2025-09-16 14:09:10,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4849.8765, 2231.3975, 2054.7788, 5453.0264, 5125.989, 5393.2964, 5302.8354, 1431.0813, 5329.696, 4731.8013]
2025-09-16 14:09:10,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [886.0, 396.0, 403.0, 1000.0, 928.0, 1000.0, 1000.0, 260.0, 1000.0, 868.0]
2025-09-16 14:09:10,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 17 minutes, 26 seconds)
2025-09-16 14:11:15,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:11:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5274.14600 ± 37.948
2025-09-16 14:11:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5279.8027, 5312.833, 5244.6934, 5313.8516, 5204.4277, 5231.0103, 5270.5786, 5301.1294, 5325.9106, 5257.2256]
2025-09-16 14:11:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:11:30,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5274.15) for latency 9
2025-09-16 14:11:30,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 17 minutes, 8 seconds)
2025-09-16 14:13:18,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:13:31,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4690.00049 ± 802.769
2025-09-16 14:13:31,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5207.8984, 4546.6777, 5229.5264, 4993.67, 5109.729, 3422.1785, 2878.3237, 5234.4624, 5199.503, 5078.0317]
2025-09-16 14:13:31,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 866.0, 1000.0, 1000.0, 1000.0, 643.0, 572.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:13:31,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 14 minutes, 35 seconds)
2025-09-16 14:15:28,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:15:42,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4825.11865 ± 974.907
2025-09-16 14:15:42,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5192.455, 5180.199, 5118.163, 5187.815, 5172.3945, 1902.9847, 5081.7725, 5168.1436, 5172.452, 5074.8066]
2025-09-16 14:15:42,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 381.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:15:42,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 13 minutes, 34 seconds)
2025-09-16 14:17:49,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:18:03,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4657.20215 ± 1316.882
2025-09-16 14:18:03,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5242.819, 5228.0806, 809.35297, 5039.1245, 5230.5117, 5186.316, 4245.021, 5210.2363, 5375.2163, 5005.343]
2025-09-16 14:18:03,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 169.0, 1000.0, 1000.0, 1000.0, 835.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:18:03,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 12 minutes, 19 seconds)
2025-09-16 14:19:51,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:20:05,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5059.35059 ± 446.492
2025-09-16 14:20:05,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5288.57, 5276.2915, 5161.206, 5250.6025, 3782.5378, 5270.151, 5295.107, 5210.731, 5243.617, 4814.6924]
2025-09-16 14:20:05,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 981.0, 1000.0, 693.0, 1000.0, 1000.0, 1000.0, 1000.0, 922.0]
2025-09-16 14:20:05,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 9 minutes, 55 seconds)
2025-09-16 14:22:03,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:22:17,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5259.71143 ± 141.476
2025-09-16 14:22:17,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5309.402, 5313.8545, 5357.7593, 5295.1143, 5203.7964, 5334.74, 5346.5938, 5356.042, 5215.511, 4864.2993]
2025-09-16 14:22:17,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 912.0]
2025-09-16 14:22:17,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 6 minutes, 52 seconds)
2025-09-16 14:24:21,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:24:33,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4505.55566 ± 913.561
2025-09-16 14:24:33,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5277.508, 2292.727, 5068.9487, 3883.256, 5145.8315, 5215.6665, 3759.1267, 4603.724, 5322.4893, 4486.278]
2025-09-16 14:24:33,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 432.0, 989.0, 720.0, 1000.0, 1000.0, 748.0, 862.0, 1000.0, 842.0]
2025-09-16 14:24:33,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 6 minutes, 13 seconds)
2025-09-16 14:26:24,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:26:31,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2656.89697 ± 1233.377
2025-09-16 14:26:31,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1476.751, 2348.5435, 1702.0581, 4903.704, 2739.6333, 1447.7433, 4030.6475, 1644.096, 4346.186, 1929.6085]
2025-09-16 14:26:31,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [274.0, 440.0, 324.0, 913.0, 516.0, 267.0, 738.0, 312.0, 823.0, 360.0]
2025-09-16 14:26:31,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 2 minutes, 43 seconds)
2025-09-16 14:28:27,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:28:39,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4138.09277 ± 1636.332
2025-09-16 14:28:39,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5409.3633, 2944.965, 5375.5527, 645.4786, 5321.4175, 5325.2344, 3564.529, 2133.2578, 5378.507, 5282.6206]
2025-09-16 14:28:39,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 553.0, 1000.0, 127.0, 1000.0, 1000.0, 659.0, 392.0, 1000.0, 1000.0]
2025-09-16 14:28:39,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 59 minutes, 21 seconds)
2025-09-16 14:30:35,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:30:49,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4928.23633 ± 690.479
2025-09-16 14:30:49,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5270.1123, 5285.3574, 5232.417, 3226.19, 5323.1646, 5188.504, 5236.263, 5354.1494, 5213.773, 3952.4326]
2025-09-16 14:30:49,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 622.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 738.0]
2025-09-16 14:30:49,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 57 minutes, 57 seconds)
2025-09-16 14:32:53,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:33:07,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5190.16650 ± 607.907
2025-09-16 14:33:07,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5404.0444, 5372.0337, 3375.1265, 5410.6943, 5463.108, 5464.1196, 5410.8076, 5239.1685, 5380.568, 5381.996]
2025-09-16 14:33:07,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 614.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:33:07,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 56 minutes, 17 seconds)
2025-09-16 14:35:04,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:35:16,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4600.49414 ± 1618.646
2025-09-16 14:35:16,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5447.135, 5477.5054, 5367.166, 5362.956, 5390.305, 1012.35846, 5339.724, 1749.7751, 5411.376, 5446.6436]
2025-09-16 14:35:16,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 986.0, 1000.0, 218.0, 1000.0, 347.0, 1000.0, 1000.0]
2025-09-16 14:35:16,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 53 minutes, 34 seconds)
2025-09-16 14:37:15,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:37:29,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5314.65186 ± 26.720
2025-09-16 14:37:29,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5305.6196, 5328.484, 5317.633, 5295.5967, 5254.0903, 5311.95, 5328.01, 5341.9766, 5357.2856, 5305.8755]
2025-09-16 14:37:29,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:37:29,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5314.65) for latency 9
2025-09-16 14:37:29,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 38 seconds)
2025-09-16 14:39:19,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:39:30,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4005.44775 ± 1544.732
2025-09-16 14:39:30,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5465.7373, 5513.4067, 5568.252, 1739.0834, 5060.6846, 2287.7573, 3662.2693, 5519.4824, 3501.438, 1736.3666]
2025-09-16 14:39:30,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 328.0, 914.0, 427.0, 667.0, 1000.0, 642.0, 317.0]
2025-09-16 14:39:30,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 55 seconds)
2025-09-16 14:41:22,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:41:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4730.53223 ± 1434.819
2025-09-16 14:41:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5510.477, 5413.901, 5412.1475, 5484.5547, 804.6788, 5488.684, 4582.641, 5515.649, 5491.0986, 3601.4863]
2025-09-16 14:41:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 977.0, 1000.0, 163.0, 1000.0, 818.0, 1000.0, 1000.0, 656.0]
2025-09-16 14:41:34,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 16 seconds)
2025-09-16 14:43:33,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:43:48,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5435.19238 ± 40.379
2025-09-16 14:43:48,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5363.104, 5456.3203, 5468.2856, 5459.1577, 5455.136, 5379.5557, 5457.5664, 5469.217, 5463.515, 5380.068]
2025-09-16 14:43:48,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:43:48,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5435.19) for latency 9
2025-09-16 14:43:48,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 44 minutes, 52 seconds)
2025-09-16 14:45:43,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:45:58,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5318.24902 ± 39.250
2025-09-16 14:45:58,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5314.3193, 5224.4556, 5331.9346, 5349.0146, 5366.626, 5271.02, 5323.8755, 5342.857, 5331.561, 5326.826]
2025-09-16 14:45:58,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:45:58,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 42 minutes, 46 seconds)
2025-09-16 14:47:54,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:48:09,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5418.00146 ± 54.144
2025-09-16 14:48:09,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5378.0093, 5357.07, 5514.1343, 5431.47, 5400.063, 5440.548, 5496.206, 5331.188, 5409.0654, 5422.262]
2025-09-16 14:48:09,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:48:09,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 40 minutes, 31 seconds)
2025-09-16 14:50:06,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:50:21,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5318.19678 ± 30.387
2025-09-16 14:50:21,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5376.131, 5324.389, 5279.003, 5307.859, 5290.9004, 5307.4307, 5321.297, 5367.539, 5288.3657, 5319.055]
2025-09-16 14:50:21,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:50:21,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 4 seconds)
2025-09-16 14:52:27,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:52:41,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5365.75537 ± 21.734
2025-09-16 14:52:41,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5403.7026, 5392.3467, 5341.693, 5357.1963, 5396.489, 5362.4395, 5350.579, 5357.2715, 5351.8237, 5344.016]
2025-09-16 14:52:41,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:52:41,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 49 seconds)
2025-09-16 14:54:38,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:54:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5323.60303 ± 80.337
2025-09-16 14:54:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5347.7656, 5320.9937, 5116.7783, 5347.395, 5383.476, 5387.044, 5389.252, 5302.782, 5384.428, 5256.1147]
2025-09-16 14:54:53,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:54:53,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 28 seconds)
2025-09-16 14:56:48,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:57:02,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5352.86719 ± 21.246
2025-09-16 14:57:02,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5326.512, 5327.828, 5358.0747, 5363.8735, 5358.519, 5385.746, 5321.2114, 5380.7275, 5361.8335, 5344.348]
2025-09-16 14:57:02,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:57:02,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 13 seconds)
2025-09-16 14:58:55,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 14:59:10,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5363.05420 ± 15.076
2025-09-16 14:59:10,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5350.495, 5374.233, 5371.9155, 5349.578, 5391.437, 5356.744, 5374.014, 5372.6626, 5344.174, 5345.2847]
2025-09-16 14:59:10,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:59:10,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 49 seconds)
2025-09-16 15:01:12,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:01:27,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5344.46143 ± 18.709
2025-09-16 15:01:27,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5355.8623, 5338.6206, 5314.714, 5366.105, 5364.194, 5355.791, 5310.7744, 5360.6445, 5336.973, 5340.936]
2025-09-16 15:01:27,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:01:27,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 52 seconds)
2025-09-16 15:03:16,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:03:32,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5334.45312 ± 20.769
2025-09-16 15:03:32,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5352.177, 5314.7554, 5324.224, 5345.3853, 5328.218, 5315.1387, 5377.444, 5303.4355, 5336.644, 5347.1147]
2025-09-16 15:03:32,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:03:32,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes)
2025-09-16 15:05:29,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:05:44,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5312.76855 ± 55.452
2025-09-16 15:05:44,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5286.1177, 5181.0117, 5349.5103, 5270.3257, 5285.62, 5326.8423, 5373.362, 5335.785, 5362.106, 5357.0034]
2025-09-16 15:05:44,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:05:44,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 51 seconds)
2025-09-16 15:07:41,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:07:56,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5288.64062 ± 70.423
2025-09-16 15:07:56,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5284.613, 5328.0776, 5158.4487, 5322.6016, 5340.7285, 5204.8784, 5358.3457, 5350.6675, 5342.2114, 5195.83]
2025-09-16 15:07:56,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:07:56,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 47 seconds)
2025-09-16 15:09:59,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:10:12,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4984.48291 ± 1373.948
2025-09-16 15:10:12,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5462.5405, 5490.549, 5430.407, 863.37646, 5484.0815, 5421.27, 5413.9546, 5419.278, 5425.139, 5434.2295]
2025-09-16 15:10:12,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 165.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:10:12,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 52 seconds)
2025-09-16 15:12:03,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:12:18,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5364.57666 ± 9.977
2025-09-16 15:12:18,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5371.557, 5357.3076, 5354.133, 5350.19, 5383.9033, 5373.288, 5363.821, 5357.687, 5361.0586, 5372.8203]
2025-09-16 15:12:18,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:12:18,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 21 seconds)
2025-09-16 15:14:15,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:14:30,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5326.45996 ± 16.175
2025-09-16 15:14:30,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5322.191, 5349.1943, 5346.5586, 5339.3516, 5308.078, 5302.0093, 5306.306, 5322.7505, 5329.648, 5338.5137]
2025-09-16 15:14:30,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:14:30,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 21 seconds)
2025-09-16 15:16:27,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:16:42,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5407.04541 ± 21.734
2025-09-16 15:16:42,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5413.934, 5397.6733, 5389.5547, 5415.9395, 5446.267, 5383.0664, 5409.9, 5441.392, 5391.1934, 5381.53]
2025-09-16 15:16:42,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:16:42,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 9 seconds)
2025-09-16 15:18:39,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:18:54,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5399.94824 ± 11.799
2025-09-16 15:18:54,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5374.7764, 5388.6255, 5405.087, 5403.513, 5391.228, 5403.5435, 5404.105, 5420.8906, 5406.666, 5401.042]
2025-09-16 15:18:54,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:18:54,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 57 seconds)
2025-09-16 15:20:51,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:21:06,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5382.30371 ± 14.832
2025-09-16 15:21:06,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5389.3735, 5383.5405, 5368.3286, 5417.56, 5372.6426, 5366.3022, 5375.95, 5377.483, 5374.2876, 5397.565]
2025-09-16 15:21:06,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:21:06,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 43 seconds)
2025-09-16 15:23:03,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:23:18,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5393.36133 ± 17.569
2025-09-16 15:23:18,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5393.1455, 5405.0854, 5378.8535, 5415.074, 5396.131, 5399.671, 5406.0386, 5355.9893, 5409.9473, 5373.6846]
2025-09-16 15:23:18,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:23:18,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 36 seconds)
2025-09-16 15:25:18,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:25:31,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4931.73779 ± 1105.119
2025-09-16 15:25:31,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1616.8042, 5325.182, 5316.17, 5283.275, 5257.917, 5307.6123, 5303.055, 5307.342, 5295.122, 5304.8984]
2025-09-16 15:25:31,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [304.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:25:31,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 24 seconds)
2025-09-16 15:27:29,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:27:43,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5412.13770 ± 13.801
2025-09-16 15:27:43,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5395.411, 5410.0273, 5436.9966, 5394.0005, 5421.7876, 5425.681, 5404.7026, 5395.967, 5419.305, 5417.4976]
2025-09-16 15:27:43,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:27:43,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 12 seconds)
2025-09-16 15:29:40,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 9...
2025-09-16 15:29:56,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5284.33643 ± 13.697
2025-09-16 15:29:56,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5281.916, 5298.7056, 5268.0996, 5277.691, 5285.092, 5286.6147, 5311.959, 5294.4395, 5273.917, 5264.93]
2025-09-16 15:29:56,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:29:56,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1251 [DEBUG]: Training session finished
