2025-09-16 14:44:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.050-delay_21
2025-09-16 14:44:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.050-delay_21
2025-09-16 14:44:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x150acd4808d0>}
2025-09-16 14:44:05,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 14:44:05,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 14:44:05,297 baseline-bpql-noisepromille50-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=733, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 14:44:05,297 baseline-bpql-noisepromille50-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 14:44:07,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 14:44:07,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 14:45:58,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:45:59,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 261.62979 ± 62.420
2025-09-16 14:45:59,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [327.17493, 194.45589, 344.82687, 264.89926, 229.15897, 192.71765, 289.09018, 361.70398, 221.78345, 190.48662]
2025-09-16 14:45:59,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 42.0, 62.0, 52.0, 45.0, 39.0, 54.0, 66.0, 43.0, 39.0]
2025-09-16 14:45:59,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (261.63) for latency 21
2025-09-16 14:45:59,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 5 minutes, 7 seconds)
2025-09-16 14:47:58,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:48:00,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 409.71149 ± 132.573
2025-09-16 14:48:00,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [134.79883, 329.68176, 499.05594, 620.3007, 442.0446, 406.2122, 355.6194, 587.17194, 386.06598, 336.16364]
2025-09-16 14:48:00,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 67.0, 92.0, 126.0, 91.0, 77.0, 71.0, 107.0, 78.0, 67.0]
2025-09-16 14:48:00,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (409.71) for latency 21
2025-09-16 14:48:00,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 10 minutes, 21 seconds)
2025-09-16 14:50:00,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:50:01,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 302.75909 ± 126.422
2025-09-16 14:50:01,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [135.65611, 428.07123, 170.7455, 419.54138, 382.2246, 286.35446, 155.24376, 440.52777, 442.09204, 167.13417]
2025-09-16 14:50:01,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [26.0, 91.0, 33.0, 83.0, 77.0, 54.0, 30.0, 94.0, 82.0, 32.0]
2025-09-16 14:50:01,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 10 minutes, 53 seconds)
2025-09-16 14:52:00,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:52:01,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 256.35785 ± 40.959
2025-09-16 14:52:01,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [259.29407, 173.30301, 270.32278, 332.2153, 219.48961, 261.42984, 303.8091, 252.03163, 241.17896, 250.50385]
2025-09-16 14:52:01,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [55.0, 33.0, 54.0, 62.0, 47.0, 54.0, 62.0, 52.0, 51.0, 51.0]
2025-09-16 14:52:01,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 9 minutes, 44 seconds)
2025-09-16 14:54:02,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:54:03,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 341.10547 ± 33.584
2025-09-16 14:54:03,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [363.67007, 305.35516, 401.38745, 299.6816, 355.6324, 323.35947, 336.1983, 293.1739, 369.832, 362.7642]
2025-09-16 14:54:03,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 58.0, 76.0, 59.0, 68.0, 63.0, 64.0, 57.0, 69.0, 68.0]
2025-09-16 14:54:03,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 8 minutes, 44 seconds)
2025-09-16 14:56:03,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:56:04,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 345.20816 ± 110.045
2025-09-16 14:56:04,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [221.94528, 226.41714, 264.1081, 443.6675, 456.38068, 477.80865, 369.0117, 159.67267, 430.93405, 402.13586]
2025-09-16 14:56:04,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [43.0, 44.0, 51.0, 83.0, 88.0, 90.0, 69.0, 31.0, 78.0, 84.0]
2025-09-16 14:56:04,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 9 minutes, 40 seconds)
2025-09-16 14:58:05,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 14:58:06,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 324.67969 ± 133.728
2025-09-16 14:58:06,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [521.1595, 388.1946, 382.09055, 352.45477, 501.212, 140.19118, 225.30992, 158.48462, 404.74265, 172.95705]
2025-09-16 14:58:06,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 73.0, 71.0, 64.0, 107.0, 27.0, 44.0, 31.0, 74.0, 33.0]
2025-09-16 14:58:06,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 7 minutes, 55 seconds)
2025-09-16 15:00:06,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:00:07,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 359.11249 ± 161.730
2025-09-16 15:00:07,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [634.98193, 170.9965, 339.30313, 577.59, 176.14117, 381.0101, 125.000404, 454.88815, 321.30875, 409.90466]
2025-09-16 15:00:07,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 33.0, 65.0, 108.0, 34.0, 77.0, 24.0, 95.0, 61.0, 84.0]
2025-09-16 15:00:07,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 6 minutes)
2025-09-16 15:02:08,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:02:09,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 347.17636 ± 124.629
2025-09-16 15:02:09,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [400.1472, 490.66608, 186.85973, 374.94254, 177.3217, 490.85406, 318.46292, 156.44385, 471.5708, 404.49475]
2025-09-16 15:02:09,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 90.0, 36.0, 71.0, 34.0, 92.0, 63.0, 30.0, 87.0, 76.0]
2025-09-16 15:02:09,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 4 minutes, 26 seconds)
2025-09-16 15:04:09,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:04:10,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 319.80530 ± 175.851
2025-09-16 15:04:10,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [448.45197, 134.70978, 327.1943, 312.8964, 125.12248, 154.28477, 668.51935, 155.2107, 342.54218, 529.12103]
2025-09-16 15:04:10,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 26.0, 63.0, 60.0, 24.0, 30.0, 140.0, 30.0, 66.0, 100.0]
2025-09-16 15:04:10,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 2 minutes, 11 seconds)
2025-09-16 15:06:10,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:06:11,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 341.25909 ± 141.772
2025-09-16 15:06:11,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [367.63248, 349.8743, 140.55283, 524.0047, 129.52779, 459.11957, 180.38654, 531.1922, 316.54153, 413.75916]
2025-09-16 15:06:11,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 64.0, 27.0, 96.0, 25.0, 86.0, 35.0, 99.0, 59.0, 79.0]
2025-09-16 15:06:11,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 59 minutes, 58 seconds)
2025-09-16 15:08:13,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:08:14,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 352.36655 ± 163.666
2025-09-16 15:08:14,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [547.6789, 394.10455, 151.23532, 246.53314, 225.37082, 480.27606, 235.00812, 150.73276, 462.29007, 630.4359]
2025-09-16 15:08:14,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 72.0, 29.0, 47.0, 44.0, 90.0, 45.0, 29.0, 88.0, 123.0]
2025-09-16 15:08:14,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 58 minutes, 27 seconds)
2025-09-16 15:10:14,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:10:15,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 396.08307 ± 133.080
2025-09-16 15:10:15,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [439.54984, 403.86108, 160.62794, 266.29898, 307.4098, 595.0889, 311.76895, 429.77618, 609.3344, 437.11426]
2025-09-16 15:10:15,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 77.0, 31.0, 52.0, 60.0, 118.0, 59.0, 78.0, 116.0, 83.0]
2025-09-16 15:10:15,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 56 minutes, 13 seconds)
2025-09-16 15:12:16,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:12:17,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 433.02643 ± 144.584
2025-09-16 15:12:17,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [582.0175, 441.0437, 177.59938, 460.36368, 647.23486, 522.6067, 412.56274, 180.12462, 482.31772, 424.3935]
2025-09-16 15:12:17,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 96.0, 34.0, 86.0, 142.0, 96.0, 82.0, 35.0, 91.0, 79.0]
2025-09-16 15:12:17,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (433.03) for latency 21
2025-09-16 15:12:17,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 54 minutes, 25 seconds)
2025-09-16 15:14:18,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:14:19,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 392.79843 ± 160.588
2025-09-16 15:14:19,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [145.1622, 433.4363, 413.43597, 540.1578, 637.2356, 511.0547, 493.4464, 205.7335, 387.65488, 160.66696]
2025-09-16 15:14:19,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 82.0, 76.0, 102.0, 122.0, 95.0, 92.0, 40.0, 72.0, 31.0]
2025-09-16 15:14:19,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 52 minutes, 30 seconds)
2025-09-16 15:16:20,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:16:21,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 410.42520 ± 232.958
2025-09-16 15:16:21,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [173.00772, 151.56584, 694.2563, 489.79865, 878.6318, 187.6594, 433.53162, 176.85995, 416.34766, 502.5931]
2025-09-16 15:16:21,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [33.0, 29.0, 133.0, 90.0, 182.0, 36.0, 80.0, 34.0, 80.0, 96.0]
2025-09-16 15:16:21,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 50 minutes, 52 seconds)
2025-09-16 15:18:22,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:18:23,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 379.40222 ± 157.301
2025-09-16 15:18:23,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [520.51154, 168.33293, 600.23175, 185.02411, 557.4891, 379.50867, 400.69788, 400.8517, 135.67767, 445.69672]
2025-09-16 15:18:23,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 33.0, 130.0, 36.0, 109.0, 71.0, 77.0, 74.0, 26.0, 98.0]
2025-09-16 15:18:23,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 48 minutes, 28 seconds)
2025-09-16 15:20:24,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:20:25,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 367.63855 ± 89.675
2025-09-16 15:20:25,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [371.36044, 362.108, 493.9971, 367.82333, 319.25946, 451.44464, 368.53113, 372.79138, 428.063, 141.00706]
2025-09-16 15:20:25,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 69.0, 96.0, 68.0, 62.0, 83.0, 78.0, 71.0, 80.0, 27.0]
2025-09-16 15:20:25,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 46 minutes, 48 seconds)
2025-09-16 15:22:25,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:22:26,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 331.52637 ± 121.800
2025-09-16 15:22:26,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [192.59761, 380.76944, 428.79065, 175.9222, 288.0451, 125.12084, 396.9825, 511.62195, 399.52914, 415.88434]
2025-09-16 15:22:26,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 70.0, 80.0, 34.0, 57.0, 24.0, 75.0, 99.0, 74.0, 78.0]
2025-09-16 15:22:26,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 44 minutes, 22 seconds)
2025-09-16 15:24:28,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:24:29,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 416.25531 ± 131.996
2025-09-16 15:24:29,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [436.70062, 422.06683, 516.67706, 615.6442, 435.00775, 431.60245, 187.43144, 462.66974, 164.53236, 490.22046]
2025-09-16 15:24:29,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 80.0, 98.0, 119.0, 85.0, 79.0, 36.0, 86.0, 32.0, 99.0]
2025-09-16 15:24:29,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 42 minutes, 48 seconds)
2025-09-16 15:26:30,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:26:32,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 484.64145 ± 142.000
2025-09-16 15:26:32,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [437.61197, 531.80316, 691.8035, 536.5539, 172.532, 492.44153, 690.8544, 411.88486, 405.82462, 475.10458]
2025-09-16 15:26:32,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [80.0, 99.0, 128.0, 111.0, 33.0, 95.0, 133.0, 77.0, 74.0, 88.0]
2025-09-16 15:26:32,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (484.64) for latency 21
2025-09-16 15:26:32,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 40 minutes, 52 seconds)
2025-09-16 15:28:33,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:28:34,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 438.71835 ± 134.356
2025-09-16 15:28:34,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [524.92523, 240.69019, 371.4381, 442.0349, 461.8485, 534.6653, 225.59827, 531.2518, 369.42612, 685.30554]
2025-09-16 15:28:34,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 46.0, 70.0, 82.0, 85.0, 101.0, 43.0, 99.0, 69.0, 136.0]
2025-09-16 15:28:34,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 38 minutes, 47 seconds)
2025-09-16 15:30:35,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:30:36,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 397.30255 ± 121.175
2025-09-16 15:30:36,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [536.43646, 463.5452, 279.8732, 370.04553, 146.15187, 409.15985, 421.08313, 298.01, 560.6012, 488.11905]
2025-09-16 15:30:36,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [101.0, 86.0, 54.0, 70.0, 28.0, 76.0, 78.0, 58.0, 100.0, 101.0]
2025-09-16 15:30:36,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 36 minutes, 47 seconds)
2025-09-16 15:32:38,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:32:39,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 381.49606 ± 130.700
2025-09-16 15:32:39,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [387.2025, 119.83507, 456.87656, 447.6884, 487.92902, 415.67746, 391.22366, 161.97887, 387.72937, 558.81964]
2025-09-16 15:32:39,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 23.0, 90.0, 95.0, 98.0, 80.0, 78.0, 31.0, 77.0, 111.0]
2025-09-16 15:32:39,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 35 minutes, 9 seconds)
2025-09-16 15:34:40,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:34:41,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 358.07318 ± 146.333
2025-09-16 15:34:41,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [192.8412, 523.2841, 252.27042, 384.6678, 484.7328, 478.8651, 416.67505, 165.93066, 145.73361, 535.7312]
2025-09-16 15:34:41,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 96.0, 48.0, 71.0, 90.0, 89.0, 77.0, 32.0, 28.0, 100.0]
2025-09-16 15:34:41,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 32 minutes, 56 seconds)
2025-09-16 15:36:42,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:36:43,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 417.72028 ± 177.562
2025-09-16 15:36:43,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [604.74255, 453.71628, 156.54387, 625.4018, 398.0567, 501.78314, 219.10632, 156.54788, 646.94275, 414.36148]
2025-09-16 15:36:43,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 84.0, 30.0, 119.0, 84.0, 92.0, 42.0, 30.0, 119.0, 75.0]
2025-09-16 15:36:43,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 30 minutes, 40 seconds)
2025-09-16 15:38:44,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:38:45,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 454.25040 ± 148.358
2025-09-16 15:38:45,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [524.7834, 501.5203, 234.69421, 494.34225, 566.74866, 401.38782, 412.03287, 523.59406, 706.9587, 176.44179]
2025-09-16 15:38:45,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 97.0, 45.0, 92.0, 108.0, 75.0, 76.0, 111.0, 132.0, 34.0]
2025-09-16 15:38:45,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 28 minutes, 46 seconds)
2025-09-16 15:40:47,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:40:49,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 462.23282 ± 192.645
2025-09-16 15:40:49,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [381.47253, 395.75897, 755.97687, 145.40979, 628.6175, 411.08945, 491.80405, 554.209, 685.5824, 172.4075]
2025-09-16 15:40:49,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 76.0, 145.0, 28.0, 137.0, 78.0, 95.0, 103.0, 131.0, 33.0]
2025-09-16 15:40:49,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 27 minutes)
2025-09-16 15:42:49,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:42:50,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 323.81836 ± 190.586
2025-09-16 15:42:50,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [145.68983, 718.98724, 417.301, 390.92334, 171.72083, 449.43018, 160.06354, 145.35875, 139.79228, 498.91666]
2025-09-16 15:42:50,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 137.0, 78.0, 73.0, 33.0, 87.0, 31.0, 28.0, 27.0, 92.0]
2025-09-16 15:42:50,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 24 minutes, 43 seconds)
2025-09-16 15:44:52,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:44:53,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 329.49158 ± 175.816
2025-09-16 15:44:53,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [380.52243, 197.2796, 621.3217, 174.23943, 171.2435, 366.19644, 177.19434, 397.56653, 643.66864, 165.68309]
2025-09-16 15:44:53,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [72.0, 38.0, 125.0, 34.0, 33.0, 68.0, 34.0, 74.0, 124.0, 32.0]
2025-09-16 15:44:53,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 22 minutes, 53 seconds)
2025-09-16 15:46:55,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:46:57,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 433.46915 ± 159.944
2025-09-16 15:46:57,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [130.31404, 483.19653, 547.78394, 546.55084, 631.08014, 357.42612, 156.32037, 535.1117, 487.89743, 459.01065]
2025-09-16 15:46:57,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [25.0, 103.0, 118.0, 106.0, 117.0, 74.0, 30.0, 103.0, 91.0, 84.0]
2025-09-16 15:46:57,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 21 minutes, 10 seconds)
2025-09-16 15:48:57,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:48:59,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 467.28214 ± 115.572
2025-09-16 15:48:59,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [496.38385, 423.4146, 582.1885, 600.0642, 462.5983, 443.63083, 461.08237, 565.3035, 167.74133, 470.41364]
2025-09-16 15:48:59,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 81.0, 111.0, 114.0, 85.0, 83.0, 87.0, 119.0, 32.0, 85.0]
2025-09-16 15:48:59,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 19 minutes, 4 seconds)
2025-09-16 15:50:58,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:50:59,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 350.07980 ± 135.496
2025-09-16 15:50:59,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [425.5036, 208.21819, 258.05023, 321.89612, 421.29788, 418.4292, 173.13861, 185.31686, 593.148, 495.79947]
2025-09-16 15:50:59,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 40.0, 51.0, 61.0, 79.0, 75.0, 33.0, 36.0, 110.0, 91.0]
2025-09-16 15:50:59,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 21 seconds)
2025-09-16 15:53:00,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:53:01,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 442.79892 ± 198.566
2025-09-16 15:53:01,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [551.2327, 577.24603, 405.81284, 780.5823, 538.39624, 571.3723, 161.29143, 246.2536, 119.78785, 476.01407]
2025-09-16 15:53:01,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 107.0, 75.0, 147.0, 101.0, 119.0, 31.0, 47.0, 23.0, 91.0]
2025-09-16 15:53:01,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 14 minutes, 20 seconds)
2025-09-16 15:55:00,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:55:01,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 401.12277 ± 128.718
2025-09-16 15:55:01,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [504.8888, 186.47075, 529.00476, 341.20062, 411.77844, 607.29193, 388.88693, 442.4599, 408.73987, 190.50565]
2025-09-16 15:55:01,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 36.0, 100.0, 72.0, 75.0, 115.0, 72.0, 86.0, 78.0, 37.0]
2025-09-16 15:55:01,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 37 seconds)
2025-09-16 15:57:01,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:57:02,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 538.66711 ± 162.384
2025-09-16 15:57:02,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [577.2582, 532.66644, 577.7473, 456.71048, 157.2595, 548.2141, 579.07184, 510.27554, 586.9595, 860.5079]
2025-09-16 15:57:02,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 100.0, 123.0, 85.0, 30.0, 110.0, 107.0, 93.0, 110.0, 165.0]
2025-09-16 15:57:02,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (538.67) for latency 21
2025-09-16 15:57:02,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 9 minutes, 11 seconds)
2025-09-16 15:59:01,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 15:59:03,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 467.06088 ± 197.965
2025-09-16 15:59:03,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [186.36066, 458.65152, 674.469, 416.284, 432.72266, 852.1406, 521.4249, 534.6026, 134.8014, 459.15137]
2025-09-16 15:59:03,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [36.0, 86.0, 125.0, 80.0, 90.0, 168.0, 97.0, 99.0, 26.0, 84.0]
2025-09-16 15:59:03,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 47 seconds)
2025-09-16 16:01:01,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:01:03,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 509.96988 ± 253.310
2025-09-16 16:01:03,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [355.75125, 474.50934, 145.95758, 486.22867, 370.2249, 454.80267, 362.34332, 494.8304, 948.0557, 1006.9947]
2025-09-16 16:01:03,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 85.0, 28.0, 84.0, 69.0, 86.0, 67.0, 91.0, 188.0, 201.0]
2025-09-16 16:01:03,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 47 seconds)
2025-09-16 16:03:03,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:03:04,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 329.97198 ± 161.956
2025-09-16 16:03:04,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [145.89671, 489.92004, 547.6022, 303.43982, 399.87015, 524.2937, 135.54579, 181.96935, 441.26926, 129.91287]
2025-09-16 16:03:04,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [28.0, 90.0, 102.0, 56.0, 75.0, 97.0, 26.0, 35.0, 93.0, 25.0]
2025-09-16 16:03:04,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 36 seconds)
2025-09-16 16:05:13,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:05:15,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 507.17950 ± 136.701
2025-09-16 16:05:15,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [431.74673, 525.70746, 557.8418, 166.33607, 714.4366, 525.9062, 449.99213, 521.32776, 619.66034, 558.83997]
2025-09-16 16:05:15,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [81.0, 99.0, 110.0, 32.0, 134.0, 97.0, 97.0, 98.0, 115.0, 105.0]
2025-09-16 16:05:15,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 2 minutes, 47 seconds)
2025-09-16 16:07:27,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:07:29,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 449.12216 ± 170.088
2025-09-16 16:07:29,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [461.22693, 539.3759, 516.05396, 150.06607, 769.6213, 480.0091, 432.6199, 525.89215, 171.20473, 445.1514]
2025-09-16 16:07:29,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 99.0, 93.0, 29.0, 154.0, 89.0, 79.0, 113.0, 33.0, 82.0]
2025-09-16 16:07:29,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 3 minutes, 14 seconds)
2025-09-16 16:09:44,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:09:45,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 368.93524 ± 187.978
2025-09-16 16:09:45,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [304.56052, 444.054, 165.79703, 325.86472, 788.19763, 125.09339, 470.7329, 490.24936, 403.2362, 171.56673]
2025-09-16 16:09:45,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [58.0, 89.0, 32.0, 63.0, 162.0, 24.0, 95.0, 88.0, 76.0, 33.0]
2025-09-16 16:09:45,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 4 minutes, 11 seconds)
2025-09-16 16:11:59,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:12:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 521.11163 ± 159.959
2025-09-16 16:12:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [176.55055, 551.81586, 452.22357, 487.00177, 491.75375, 550.43585, 845.7631, 662.4343, 464.95865, 528.1787]
2025-09-16 16:12:00,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [34.0, 103.0, 95.0, 103.0, 90.0, 101.0, 162.0, 123.0, 85.0, 109.0]
2025-09-16 16:12:00,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 4 minutes, 52 seconds)
2025-09-16 16:14:11,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:14:13,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 503.44467 ± 142.775
2025-09-16 16:14:13,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [580.1209, 120.0251, 649.8891, 550.4471, 560.52747, 632.8335, 529.74225, 516.876, 452.29248, 441.69357]
2025-09-16 16:14:13,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 23.0, 120.0, 104.0, 105.0, 120.0, 96.0, 96.0, 96.0, 81.0]
2025-09-16 16:14:13,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 4 minutes, 51 seconds)
2025-09-16 16:16:30,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:16:32,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 548.44232 ± 209.045
2025-09-16 16:16:32,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [375.6584, 891.8283, 492.79468, 788.5301, 150.90572, 481.5067, 365.00684, 608.38165, 693.2635, 636.5474]
2025-09-16 16:16:32,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 192.0, 95.0, 145.0, 29.0, 89.0, 75.0, 115.0, 141.0, 116.0]
2025-09-16 16:16:32,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (548.44) for latency 21
2025-09-16 16:16:32,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 4 minutes, 6 seconds)
2025-09-16 16:18:49,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:18:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 371.33020 ± 212.723
2025-09-16 16:18:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [124.9901, 471.15762, 297.2796, 146.06203, 553.796, 648.04675, 145.63264, 155.7679, 469.5108, 701.0588]
2025-09-16 16:18:50,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 87.0, 60.0, 28.0, 102.0, 121.0, 28.0, 30.0, 86.0, 132.0]
2025-09-16 16:18:50,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 2 minutes, 35 seconds)
2025-09-16 16:21:03,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:21:05,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 543.07916 ± 202.435
2025-09-16 16:21:05,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [996.55493, 450.31934, 146.40187, 507.87015, 411.67838, 523.9138, 662.8328, 575.9334, 605.2021, 550.0843]
2025-09-16 16:21:05,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [185.0, 85.0, 28.0, 95.0, 76.0, 109.0, 137.0, 111.0, 114.0, 100.0]
2025-09-16 16:21:05,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 8 seconds)
2025-09-16 16:23:23,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:23:25,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 539.33032 ± 79.939
2025-09-16 16:23:25,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [468.32224, 472.53754, 678.512, 597.53906, 597.17975, 460.62823, 516.2366, 638.081, 430.79755, 533.4696]
2025-09-16 16:23:25,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 88.0, 129.0, 112.0, 113.0, 90.0, 97.0, 117.0, 87.0, 107.0]
2025-09-16 16:23:25,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 58 minutes, 41 seconds)
2025-09-16 16:25:40,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:25:41,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 465.42178 ± 183.461
2025-09-16 16:25:41,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [672.44244, 602.11694, 612.7412, 406.74844, 514.5891, 135.35432, 466.36847, 140.0114, 457.92874, 645.91705]
2025-09-16 16:25:41,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 112.0, 114.0, 76.0, 98.0, 26.0, 85.0, 27.0, 88.0, 119.0]
2025-09-16 16:25:41,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 57 minutes, 2 seconds)
2025-09-16 16:27:58,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:27:59,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 384.94067 ± 217.698
2025-09-16 16:27:59,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [161.55313, 457.9976, 140.51122, 477.11044, 635.64264, 146.44475, 151.68292, 487.1061, 399.4074, 791.9503]
2025-09-16 16:27:59,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [31.0, 83.0, 27.0, 94.0, 118.0, 28.0, 29.0, 103.0, 74.0, 157.0]
2025-09-16 16:27:59,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 54 minutes, 36 seconds)
2025-09-16 16:30:19,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:30:21,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 485.65536 ± 195.078
2025-09-16 16:30:21,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [607.97235, 569.72675, 125.31187, 532.45734, 440.31424, 634.65485, 781.8598, 434.65952, 568.1222, 161.47493]
2025-09-16 16:30:21,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [126.0, 104.0, 24.0, 99.0, 81.0, 134.0, 156.0, 79.0, 106.0, 31.0]
2025-09-16 16:30:21,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 52 minutes, 53 seconds)
2025-09-16 16:32:40,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:32:41,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 476.46271 ± 174.400
2025-09-16 16:32:41,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [596.5612, 429.93533, 537.99115, 567.89136, 160.492, 636.0525, 140.3394, 622.92175, 470.8647, 601.57764]
2025-09-16 16:32:41,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [109.0, 78.0, 96.0, 123.0, 31.0, 123.0, 27.0, 129.0, 86.0, 109.0]
2025-09-16 16:32:41,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 51 minutes, 22 seconds)
2025-09-16 16:35:02,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:35:04,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 578.48553 ± 167.058
2025-09-16 16:35:04,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [731.4321, 666.2297, 633.9452, 151.79523, 599.3098, 652.5849, 732.2776, 677.20197, 472.1731, 467.90646]
2025-09-16 16:35:04,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [138.0, 144.0, 121.0, 29.0, 116.0, 123.0, 152.0, 142.0, 89.0, 85.0]
2025-09-16 16:35:04,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (578.49) for latency 21
2025-09-16 16:35:04,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 49 minutes, 24 seconds)
2025-09-16 16:37:26,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:37:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 460.72226 ± 189.685
2025-09-16 16:37:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [418.6953, 146.65381, 525.7957, 478.99536, 611.00494, 502.53326, 788.0434, 586.0529, 136.05136, 413.39664]
2025-09-16 16:37:27,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [79.0, 28.0, 95.0, 89.0, 111.0, 92.0, 162.0, 107.0, 26.0, 78.0]
2025-09-16 16:37:27,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 48 minutes, 16 seconds)
2025-09-16 16:39:47,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:39:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 580.40533 ± 224.096
2025-09-16 16:39:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [418.5419, 585.97394, 659.3579, 501.93405, 490.4129, 597.7943, 1007.309, 552.37134, 854.07007, 136.28767]
2025-09-16 16:39:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 120.0, 121.0, 105.0, 91.0, 114.0, 193.0, 100.0, 161.0, 26.0]
2025-09-16 16:39:48,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (580.41) for latency 21
2025-09-16 16:39:48,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 46 minutes, 21 seconds)
2025-09-16 16:42:05,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:42:07,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 468.55078 ± 235.201
2025-09-16 16:42:07,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [813.04034, 528.7948, 465.9847, 773.53253, 565.8196, 146.52763, 481.948, 151.36528, 146.8937, 611.6008]
2025-09-16 16:42:07,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 98.0, 85.0, 157.0, 103.0, 28.0, 88.0, 29.0, 28.0, 110.0]
2025-09-16 16:42:07,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 43 minutes, 29 seconds)
2025-09-16 16:44:28,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:44:29,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 570.31915 ± 100.358
2025-09-16 16:44:29,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [564.92706, 483.80756, 531.49097, 638.29065, 404.08588, 629.1926, 554.8073, 547.01465, 803.94824, 545.6264]
2025-09-16 16:44:29,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [104.0, 88.0, 99.0, 121.0, 74.0, 126.0, 104.0, 98.0, 150.0, 102.0]
2025-09-16 16:44:29,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 41 minutes, 30 seconds)
2025-09-16 16:46:48,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:46:50,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 535.01398 ± 251.985
2025-09-16 16:46:50,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [573.1778, 673.6847, 508.7424, 1063.9532, 571.777, 130.3892, 140.2999, 501.6147, 543.04553, 643.4558]
2025-09-16 16:46:50,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [105.0, 127.0, 93.0, 232.0, 105.0, 25.0, 27.0, 94.0, 117.0, 122.0]
2025-09-16 16:46:50,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 38 minutes, 54 seconds)
2025-09-16 16:49:10,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:49:12,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 594.90698 ± 240.509
2025-09-16 16:49:12,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [432.97513, 615.9723, 130.26714, 1048.8102, 803.3168, 494.42142, 599.431, 486.46234, 832.5014, 504.91226]
2025-09-16 16:49:12,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 113.0, 25.0, 197.0, 153.0, 90.0, 110.0, 90.0, 152.0, 94.0]
2025-09-16 16:49:12,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (594.91) for latency 21
2025-09-16 16:49:12,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 36 minutes, 15 seconds)
2025-09-16 16:51:29,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:51:30,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 431.33456 ± 146.287
2025-09-16 16:51:30,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [440.7915, 484.51724, 453.9474, 389.62494, 202.30449, 586.0229, 480.16617, 686.05304, 413.24878, 176.66873]
2025-09-16 16:51:30,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 91.0, 94.0, 73.0, 39.0, 113.0, 88.0, 125.0, 84.0, 34.0]
2025-09-16 16:51:30,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 33 minutes, 34 seconds)
2025-09-16 16:53:50,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:53:52,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 544.11511 ± 142.787
2025-09-16 16:53:52,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [192.86418, 704.1881, 511.9254, 571.2059, 553.3637, 607.01135, 538.8682, 756.22363, 520.5094, 484.9913]
2025-09-16 16:53:52,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [37.0, 131.0, 97.0, 111.0, 101.0, 111.0, 115.0, 155.0, 97.0, 89.0]
2025-09-16 16:53:52,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 31 minutes, 38 seconds)
2025-09-16 16:56:12,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:56:14,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 548.49878 ± 221.202
2025-09-16 16:56:14,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [527.0684, 759.6842, 807.0557, 728.9064, 141.08844, 633.1846, 647.58057, 547.8007, 546.44916, 146.1695]
2025-09-16 16:56:14,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 154.0, 157.0, 138.0, 27.0, 117.0, 117.0, 108.0, 108.0, 28.0]
2025-09-16 16:56:14,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 29 minutes, 13 seconds)
2025-09-16 16:58:35,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 16:58:37,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 646.19653 ± 163.443
2025-09-16 16:58:37,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [572.31445, 504.17334, 659.361, 659.15234, 500.59985, 617.8165, 737.1888, 941.45154, 386.50558, 883.4025]
2025-09-16 16:58:37,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [124.0, 109.0, 140.0, 141.0, 107.0, 131.0, 138.0, 198.0, 73.0, 188.0]
2025-09-16 16:58:37,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (646.20) for latency 21
2025-09-16 16:58:37,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 27 minutes, 11 seconds)
2025-09-16 17:00:56,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:00:58,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 560.21436 ± 173.694
2025-09-16 17:00:58,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [548.88654, 691.4085, 150.26819, 424.9587, 720.09045, 666.6768, 530.25586, 547.0769, 802.0124, 520.5092]
2025-09-16 17:00:58,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [112.0, 125.0, 29.0, 77.0, 136.0, 118.0, 105.0, 98.0, 162.0, 96.0]
2025-09-16 17:00:58,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 24 minutes, 46 seconds)
2025-09-16 17:03:17,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:03:19,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 626.37286 ± 231.627
2025-09-16 17:03:19,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [927.42993, 832.0838, 967.64307, 512.13684, 140.61638, 509.55768, 663.05145, 479.4, 641.12683, 590.68274]
2025-09-16 17:03:19,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [189.0, 165.0, 183.0, 108.0, 27.0, 93.0, 131.0, 95.0, 129.0, 106.0]
2025-09-16 17:03:19,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 22 minutes, 42 seconds)
2025-09-16 17:05:40,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:05:42,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 678.56042 ± 127.728
2025-09-16 17:05:42,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [793.06775, 968.8714, 687.3649, 663.3818, 700.4991, 711.5435, 618.51263, 619.04333, 534.20233, 489.1179]
2025-09-16 17:05:42,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 184.0, 128.0, 128.0, 128.0, 141.0, 129.0, 114.0, 98.0, 89.0]
2025-09-16 17:05:42,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (678.56) for latency 21
2025-09-16 17:05:42,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 20 minutes, 30 seconds)
2025-09-16 17:08:02,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:08:04,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 606.69006 ± 196.248
2025-09-16 17:08:04,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [460.52322, 178.2929, 918.1267, 555.8591, 580.5047, 718.30914, 697.6815, 702.3864, 470.74786, 784.4691]
2025-09-16 17:08:04,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 34.0, 174.0, 111.0, 105.0, 133.0, 145.0, 124.0, 86.0, 162.0]
2025-09-16 17:08:04,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 18 minutes, 4 seconds)
2025-09-16 17:10:22,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:10:24,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 508.68613 ± 231.825
2025-09-16 17:10:24,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [657.193, 387.38443, 878.0881, 547.4856, 415.83212, 759.9673, 146.05006, 135.4687, 508.807, 650.58484]
2025-09-16 17:10:24,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [122.0, 72.0, 168.0, 98.0, 88.0, 143.0, 28.0, 26.0, 92.0, 117.0]
2025-09-16 17:10:24,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 15 minutes, 22 seconds)
2025-09-16 17:12:46,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:12:48,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 476.51099 ± 238.200
2025-09-16 17:12:48,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [789.1372, 812.2521, 547.0932, 502.34088, 530.4017, 162.53311, 584.61505, 124.602135, 562.53516, 149.59955]
2025-09-16 17:12:48,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 152.0, 100.0, 102.0, 98.0, 31.0, 120.0, 24.0, 117.0, 29.0]
2025-09-16 17:12:48,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 13 minutes, 19 seconds)
2025-09-16 17:15:08,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:15:10,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 582.90662 ± 189.331
2025-09-16 17:15:10,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [468.7928, 674.3438, 888.5065, 629.74225, 625.15735, 140.3624, 776.56683, 513.9547, 564.14087, 547.4992]
2025-09-16 17:15:10,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [86.0, 127.0, 165.0, 117.0, 112.0, 27.0, 144.0, 95.0, 100.0, 102.0]
2025-09-16 17:15:10,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 11 minutes, 4 seconds)
2025-09-16 17:17:32,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:17:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 576.91754 ± 99.651
2025-09-16 17:17:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [494.82156, 579.09015, 456.14645, 494.40256, 609.60394, 658.34436, 617.4463, 784.50903, 449.0706, 625.7404]
2025-09-16 17:17:34,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [90.0, 102.0, 82.0, 92.0, 121.0, 125.0, 125.0, 146.0, 85.0, 132.0]
2025-09-16 17:17:34,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 8 minutes, 47 seconds)
2025-09-16 17:19:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:19:51,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 558.83337 ± 157.414
2025-09-16 17:19:51,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [688.3215, 447.43933, 563.59235, 177.21463, 565.11664, 697.22406, 465.9408, 565.9473, 716.80054, 700.7364]
2025-09-16 17:19:51,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 101.0, 102.0, 34.0, 119.0, 126.0, 85.0, 104.0, 132.0, 143.0]
2025-09-16 17:19:51,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 6 minutes)
2025-09-16 17:22:10,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:22:12,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 496.46527 ± 186.515
2025-09-16 17:22:12,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [141.15652, 486.85028, 582.71045, 539.98376, 578.4193, 672.577, 624.3246, 624.428, 583.9376, 130.2652]
2025-09-16 17:22:12,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [27.0, 90.0, 108.0, 100.0, 105.0, 124.0, 114.0, 119.0, 109.0, 25.0]
2025-09-16 17:22:12,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 3 minutes, 43 seconds)
2025-09-16 17:24:35,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:24:37,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 525.17743 ± 248.643
2025-09-16 17:24:37,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [245.73813, 876.7668, 271.70834, 503.00815, 639.46735, 529.0385, 146.50307, 846.3746, 799.0177, 394.152]
2025-09-16 17:24:37,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [47.0, 170.0, 53.0, 104.0, 115.0, 102.0, 28.0, 160.0, 154.0, 85.0]
2025-09-16 17:24:37,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 1 minute, 28 seconds)
2025-09-16 17:26:57,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:26:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 668.79486 ± 134.684
2025-09-16 17:26:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [689.0997, 925.9501, 624.29944, 545.7172, 549.36414, 813.5958, 672.90375, 482.954, 807.93, 576.1344]
2025-09-16 17:26:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 175.0, 112.0, 111.0, 117.0, 170.0, 122.0, 89.0, 171.0, 118.0]
2025-09-16 17:26:59,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 59 minutes, 7 seconds)
2025-09-16 17:29:20,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:29:22,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 569.64703 ± 301.117
2025-09-16 17:29:22,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [182.78485, 884.2486, 849.51013, 124.86321, 490.0245, 746.35583, 556.0908, 805.92883, 150.88538, 905.7785]
2025-09-16 17:29:22,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [35.0, 166.0, 160.0, 24.0, 89.0, 133.0, 117.0, 147.0, 29.0, 170.0]
2025-09-16 17:29:22,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 56 minutes, 39 seconds)
2025-09-16 17:31:44,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:31:46,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 600.85394 ± 250.338
2025-09-16 17:31:46,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [544.0908, 487.2036, 884.6063, 451.09125, 124.98768, 573.63965, 459.15442, 677.1728, 1092.4694, 714.1242]
2025-09-16 17:31:46,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 88.0, 165.0, 84.0, 24.0, 117.0, 90.0, 140.0, 203.0, 130.0]
2025-09-16 17:31:46,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 54 minutes, 48 seconds)
2025-09-16 17:34:03,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:34:05,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 635.46252 ± 167.858
2025-09-16 17:34:05,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [740.7103, 829.65674, 655.24005, 208.61281, 645.1411, 537.05505, 574.9108, 764.13464, 621.6074, 777.556]
2025-09-16 17:34:05,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 156.0, 126.0, 40.0, 119.0, 96.0, 106.0, 141.0, 116.0, 170.0]
2025-09-16 17:34:05,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 52 minutes, 16 seconds)
2025-09-16 17:36:27,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:36:29,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 670.90472 ± 118.719
2025-09-16 17:36:29,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [572.5902, 620.6593, 734.06464, 482.69608, 781.0155, 704.43427, 662.321, 575.6329, 928.53613, 647.09735]
2025-09-16 17:36:29,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [103.0, 115.0, 144.0, 91.0, 137.0, 136.0, 120.0, 124.0, 167.0, 115.0]
2025-09-16 17:36:29,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 49 minutes, 50 seconds)
2025-09-16 17:38:51,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:38:54,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 692.08582 ± 248.225
2025-09-16 17:38:54,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [553.00476, 1028.8446, 135.61633, 1059.0452, 631.36145, 753.83545, 764.27484, 686.3499, 545.9078, 762.6175]
2025-09-16 17:38:54,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [99.0, 188.0, 26.0, 185.0, 124.0, 145.0, 152.0, 131.0, 114.0, 146.0]
2025-09-16 17:38:54,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (692.09) for latency 21
2025-09-16 17:38:54,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 47 minutes, 36 seconds)
2025-09-16 17:41:16,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:41:18,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 649.29041 ± 164.241
2025-09-16 17:41:18,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [748.7918, 498.6621, 567.1409, 910.279, 845.145, 422.80054, 513.0481, 474.9044, 735.23956, 776.89294]
2025-09-16 17:41:18,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [141.0, 92.0, 104.0, 171.0, 171.0, 80.0, 96.0, 85.0, 153.0, 150.0]
2025-09-16 17:41:18,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 45 minutes, 19 seconds)
2025-09-16 17:43:37,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:43:39,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 698.19299 ± 236.471
2025-09-16 17:43:39,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [825.67834, 669.18555, 614.1159, 683.35803, 1110.9137, 559.98773, 633.1617, 851.3821, 872.08124, 162.06584]
2025-09-16 17:43:39,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [166.0, 139.0, 109.0, 128.0, 212.0, 100.0, 131.0, 157.0, 174.0, 31.0]
2025-09-16 17:43:39,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (698.19) for latency 21
2025-09-16 17:43:39,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 42 minutes, 48 seconds)
2025-09-16 17:45:57,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:46:00,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 685.40076 ± 188.470
2025-09-16 17:46:00,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [693.84375, 680.2349, 787.5543, 419.99646, 778.47864, 935.97235, 482.05392, 1005.41754, 451.79846, 618.65753]
2025-09-16 17:46:00,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 128.0, 171.0, 76.0, 137.0, 170.0, 87.0, 188.0, 83.0, 118.0]
2025-09-16 17:46:00,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 40 minutes, 30 seconds)
2025-09-16 17:48:22,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:48:25,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 774.18097 ± 255.009
2025-09-16 17:48:25,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1396.4744, 942.68994, 565.0401, 877.54706, 796.3827, 729.0436, 428.55347, 574.48987, 791.7624, 639.8263]
2025-09-16 17:48:25,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [257.0, 177.0, 105.0, 160.0, 143.0, 134.0, 80.0, 104.0, 144.0, 117.0]
2025-09-16 17:48:25,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (774.18) for latency 21
2025-09-16 17:48:25,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 38 minutes, 9 seconds)
2025-09-16 17:50:48,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:50:50,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 638.47058 ± 290.767
2025-09-16 17:50:50,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [155.23784, 702.59686, 979.3606, 815.5485, 545.2838, 688.0043, 119.336655, 585.11316, 750.5019, 1043.7219]
2025-09-16 17:50:50,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 125.0, 194.0, 153.0, 102.0, 131.0, 23.0, 109.0, 132.0, 193.0]
2025-09-16 17:50:50,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 35 minutes, 48 seconds)
2025-09-16 17:53:12,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:53:14,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 637.24963 ± 179.176
2025-09-16 17:53:14,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [717.4275, 702.6774, 761.6782, 774.9039, 608.37823, 669.9628, 726.87683, 784.4876, 182.52364, 443.58008]
2025-09-16 17:53:14,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [129.0, 125.0, 152.0, 159.0, 104.0, 133.0, 137.0, 141.0, 35.0, 81.0]
2025-09-16 17:53:14,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 33 minutes, 24 seconds)
2025-09-16 17:55:31,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:55:33,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 604.40332 ± 219.054
2025-09-16 17:55:33,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [876.34937, 643.911, 429.09048, 507.65875, 700.41895, 906.7675, 451.868, 151.51534, 585.35156, 791.10223]
2025-09-16 17:55:33,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [158.0, 120.0, 77.0, 95.0, 127.0, 176.0, 84.0, 29.0, 105.0, 150.0]
2025-09-16 17:55:33,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 30 minutes, 54 seconds)
2025-09-16 17:57:51,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 17:57:53,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 675.68933 ± 311.274
2025-09-16 17:57:53,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [151.42274, 704.52637, 1158.5238, 663.01746, 685.68164, 119.22652, 873.9019, 731.27167, 1003.8732, 665.4485]
2025-09-16 17:57:53,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [29.0, 127.0, 219.0, 123.0, 124.0, 23.0, 172.0, 140.0, 189.0, 118.0]
2025-09-16 17:57:53,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 28 minutes, 32 seconds)
2025-09-16 18:00:09,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:00:12,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 777.37781 ± 297.233
2025-09-16 18:00:12,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [800.44525, 599.86316, 884.8412, 681.2416, 529.71875, 573.42303, 1338.1201, 639.1077, 428.0076, 1299.0092]
2025-09-16 18:00:12,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 118.0, 163.0, 121.0, 99.0, 107.0, 269.0, 141.0, 78.0, 238.0]
2025-09-16 18:00:12,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (777.38) for latency 21
2025-09-16 18:00:12,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 55 seconds)
2025-09-16 18:02:32,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:02:34,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 531.56018 ± 158.669
2025-09-16 18:02:34,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [507.4372, 455.1013, 611.7403, 523.4517, 471.46118, 668.30206, 753.16925, 456.56943, 701.4189, 166.95042]
2025-09-16 18:02:34,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 83.0, 116.0, 96.0, 85.0, 138.0, 138.0, 86.0, 130.0, 32.0]
2025-09-16 18:02:34,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 28 seconds)
2025-09-16 18:04:57,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:04:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 641.71777 ± 357.151
2025-09-16 18:04:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [868.8366, 152.31952, 124.84507, 560.13116, 1155.4586, 157.38849, 998.3703, 810.2931, 725.1745, 864.3606]
2025-09-16 18:04:59,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [182.0, 29.0, 24.0, 106.0, 245.0, 30.0, 187.0, 155.0, 148.0, 164.0]
2025-09-16 18:04:59,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 10 seconds)
2025-09-16 18:07:22,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:07:24,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 829.22913 ± 243.568
2025-09-16 18:07:24,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [858.9572, 923.59235, 513.9899, 698.50635, 1365.0911, 661.8594, 772.3139, 517.946, 990.328, 989.7064]
2025-09-16 18:07:24,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 166.0, 96.0, 122.0, 233.0, 139.0, 142.0, 94.0, 191.0, 186.0]
2025-09-16 18:07:24,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (829.23) for latency 21
2025-09-16 18:07:24,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 58 seconds)
2025-09-16 18:09:42,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:09:44,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 772.60291 ± 287.177
2025-09-16 18:09:44,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [779.3436, 1236.4381, 576.0589, 161.60577, 603.18494, 688.8904, 939.73566, 1090.8767, 708.4441, 941.45135]
2025-09-16 18:09:44,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 249.0, 103.0, 31.0, 118.0, 133.0, 184.0, 204.0, 135.0, 185.0]
2025-09-16 18:09:44,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 35 seconds)
2025-09-16 18:12:04,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:12:06,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 735.06311 ± 293.657
2025-09-16 18:12:06,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [124.72538, 1149.975, 995.56445, 829.9573, 748.9395, 532.43945, 397.7905, 958.6587, 919.5905, 692.99023]
2025-09-16 18:12:06,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [24.0, 210.0, 179.0, 159.0, 148.0, 94.0, 71.0, 176.0, 169.0, 128.0]
2025-09-16 18:12:06,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 17 seconds)
2025-09-16 18:14:28,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:14:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 867.00476 ± 249.408
2025-09-16 18:14:30,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1189.7977, 569.9974, 685.28894, 1303.5815, 813.4663, 858.2183, 535.6509, 661.42615, 1037.6929, 1014.9278]
2025-09-16 18:14:30,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [220.0, 105.0, 120.0, 257.0, 148.0, 155.0, 97.0, 125.0, 191.0, 178.0]
2025-09-16 18:14:30,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (867.00) for latency 21
2025-09-16 18:14:31,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 56 seconds)
2025-09-16 18:17:00,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:17:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 886.43976 ± 151.521
2025-09-16 18:17:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1019.7469, 815.679, 874.3341, 981.0602, 681.53625, 871.80237, 984.4047, 619.88794, 857.43604, 1158.5096]
2025-09-16 18:17:03,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [188.0, 147.0, 168.0, 179.0, 131.0, 173.0, 187.0, 118.0, 168.0, 229.0]
2025-09-16 18:17:03,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (886.44) for latency 21
2025-09-16 18:17:03,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 38 seconds)
2025-09-16 18:19:26,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:19:29,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 886.94495 ± 241.686
2025-09-16 18:19:29,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1138.7518, 1310.4323, 780.65283, 829.5353, 628.96484, 1120.0105, 595.3099, 691.6454, 1090.4382, 683.70844]
2025-09-16 18:19:29,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 243.0, 141.0, 147.0, 128.0, 221.0, 109.0, 127.0, 199.0, 124.0]
2025-09-16 18:19:29,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (886.94) for latency 21
2025-09-16 18:19:29,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 14 seconds)
2025-09-16 18:21:49,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:21:51,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 583.68140 ± 273.661
2025-09-16 18:21:51,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1021.3213, 591.5423, 913.03235, 669.56635, 140.2152, 145.85117, 613.68475, 763.83344, 518.73047, 459.0367]
2025-09-16 18:21:51,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 106.0, 185.0, 118.0, 27.0, 28.0, 109.0, 149.0, 109.0, 84.0]
2025-09-16 18:21:51,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 50 seconds)
2025-09-16 18:24:12,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:24:16,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1038.41479 ± 466.643
2025-09-16 18:24:16,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1662.0704, 515.0347, 1217.5144, 833.9808, 666.50714, 799.63965, 811.83984, 1361.8765, 553.1352, 1962.5503]
2025-09-16 18:24:16,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [298.0, 93.0, 212.0, 155.0, 132.0, 160.0, 143.0, 250.0, 102.0, 377.0]
2025-09-16 18:24:16,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1038.41) for latency 21
2025-09-16 18:24:16,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 25 seconds)
2025-09-16 18:26:44,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 21...
2025-09-16 18:26:47,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 973.81152 ± 401.680
2025-09-16 18:26:47,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [774.47894, 1579.673, 720.68054, 888.79553, 1189.1613, 1289.2987, 646.0108, 1159.8024, 135.68034, 1354.5333]
2025-09-16 18:26:47,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 302.0, 145.0, 159.0, 218.0, 244.0, 113.0, 217.0, 26.0, 237.0]
2025-09-16 18:26:47,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1251 [DEBUG]: Training session finished
