2025-09-16 11:02:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.050-delay_3
2025-09-16 11:02:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.050-delay_3
2025-09-16 11:02:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x152a161b8510>}
2025-09-16 11:02:02,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:02:02,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:02:02,361 baseline-bpql-noisepromille50-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=427, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:02:02,361 baseline-bpql-noisepromille50-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:02:03,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:02:03,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:03:50,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:03:50,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 84.62761 ± 3.023
2025-09-16 11:03:50,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [80.935265, 80.780426, 84.73529, 85.20013, 85.22104, 85.48107, 89.67339, 88.94801, 80.46089, 84.84055]
2025-09-16 11:03:50,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [17.0, 17.0, 18.0, 18.0, 18.0, 18.0, 19.0, 19.0, 17.0, 18.0]
2025-09-16 11:03:50,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (84.63) for latency 3
2025-09-16 11:03:50,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 40 seconds)
2025-09-16 11:05:45,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:05:46,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 333.40677 ± 28.839
2025-09-16 11:05:46,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [320.31485, 321.98373, 380.4193, 294.18814, 367.89273, 331.3624, 361.65518, 305.29144, 299.41382, 351.54596]
2025-09-16 11:05:46,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 61.0, 75.0, 59.0, 69.0, 65.0, 71.0, 60.0, 57.0, 68.0]
2025-09-16 11:05:46,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (333.41) for latency 3
2025-09-16 11:05:46,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 1 minute, 39 seconds)
2025-09-16 11:07:42,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:07:43,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 488.68100 ± 121.333
2025-09-16 11:07:43,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [492.98047, 722.1329, 332.21945, 371.57132, 414.15497, 469.133, 656.7447, 490.67758, 369.82724, 567.36847]
2025-09-16 11:07:43,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 140.0, 66.0, 79.0, 78.0, 87.0, 131.0, 92.0, 69.0, 111.0]
2025-09-16 11:07:43,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (488.68) for latency 3
2025-09-16 11:07:43,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 2 minutes, 53 seconds)
2025-09-16 11:09:39,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:09:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 474.60196 ± 112.813
2025-09-16 11:09:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [489.62268, 579.3314, 440.32733, 429.55576, 365.08932, 713.6887, 334.44113, 532.2147, 338.88083, 522.8675]
2025-09-16 11:09:40,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 117.0, 84.0, 80.0, 67.0, 138.0, 74.0, 100.0, 63.0, 96.0]
2025-09-16 11:09:40,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 2 minutes, 38 seconds)
2025-09-16 11:11:37,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:11:38,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 478.78131 ± 94.320
2025-09-16 11:11:38,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [602.4185, 388.5988, 371.56232, 392.14426, 409.1154, 539.4386, 455.65387, 420.22934, 639.83923, 568.8124]
2025-09-16 11:11:38,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [115.0, 72.0, 70.0, 72.0, 85.0, 104.0, 85.0, 77.0, 116.0, 105.0]
2025-09-16 11:11:38,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 1 minute, 52 seconds)
2025-09-16 11:13:34,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:13:36,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 473.53656 ± 80.150
2025-09-16 11:13:36,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [508.00882, 488.2641, 483.29288, 331.3319, 512.14795, 643.4929, 372.28033, 460.2955, 439.36636, 496.88507]
2025-09-16 11:13:36,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [98.0, 94.0, 92.0, 65.0, 99.0, 125.0, 72.0, 99.0, 85.0, 94.0]
2025-09-16 11:13:36,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 3 minutes, 31 seconds)
2025-09-16 11:15:32,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:15:33,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 460.32852 ± 110.594
2025-09-16 11:15:33,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [440.08762, 384.3863, 469.376, 328.02335, 732.9595, 505.18033, 460.3102, 357.8926, 386.03445, 539.0351]
2025-09-16 11:15:33,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 80.0, 86.0, 71.0, 144.0, 98.0, 98.0, 78.0, 72.0, 101.0]
2025-09-16 11:15:33,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 2 minutes, 7 seconds)
2025-09-16 11:17:29,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:17:30,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 507.49237 ± 84.926
2025-09-16 11:17:30,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [580.18774, 554.9944, 451.63052, 547.57117, 687.6197, 448.84787, 415.49005, 514.8479, 384.53302, 489.201]
2025-09-16 11:17:30,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [111.0, 113.0, 85.0, 117.0, 149.0, 86.0, 91.0, 98.0, 73.0, 108.0]
2025-09-16 11:17:30,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (507.49) for latency 3
2025-09-16 11:17:30,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 12 seconds)
2025-09-16 11:19:27,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:19:28,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 539.84460 ± 115.846
2025-09-16 11:19:28,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [729.51624, 419.67776, 524.1712, 345.27267, 530.4752, 644.7491, 513.6119, 514.8136, 707.70905, 468.44943]
2025-09-16 11:19:28,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [154.0, 80.0, 99.0, 67.0, 103.0, 123.0, 96.0, 98.0, 139.0, 89.0]
2025-09-16 11:19:28,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (539.84) for latency 3
2025-09-16 11:19:28,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 58 minutes, 18 seconds)
2025-09-16 11:21:24,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:21:26,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 503.93961 ± 60.448
2025-09-16 11:21:26,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [521.34015, 513.0321, 500.02902, 533.9806, 476.1117, 637.8674, 510.75232, 471.42444, 380.0933, 494.76526]
2025-09-16 11:21:26,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [114.0, 97.0, 94.0, 118.0, 93.0, 123.0, 96.0, 89.0, 71.0, 94.0]
2025-09-16 11:21:26,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 19 seconds)
2025-09-16 11:23:22,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:23:23,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 480.20807 ± 155.286
2025-09-16 11:23:23,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [669.1743, 589.91266, 415.90997, 475.83698, 352.0321, 816.70715, 323.24036, 461.9228, 349.63672, 347.70755]
2025-09-16 11:23:23,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 129.0, 87.0, 95.0, 76.0, 158.0, 67.0, 94.0, 75.0, 75.0]
2025-09-16 11:23:23,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 54 minutes, 20 seconds)
2025-09-16 11:25:21,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:25:22,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 572.58386 ± 149.506
2025-09-16 11:25:22,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [793.7132, 457.10724, 740.23535, 705.78046, 505.00278, 368.84943, 561.37146, 689.5377, 567.30994, 336.9305]
2025-09-16 11:25:22,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [156.0, 102.0, 153.0, 135.0, 108.0, 77.0, 107.0, 132.0, 107.0, 71.0]
2025-09-16 11:25:22,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (572.58) for latency 3
2025-09-16 11:25:22,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 52 minutes, 40 seconds)
2025-09-16 11:27:20,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:27:22,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 513.03827 ± 39.775
2025-09-16 11:27:22,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [477.4544, 574.2715, 516.51715, 534.67267, 540.27716, 564.8901, 450.42673, 462.9208, 491.91904, 517.03394]
2025-09-16 11:27:22,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [91.0, 109.0, 110.0, 100.0, 104.0, 107.0, 96.0, 86.0, 104.0, 97.0]
2025-09-16 11:27:22,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 27 seconds)
2025-09-16 11:29:18,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:29:19,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 592.21790 ± 120.279
2025-09-16 11:29:19,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [673.21893, 398.94696, 571.17834, 564.64136, 689.7401, 499.54706, 403.81204, 740.5714, 641.663, 738.85944]
2025-09-16 11:29:19,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [127.0, 86.0, 110.0, 107.0, 131.0, 91.0, 78.0, 147.0, 142.0, 152.0]
2025-09-16 11:29:19,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (592.22) for latency 3
2025-09-16 11:29:19,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 34 seconds)
2025-09-16 11:31:16,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:31:18,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 572.60187 ± 135.794
2025-09-16 11:31:18,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [687.0549, 445.49207, 367.51923, 561.6736, 413.80334, 534.2553, 667.04126, 839.37024, 650.21686, 559.59186]
2025-09-16 11:31:18,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [149.0, 97.0, 78.0, 107.0, 91.0, 117.0, 142.0, 168.0, 126.0, 125.0]
2025-09-16 11:31:18,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 47 minutes, 50 seconds)
2025-09-16 11:33:14,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:33:16,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 582.65417 ± 85.551
2025-09-16 11:33:16,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [702.1102, 543.5221, 552.02765, 603.74805, 707.79095, 639.3285, 447.48016, 451.25858, 562.2577, 617.0179]
2025-09-16 11:33:16,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [144.0, 104.0, 103.0, 117.0, 137.0, 123.0, 94.0, 86.0, 106.0, 119.0]
2025-09-16 11:33:16,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 45 minutes, 50 seconds)
2025-09-16 11:35:12,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:35:14,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 599.56195 ± 108.008
2025-09-16 11:35:14,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [611.4253, 568.45746, 726.47736, 531.56616, 602.33466, 511.42172, 520.7594, 534.5955, 865.5563, 523.02606]
2025-09-16 11:35:14,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 110.0, 137.0, 101.0, 114.0, 96.0, 98.0, 103.0, 167.0, 99.0]
2025-09-16 11:35:14,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (599.56) for latency 3
2025-09-16 11:35:14,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 46 seconds)
2025-09-16 11:37:11,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:37:12,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 588.96674 ± 104.814
2025-09-16 11:37:12,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [551.90784, 474.33575, 677.0246, 436.3759, 511.16766, 782.27625, 593.7608, 729.5005, 558.95056, 574.3678]
2025-09-16 11:37:12,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 93.0, 133.0, 96.0, 109.0, 167.0, 114.0, 141.0, 109.0, 122.0]
2025-09-16 11:37:12,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 41 minutes, 27 seconds)
2025-09-16 11:39:10,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:39:12,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 677.60718 ± 174.239
2025-09-16 11:39:12,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [676.85114, 943.16437, 535.3005, 902.8323, 757.33563, 646.9967, 343.33362, 635.88904, 810.54254, 523.8261]
2025-09-16 11:39:12,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 189.0, 115.0, 178.0, 151.0, 127.0, 69.0, 120.0, 159.0, 101.0]
2025-09-16 11:39:12,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (677.61) for latency 3
2025-09-16 11:39:12,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 39 minutes, 51 seconds)
2025-09-16 11:41:08,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:41:10,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 630.40601 ± 108.974
2025-09-16 11:41:10,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [913.00116, 682.96063, 602.37225, 542.91595, 653.499, 629.5256, 537.9455, 553.71716, 523.0522, 665.0705]
2025-09-16 11:41:10,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [194.0, 137.0, 127.0, 116.0, 140.0, 122.0, 114.0, 118.0, 112.0, 145.0]
2025-09-16 11:41:10,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 37 minutes, 56 seconds)
2025-09-16 11:43:08,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:43:10,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 803.40906 ± 175.824
2025-09-16 11:43:10,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [744.74567, 661.61615, 658.5937, 772.59045, 789.83374, 849.8577, 932.4005, 656.26904, 703.2466, 1264.9369]
2025-09-16 11:43:10,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 126.0, 140.0, 149.0, 152.0, 166.0, 197.0, 127.0, 150.0, 249.0]
2025-09-16 11:43:10,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (803.41) for latency 3
2025-09-16 11:43:10,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 36 minutes, 25 seconds)
2025-09-16 11:45:07,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:45:09,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 752.15161 ± 222.005
2025-09-16 11:45:09,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [712.42, 888.84064, 587.8942, 1330.4652, 612.6359, 573.9186, 720.50714, 692.7896, 542.61084, 859.43396]
2025-09-16 11:45:09,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [139.0, 165.0, 117.0, 263.0, 114.0, 127.0, 134.0, 133.0, 117.0, 166.0]
2025-09-16 11:45:09,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 43 seconds)
2025-09-16 11:47:06,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:47:07,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 624.42480 ± 131.654
2025-09-16 11:47:07,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [690.4482, 551.33203, 554.27264, 611.79877, 978.50146, 647.9252, 602.6958, 484.2107, 513.6864, 609.377]
2025-09-16 11:47:07,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 106.0, 104.0, 114.0, 193.0, 124.0, 115.0, 92.0, 97.0, 118.0]
2025-09-16 11:47:07,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 40 seconds)
2025-09-16 11:49:04,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:49:06,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 805.20471 ± 250.199
2025-09-16 11:49:06,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [895.34894, 843.2804, 930.0589, 693.28503, 514.4578, 683.07745, 765.1755, 876.4724, 449.38876, 1401.5018]
2025-09-16 11:49:06,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 178.0, 182.0, 145.0, 108.0, 130.0, 146.0, 177.0, 88.0, 270.0]
2025-09-16 11:49:06,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (805.20) for latency 3
2025-09-16 11:49:06,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 30 minutes, 40 seconds)
2025-09-16 11:51:04,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:51:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 810.24316 ± 200.081
2025-09-16 11:51:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [818.98206, 853.15216, 557.92, 906.46625, 520.06305, 1203.9274, 591.38947, 992.63025, 875.9985, 781.9026]
2025-09-16 11:51:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 172.0, 106.0, 176.0, 98.0, 232.0, 113.0, 194.0, 176.0, 148.0]
2025-09-16 11:51:06,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (810.24) for latency 3
2025-09-16 11:51:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 59 seconds)
2025-09-16 11:53:03,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:53:05,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 626.61438 ± 99.419
2025-09-16 11:53:05,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [645.008, 591.53217, 717.3064, 502.7591, 720.88477, 589.772, 811.147, 500.11517, 668.83185, 518.7875]
2025-09-16 11:53:05,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 125.0, 138.0, 103.0, 156.0, 115.0, 173.0, 99.0, 134.0, 101.0]
2025-09-16 11:53:05,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 49 seconds)
2025-09-16 11:55:02,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:55:04,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 733.34656 ± 112.153
2025-09-16 11:55:04,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [754.45447, 728.7322, 660.44586, 1007.60254, 802.38727, 740.7361, 681.3902, 758.4215, 578.6109, 620.6844]
2025-09-16 11:55:04,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 143.0, 127.0, 211.0, 154.0, 143.0, 133.0, 145.0, 113.0, 136.0]
2025-09-16 11:55:04,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 49 seconds)
2025-09-16 11:57:03,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:57:05,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 870.44208 ± 256.510
2025-09-16 11:57:05,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [667.64264, 1218.7913, 865.11975, 859.5506, 503.07715, 610.4875, 666.1937, 1273.416, 1168.2828, 871.8599]
2025-09-16 11:57:05,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 240.0, 170.0, 164.0, 99.0, 119.0, 126.0, 245.0, 233.0, 170.0]
2025-09-16 11:57:05,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (870.44) for latency 3
2025-09-16 11:57:05,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 23 minutes, 25 seconds)
2025-09-16 11:59:01,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:59:03,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 845.39880 ± 791.367
2025-09-16 11:59:03,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [541.13574, 591.479, 534.2633, 450.38696, 660.4931, 588.46826, 661.5618, 559.9917, 654.2251, 3211.983]
2025-09-16 11:59:03,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 110.0, 100.0, 84.0, 127.0, 123.0, 130.0, 105.0, 136.0, 665.0]
2025-09-16 11:59:03,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 21 minutes, 16 seconds)
2025-09-16 12:01:01,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:01:04,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1005.40051 ± 290.906
2025-09-16 12:01:04,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1036.0227, 802.737, 1481.814, 818.4129, 1277.4272, 807.7512, 1292.1046, 1262.1187, 711.64, 563.9776]
2025-09-16 12:01:04,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [223.0, 158.0, 297.0, 168.0, 249.0, 161.0, 261.0, 249.0, 137.0, 109.0]
2025-09-16 12:01:04,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1005.40) for latency 3
2025-09-16 12:01:04,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 19 minutes, 31 seconds)
2025-09-16 12:03:02,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:03:04,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 975.02454 ± 347.585
2025-09-16 12:03:04,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [744.027, 1754.9551, 1097.2527, 549.12195, 1335.3872, 1147.7106, 930.69617, 755.37823, 781.1365, 654.5804]
2025-09-16 12:03:04,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 346.0, 216.0, 115.0, 264.0, 228.0, 181.0, 147.0, 150.0, 137.0]
2025-09-16 12:03:04,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 17 minutes, 53 seconds)
2025-09-16 12:05:03,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:05:07,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1444.94385 ± 984.113
2025-09-16 12:05:07,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2117.2498, 1856.656, 940.20447, 1120.2404, 863.4889, 1030.5032, 4067.9648, 1022.26965, 902.4692, 528.3923]
2025-09-16 12:05:07,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [423.0, 387.0, 197.0, 234.0, 162.0, 213.0, 836.0, 219.0, 175.0, 99.0]
2025-09-16 12:05:07,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1444.94) for latency 3
2025-09-16 12:05:07,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 34 seconds)
2025-09-16 12:07:08,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:07:10,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 872.79138 ± 254.800
2025-09-16 12:07:10,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1207.5283, 701.1491, 1068.5468, 774.1969, 1004.5648, 607.8628, 834.9929, 1322.8292, 496.5337, 709.70966]
2025-09-16 12:07:10,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [253.0, 134.0, 204.0, 146.0, 193.0, 116.0, 159.0, 257.0, 94.0, 145.0]
2025-09-16 12:07:10,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 15 minutes, 8 seconds)
2025-09-16 12:09:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:09:10,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1479.41345 ± 728.036
2025-09-16 12:09:10,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [759.63184, 1365.9796, 1174.5857, 2912.4556, 2642.7095, 1221.6617, 800.0612, 1929.2584, 809.7046, 1178.0876]
2025-09-16 12:09:10,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [164.0, 291.0, 255.0, 585.0, 530.0, 244.0, 162.0, 391.0, 174.0, 252.0]
2025-09-16 12:09:10,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1479.41) for latency 3
2025-09-16 12:09:10,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 13 minutes, 29 seconds)
2025-09-16 12:11:09,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:11:11,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 966.71826 ± 391.445
2025-09-16 12:11:11,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [675.7147, 1038.7413, 966.9606, 700.38916, 1114.6741, 1006.666, 484.19656, 612.40656, 1117.3829, 1950.0508]
2025-09-16 12:11:11,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [128.0, 214.0, 201.0, 133.0, 241.0, 203.0, 98.0, 128.0, 220.0, 400.0]
2025-09-16 12:11:11,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 11 minutes, 35 seconds)
2025-09-16 12:13:11,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:13:14,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1247.13159 ± 523.617
2025-09-16 12:13:14,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [790.87915, 1077.9037, 1177.483, 1664.0807, 1195.7614, 2531.266, 673.4467, 940.9972, 856.92004, 1562.577]
2025-09-16 12:13:14,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 211.0, 248.0, 326.0, 234.0, 506.0, 129.0, 201.0, 162.0, 315.0]
2025-09-16 12:13:14,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 10 minutes, 8 seconds)
2025-09-16 12:15:17,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:15:20,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1213.87549 ± 504.729
2025-09-16 12:15:20,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1960.364, 2233.7112, 915.10736, 1003.88043, 830.5861, 556.2072, 1421.9972, 1365.083, 874.0748, 977.74274]
2025-09-16 12:15:20,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [404.0, 450.0, 190.0, 201.0, 170.0, 114.0, 281.0, 269.0, 179.0, 191.0]
2025-09-16 12:15:20,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 8 minutes, 50 seconds)
2025-09-16 12:17:12,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:17:15,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1094.15820 ± 307.088
2025-09-16 12:17:15,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [876.9392, 556.5574, 880.251, 1208.074, 933.1339, 1555.8815, 952.6973, 1565.1123, 1061.6494, 1351.2856]
2025-09-16 12:17:15,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 119.0, 171.0, 236.0, 180.0, 305.0, 203.0, 303.0, 206.0, 263.0]
2025-09-16 12:17:15,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 5 minutes, 2 seconds)
2025-09-16 12:19:14,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:19:17,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1377.37744 ± 614.683
2025-09-16 12:19:17,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [840.5393, 1112.8643, 821.4021, 2099.5981, 1134.0248, 1205.5479, 1426.8569, 1066.3319, 2911.1013, 1155.5068]
2025-09-16 12:19:17,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [171.0, 218.0, 165.0, 439.0, 223.0, 239.0, 289.0, 213.0, 597.0, 226.0]
2025-09-16 12:19:17,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 3 minutes, 29 seconds)
2025-09-16 12:21:13,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:21:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 1563.06482 ± 526.128
2025-09-16 12:21:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [922.6701, 1365.3528, 1017.5307, 1594.735, 1141.3491, 2064.6462, 1106.6357, 1973.2106, 1806.7743, 2637.7434]
2025-09-16 12:21:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [196.0, 272.0, 197.0, 323.0, 226.0, 423.0, 232.0, 385.0, 368.0, 545.0]
2025-09-16 12:21:17,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (1563.06) for latency 3
2025-09-16 12:21:17,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 1 minute, 7 seconds)
2025-09-16 12:23:18,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:23:25,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2437.35278 ± 664.387
2025-09-16 12:23:25,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3338.334, 1677.6747, 2199.7463, 3796.7812, 2208.4888, 1674.612, 2850.4348, 2305.6895, 2430.8079, 1890.9562]
2025-09-16 12:23:25,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [699.0, 327.0, 431.0, 753.0, 436.0, 326.0, 564.0, 460.0, 490.0, 379.0]
2025-09-16 12:23:25,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2437.35) for latency 3
2025-09-16 12:23:25,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 10 seconds)
2025-09-16 12:25:28,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:25:35,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2484.50439 ± 1592.385
2025-09-16 12:25:35,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4948.3433, 1540.8512, 2712.4558, 880.9165, 4339.07, 2037.3513, 1758.4331, 656.4173, 1015.5995, 4955.608]
2025-09-16 12:25:35,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 305.0, 543.0, 171.0, 887.0, 423.0, 349.0, 132.0, 195.0, 1000.0]
2025-09-16 12:25:35,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2484.50) for latency 3
2025-09-16 12:25:35,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 58 minutes, 54 seconds)
2025-09-16 12:27:30,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:27:39,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2871.76025 ± 1560.471
2025-09-16 12:27:39,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3407.3179, 4970.956, 4496.585, 3440.0166, 1366.9878, 1102.5021, 1590.6434, 2715.2424, 5007.6646, 619.6903]
2025-09-16 12:27:39,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [705.0, 1000.0, 927.0, 682.0, 267.0, 232.0, 316.0, 552.0, 1000.0, 121.0]
2025-09-16 12:27:39,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (2871.76) for latency 3
2025-09-16 12:27:39,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 58 minutes, 31 seconds)
2025-09-16 12:29:33,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:29:42,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2783.85400 ± 1612.495
2025-09-16 12:29:42,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [730.01685, 4928.445, 3412.2913, 820.2591, 4948.5728, 4836.8374, 1599.5236, 1758.8994, 3204.174, 1599.5214]
2025-09-16 12:29:42,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [142.0, 1000.0, 692.0, 177.0, 1000.0, 1000.0, 319.0, 350.0, 658.0, 332.0]
2025-09-16 12:29:42,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 56 minutes, 30 seconds)
2025-09-16 12:31:37,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:31:48,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3319.75464 ± 1144.778
2025-09-16 12:31:48,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4142.1675, 2642.6572, 2629.341, 2635.9373, 3787.4026, 4904.5806, 4153.497, 1412.7357, 2042.3527, 4846.874]
2025-09-16 12:31:48,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [853.0, 552.0, 516.0, 541.0, 779.0, 1000.0, 855.0, 307.0, 420.0, 1000.0]
2025-09-16 12:31:48,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (3319.75) for latency 3
2025-09-16 12:31:48,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 55 minutes, 34 seconds)
2025-09-16 12:33:57,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:34:07,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3195.82349 ± 1712.691
2025-09-16 12:34:07,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1465.9391, 4887.681, 1632.9349, 4909.1777, 4902.887, 1809.3965, 1412.05, 1132.9177, 4904.3193, 4900.93]
2025-09-16 12:34:07,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [301.0, 1000.0, 349.0, 1000.0, 1000.0, 368.0, 305.0, 243.0, 1000.0, 1000.0]
2025-09-16 12:34:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 55 minutes, 27 seconds)
2025-09-16 12:35:55,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:36:03,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 2625.87012 ± 1438.479
2025-09-16 12:36:03,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1071.9326, 5018.851, 5018.9053, 1185.8169, 1693.6711, 3836.1086, 2520.272, 2125.856, 1157.5609, 2629.727]
2025-09-16 12:36:03,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [207.0, 1000.0, 1000.0, 228.0, 342.0, 762.0, 499.0, 415.0, 240.0, 529.0]
2025-09-16 12:36:03,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 50 minutes, 49 seconds)
2025-09-16 12:38:06,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:38:20,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4753.59473 ± 876.394
2025-09-16 12:38:20,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5051.1733, 5025.408, 5043.6074, 2125.3442, 5059.7217, 4988.5986, 5039.2075, 5072.9688, 5061.8706, 5068.043]
2025-09-16 12:38:20,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 415.0, 1000.0, 1000.0, 990.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:38:20,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (4753.59) for latency 3
2025-09-16 12:38:20,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 51 minutes, 7 seconds)
2025-09-16 12:40:23,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:40:37,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4481.56885 ± 1126.262
2025-09-16 12:40:37,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5034.8623, 5036.0615, 5063.5537, 5085.452, 5028.446, 2471.7705, 5037.4814, 2006.1752, 5022.264, 5029.6187]
2025-09-16 12:40:37,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 482.0, 1000.0, 388.0, 1000.0, 1000.0]
2025-09-16 12:40:37,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 51 minutes, 21 seconds)
2025-09-16 12:42:35,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:42:49,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4248.44727 ± 1391.137
2025-09-16 12:42:49,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4928.2456, 4892.516, 4929.8203, 4946.176, 2154.0286, 4938.8955, 4940.3486, 4930.1343, 893.92615, 4930.3843]
2025-09-16 12:42:49,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 441.0, 1000.0, 1000.0, 1000.0, 177.0, 1000.0]
2025-09-16 12:42:49,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 50 minutes, 10 seconds)
2025-09-16 12:44:40,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:44:53,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4447.43555 ± 909.538
2025-09-16 12:44:53,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3176.1829, 4981.429, 4996.896, 2295.5935, 4176.959, 4989.8193, 4969.3394, 5001.5933, 4927.4224, 4959.122]
2025-09-16 12:44:53,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [630.0, 1000.0, 1000.0, 454.0, 826.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:44:53,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 45 minutes, 34 seconds)
2025-09-16 12:47:01,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:47:12,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3552.46802 ± 1528.479
2025-09-16 12:47:12,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [3868.3955, 1534.2308, 4938.8237, 772.58636, 4041.8015, 5001.008, 4973.033, 5000.7046, 1796.1923, 3597.9036]
2025-09-16 12:47:12,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [764.0, 296.0, 1000.0, 151.0, 799.0, 1000.0, 1000.0, 1000.0, 350.0, 721.0]
2025-09-16 12:47:12,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 47 minutes, 3 seconds)
2025-09-16 12:49:10,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:49:25,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5020.61035 ± 8.670
2025-09-16 12:49:25,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5023.1436, 5021.0073, 5010.923, 5034.9185, 5026.778, 5026.7725, 5026.998, 5018.855, 5011.974, 5004.7334]
2025-09-16 12:49:25,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:49:25,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5020.61) for latency 3
2025-09-16 12:49:25,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 44 minutes, 11 seconds)
2025-09-16 12:51:16,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:51:27,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 3659.93042 ± 1302.141
2025-09-16 12:51:27,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1369.4937, 2115.0774, 3065.5515, 3256.643, 2513.7615, 4333.2974, 5017.7803, 4952.7646, 4994.615, 4980.321]
2025-09-16 12:51:27,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [269.0, 415.0, 631.0, 643.0, 507.0, 878.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:51:27,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 39 minutes, 46 seconds)
2025-09-16 12:53:33,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:53:48,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5024.66357 ± 10.737
2025-09-16 12:53:48,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5022.01, 5033.286, 5035.4775, 5034.67, 5014.486, 5029.866, 5008.089, 5039.6475, 5017.706, 5011.397]
2025-09-16 12:53:48,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:53:48,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5024.66) for latency 3
2025-09-16 12:53:48,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 38 minutes, 51 seconds)
2025-09-16 12:55:47,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:56:03,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5034.80176 ± 9.595
2025-09-16 12:56:03,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5047.9307, 5034.8784, 5043.9126, 5043.4062, 5030.103, 5017.739, 5020.6685, 5041.9326, 5037.263, 5030.1865]
2025-09-16 12:56:03,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:56:03,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5034.80) for latency 3
2025-09-16 12:56:03,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 38 minutes, 11 seconds)
2025-09-16 12:58:03,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:58:18,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5020.51660 ± 15.957
2025-09-16 12:58:18,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5009.333, 4986.702, 5032.3413, 5038.5903, 5027.164, 5001.466, 5021.9023, 5040.0166, 5023.994, 5023.659]
2025-09-16 12:58:18,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:58:18,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 35 minutes, 31 seconds)
2025-09-16 13:00:08,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:00:20,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4187.83643 ± 1616.267
2025-09-16 13:00:20,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5011.4097, 5012.785, 5011.705, 5016.385, 702.8276, 1226.4293, 4893.2593, 5015.9355, 4968.0083, 5019.616]
2025-09-16 13:00:20,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 133.0, 237.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:00:20,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 31 minutes, 45 seconds)
2025-09-16 13:02:21,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:02:36,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4996.97217 ± 11.414
2025-09-16 13:02:36,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4995.786, 4994.72, 5010.857, 4984.2026, 4984.7183, 4997.431, 5012.5264, 4982.706, 4991.9346, 5014.8403]
2025-09-16 13:02:36,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:02:36,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 31 minutes, 22 seconds)
2025-09-16 13:04:35,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:04:50,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5005.00928 ± 317.810
2025-09-16 13:04:50,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5115.601, 5116.877, 5090.796, 5113.4546, 5121.11, 5110.3374, 5112.0366, 5108.672, 4051.8547, 5109.3555]
2025-09-16 13:04:50,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 791.0, 1000.0]
2025-09-16 13:04:50,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 28 minutes, 18 seconds)
2025-09-16 13:06:56,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:07:09,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4241.22461 ± 1625.466
2025-09-16 13:07:09,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5065.4224, 5044.0537, 5047.516, 5064.1655, 5026.685, 5048.5225, 1198.6017, 5078.0215, 792.3622, 5046.893]
2025-09-16 13:07:09,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 251.0, 1000.0, 152.0, 1000.0]
2025-09-16 13:07:09,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 26 minutes, 38 seconds)
2025-09-16 13:09:09,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:09:24,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5022.82422 ± 19.949
2025-09-16 13:09:24,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5015.2905, 4979.528, 5023.21, 5044.2153, 5006.3716, 5030.5254, 5056.046, 5034.141, 5021.202, 5017.711]
2025-09-16 13:09:24,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:09:24,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 24 minutes, 17 seconds)
2025-09-16 13:11:14,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:11:27,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4183.79688 ± 1724.586
2025-09-16 13:11:27,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5074.752, 5043.6206, 5038.8696, 5054.0713, 5037.3447, 862.82526, 5064.54, 610.59, 4994.6826, 5056.6704]
2025-09-16 13:11:27,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 164.0, 1000.0, 117.0, 1000.0, 1000.0]
2025-09-16 13:11:27,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 22 minutes, 10 seconds)
2025-09-16 13:13:26,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:13:40,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4673.31152 ± 1132.261
2025-09-16 13:13:40,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5077.6704, 5043.345, 4982.7183, 5052.165, 5074.848, 5025.26, 1277.6595, 5084.234, 5039.323, 5075.8955]
2025-09-16 13:13:40,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 254.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:13:40,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 19 minutes, 40 seconds)
2025-09-16 13:15:40,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:15:55,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4975.28027 ± 363.654
2025-09-16 13:15:55,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5099.2993, 5095.484, 5098.892, 5098.299, 5083.4326, 5120.7026, 5083.024, 3884.7512, 5101.061, 5087.8613]
2025-09-16 13:15:55,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 779.0, 1000.0, 1000.0]
2025-09-16 13:15:55,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 17 minutes, 33 seconds)
2025-09-16 13:17:59,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:18:13,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4650.83496 ± 1284.507
2025-09-16 13:18:13,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5074.725, 5066.055, 797.3814, 5073.3623, 5090.336, 5082.455, 5083.4834, 5078.5205, 5090.8022, 5071.2314]
2025-09-16 13:18:13,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 168.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:18:13,375 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 15 minutes, 13 seconds)
2025-09-16 13:20:10,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:20:24,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5083.58936 ± 11.261
2025-09-16 13:20:24,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5085.4937, 5088.5337, 5093.3135, 5082.2075, 5103.9297, 5070.145, 5069.3203, 5066.296, 5089.182, 5087.47]
2025-09-16 13:20:24,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:20:24,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5083.59) for latency 3
2025-09-16 13:20:24,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 12 minutes, 40 seconds)
2025-09-16 13:22:23,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:22:37,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4707.47949 ± 885.602
2025-09-16 13:22:37,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5001.7456, 2051.3225, 4968.3774, 5040.995, 4990.083, 5018.6196, 4991.3813, 5024.3647, 4990.385, 4997.524]
2025-09-16 13:22:37,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 434.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:22:37,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 11 minutes, 30 seconds)
2025-09-16 13:24:37,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:24:50,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4380.37598 ± 1000.377
2025-09-16 13:24:50,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5111.962, 2572.0676, 4156.183, 5122.729, 5143.0356, 3726.7246, 5100.731, 5095.7217, 5127.305, 2647.2998]
2025-09-16 13:24:50,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 508.0, 819.0, 1000.0, 1000.0, 733.0, 1000.0, 1000.0, 1000.0, 532.0]
2025-09-16 13:24:50,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 9 minutes, 14 seconds)
2025-09-16 13:26:54,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:27:07,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4395.03174 ± 1329.160
2025-09-16 13:27:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5055.996, 2008.5638, 5064.58, 5056.1597, 5063.7275, 5042.4663, 5047.1436, 5063.399, 1485.5527, 5062.7227]
2025-09-16 13:27:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 392.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 304.0, 1000.0]
2025-09-16 13:27:07,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 7 minutes, 16 seconds)
2025-09-16 13:29:10,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:29:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5056.56592 ± 11.822
2025-09-16 13:29:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5056.527, 5049.1104, 5077.8506, 5053.7295, 5066.9497, 5060.017, 5071.4136, 5041.2495, 5046.6934, 5042.12]
2025-09-16 13:29:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:29:25,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 4 minutes, 56 seconds)
2025-09-16 13:31:23,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:31:38,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4892.40381 ± 397.424
2025-09-16 13:31:38,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5035.649, 5050.409, 5069.552, 5052.54, 4796.818, 3722.3616, 5046.9985, 5042.072, 5039.576, 5068.0625]
2025-09-16 13:31:38,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 949.0, 735.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:31:38,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 2 minutes, 48 seconds)
2025-09-16 13:33:34,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:33:49,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5134.47266 ± 7.700
2025-09-16 13:33:49,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5128.231, 5148.733, 5135.144, 5137.259, 5140.5117, 5134.6074, 5135.969, 5129.984, 5117.7617, 5136.527]
2025-09-16 13:33:49,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:33:49,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5134.47) for latency 3
2025-09-16 13:33:49,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 28 seconds)
2025-09-16 13:35:42,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:35:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4773.64062 ± 902.055
2025-09-16 13:35:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5062.668, 5072.479, 5078.5767, 5075.9873, 5118.473, 5086.0347, 5068.6646, 5054.3467, 2068.0115, 5051.1636]
2025-09-16 13:35:56,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 388.0, 1000.0]
2025-09-16 13:35:56,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 57 minutes, 42 seconds)
2025-09-16 13:37:59,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:38:11,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4338.77539 ± 1427.156
2025-09-16 13:38:11,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [777.89484, 4938.1255, 2374.4885, 5074.239, 5067.8975, 5020.7627, 5038.004, 5045.7896, 5000.316, 5050.233]
2025-09-16 13:38:11,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 1000.0, 469.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:38:11,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 19 seconds)
2025-09-16 13:40:00,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:40:15,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5046.28711 ± 9.875
2025-09-16 13:40:15,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5063.999, 5053.3037, 5040.8027, 5041.9307, 5029.486, 5048.223, 5044.8066, 5038.934, 5041.1514, 5060.2354]
2025-09-16 13:40:15,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:40:15,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 1 second)
2025-09-16 13:42:18,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:42:33,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5078.70312 ± 8.806
2025-09-16 13:42:33,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5079.053, 5065.41, 5090.3447, 5064.3916, 5082.7075, 5072.549, 5086.573, 5085.808, 5073.4585, 5086.7363]
2025-09-16 13:42:33,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:42:33,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 14 seconds)
2025-09-16 13:44:29,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:44:43,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4755.12988 ± 971.597
2025-09-16 13:44:43,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5086.134, 5038.1284, 5081.56, 5081.1113, 5095.3438, 5084.259, 1840.6729, 5072.555, 5085.3706, 5086.16]
2025-09-16 13:44:43,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 378.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:44:43,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 47 minutes, 58 seconds)
2025-09-16 13:46:33,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:46:46,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4522.84521 ± 1020.696
2025-09-16 13:46:46,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5117.449, 5060.404, 4334.393, 2743.6838, 5149.831, 5086.698, 5138.3438, 5117.0527, 2342.6167, 5137.9824]
2025-09-16 13:46:46,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 839.0, 532.0, 1000.0, 1000.0, 1000.0, 1000.0, 475.0, 1000.0]
2025-09-16 13:46:46,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 45 minutes, 29 seconds)
2025-09-16 13:48:44,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:48:58,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4710.63965 ± 1150.840
2025-09-16 13:48:58,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [1258.1725, 5107.2905, 5097.981, 5102.3394, 5090.2065, 5092.3247, 5086.089, 5087.137, 5090.3975, 5094.4575]
2025-09-16 13:48:58,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [237.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:48:58,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 6 seconds)
2025-09-16 13:50:57,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:51:12,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4945.75537 ± 482.852
2025-09-16 13:51:12,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5121.6836, 5100.684, 3497.5168, 5102.327, 5109.0205, 5084.9165, 5117.365, 5104.1743, 5117.5176, 5102.3525]
2025-09-16 13:51:12,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 678.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:51:12,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 41 minutes, 36 seconds)
2025-09-16 13:53:08,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:53:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5007.88867 ± 5.771
2025-09-16 13:53:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5019.556, 5004.344, 5012.662, 5012.534, 5009.6313, 5009.586, 5005.837, 5000.772, 5004.1514, 4999.8135]
2025-09-16 13:53:23,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:53:23,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes)
2025-09-16 13:55:16,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:55:31,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5071.43066 ± 4.595
2025-09-16 13:55:31,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5069.473, 5078.2974, 5065.1235, 5070.8677, 5067.2896, 5069.474, 5066.4595, 5078.905, 5075.4434, 5072.9736]
2025-09-16 13:55:31,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:55:31,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 42 seconds)
2025-09-16 13:57:31,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:57:46,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5023.84229 ± 6.732
2025-09-16 13:57:46,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5026.834, 5019.97, 5028.343, 5024.2817, 5007.5356, 5021.9043, 5022.1626, 5032.275, 5023.228, 5031.8926]
2025-09-16 13:57:46,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:57:46,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 12 seconds)
2025-09-16 13:59:47,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:00:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5132.59912 ± 12.489
2025-09-16 14:00:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5143.1216, 5135.865, 5124.518, 5136.3467, 5153.532, 5118.6206, 5108.748, 5126.542, 5135.557, 5143.1426]
2025-09-16 14:00:02,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:00:02,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 12 seconds)
2025-09-16 14:01:59,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:02:12,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4503.92188 ± 1051.418
2025-09-16 14:02:12,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [2156.0205, 4731.699, 5265.633, 5316.9707, 3004.013, 5273.5845, 5231.9316, 4007.466, 4819.288, 5232.6084]
2025-09-16 14:02:12,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [416.0, 899.0, 1000.0, 1000.0, 568.0, 1000.0, 1000.0, 755.0, 910.0, 1000.0]
2025-09-16 14:02:12,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 47 seconds)
2025-09-16 14:04:10,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:04:25,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5108.37598 ± 14.930
2025-09-16 14:04:25,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5104.4463, 5140.2393, 5100.989, 5106.0415, 5084.547, 5122.5776, 5117.724, 5104.5615, 5092.173, 5110.461]
2025-09-16 14:04:25,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:04:25,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 40 seconds)
2025-09-16 14:06:25,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:06:39,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5194.99219 ± 8.397
2025-09-16 14:06:39,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5198.215, 5206.575, 5187.87, 5200.6514, 5195.873, 5197.367, 5200.5454, 5193.8013, 5194.9844, 5174.037]
2025-09-16 14:06:39,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:06:39,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1226 [INFO]: New best (5194.99) for latency 3
2025-09-16 14:06:40,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 44 seconds)
2025-09-16 14:08:40,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:08:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5010.69385 ± 49.924
2025-09-16 14:08:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5025.33, 5030.048, 5032.2827, 5028.7964, 4862.0435, 5026.8965, 5011.0625, 5035.4365, 5026.304, 5028.7383]
2025-09-16 14:08:54,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:08:54,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 31 seconds)
2025-09-16 14:10:48,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:11:01,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4718.56445 ± 1050.995
2025-09-16 14:11:01,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5223.357, 5196.8057, 5128.2, 5145.9824, 5189.01, 4358.611, 4934.539, 1653.8135, 5170.017, 5185.3135]
2025-09-16 14:11:01,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 836.0, 959.0, 318.0, 1000.0, 1000.0]
2025-09-16 14:11:01,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 58 seconds)
2025-09-16 14:13:01,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:13:16,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5058.96436 ± 6.873
2025-09-16 14:13:16,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5068.172, 5054.7056, 5062.416, 5064.2407, 5056.177, 5061.9634, 5041.732, 5057.2754, 5061.9497, 5061.017]
2025-09-16 14:13:16,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:13:16,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 56 seconds)
2025-09-16 14:15:19,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:15:34,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5044.84326 ± 8.027
2025-09-16 14:15:34,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5047.8403, 5046.9307, 5046.077, 5031.4536, 5048.3486, 5033.7935, 5040.6597, 5040.2314, 5057.796, 5055.2993]
2025-09-16 14:15:34,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:15:34,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 50 seconds)
2025-09-16 14:17:31,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:17:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4729.97314 ± 792.781
2025-09-16 14:17:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5095.695, 4926.9766, 4913.4414, 5022.532, 5029.9697, 2359.7878, 4901.6587, 5058.2812, 4936.716, 5054.672]
2025-09-16 14:17:45,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 468.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:17:45,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 31 seconds)
2025-09-16 14:19:46,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:20:01,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4956.56152 ± 399.217
2025-09-16 14:20:01,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5083.5024, 5093.8643, 5087.2827, 5075.9683, 5085.368, 5096.1743, 5094.4653, 5101.404, 5088.4976, 3759.0864]
2025-09-16 14:20:01,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 727.0]
2025-09-16 14:20:01,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 19 seconds)
2025-09-16 14:22:00,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:22:15,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4916.58447 ± 41.588
2025-09-16 14:22:15,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4951.609, 4965.3936, 4869.56, 4881.7173, 4905.0034, 4872.435, 4985.7544, 4943.766, 4924.8574, 4865.7495]
2025-09-16 14:22:15,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:22:15,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 13 seconds)
2025-09-16 14:24:10,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:24:23,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4654.74854 ± 993.580
2025-09-16 14:24:23,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5083.966, 3968.4812, 5111.87, 5121.62, 4938.9087, 1852.5916, 5113.0947, 5133.9683, 5104.5347, 5118.4473]
2025-09-16 14:24:23,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 772.0, 1000.0, 1000.0, 1000.0, 352.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:24:23,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 53 seconds)
2025-09-16 14:26:22,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:26:36,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4931.39990 ± 524.158
2025-09-16 14:26:36,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5102.3804, 5103.172, 5117.4785, 5102.964, 5125.588, 5078.9224, 3359.4878, 5087.4785, 5122.9507, 5113.5786]
2025-09-16 14:26:36,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 660.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:26:36,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 37 seconds)
2025-09-16 14:28:35,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:28:50,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 4958.19873 ± 27.372
2025-09-16 14:28:50,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [4970.803, 4929.639, 4982.6255, 4975.674, 4986.1597, 4933.002, 4896.711, 4974.386, 4963.429, 4969.559]
2025-09-16 14:28:50,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:28:50,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 25 seconds)
2025-09-16 14:30:43,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:30:58,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5010.19336 ± 10.408
2025-09-16 14:30:58,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5009.7886, 5009.0386, 5030.378, 4992.7485, 5020.159, 5006.399, 5008.502, 5003.053, 5000.901, 5020.966]
2025-09-16 14:30:58,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:30:58,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 11 seconds)
2025-09-16 14:32:54,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:33:08,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1221 [DEBUG]: Total Reward: 5146.77393 ± 22.459
2025-09-16 14:33:08,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1222 [DEBUG]: All rewards: [5129.1123, 5181.5264, 5182.3457, 5161.9736, 5139.6133, 5146.616, 5105.8325, 5145.4077, 5145.7915, 5129.5205]
2025-09-16 14:33:08,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:33:08,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-humanoid):1251 [DEBUG]: Training session finished
