2025-09-16 11:09:07,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.075-delay_3
2025-09-16 11:09:07,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.075-delay_3
2025-09-16 11:09:07,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x14d332214710>}
2025-09-16 11:09:07,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 11:09:07,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 11:09:07,824 baseline-bpql-noisepromille75-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=427, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 11:09:07,824 baseline-bpql-noisepromille75-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 11:09:09,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 11:09:09,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 11:10:54,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:10:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 296.77921 ± 28.152
2025-09-16 11:10:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [284.48996, 301.89658, 272.888, 285.09216, 281.24448, 291.09543, 374.74704, 310.6403, 275.50107, 290.19714]
2025-09-16 11:10:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [54.0, 56.0, 51.0, 54.0, 54.0, 55.0, 70.0, 58.0, 52.0, 54.0]
2025-09-16 11:10:55,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (296.78) for latency 3
2025-09-16 11:10:55,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 21 seconds)
2025-09-16 11:12:51,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:12:52,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 428.57715 ± 62.335
2025-09-16 11:12:52,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [364.09402, 543.5455, 439.1938, 482.66925, 467.06216, 378.4534, 434.00656, 318.92407, 394.30527, 463.51764]
2025-09-16 11:12:52,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [75.0, 121.0, 85.0, 92.0, 90.0, 74.0, 85.0, 62.0, 75.0, 93.0]
2025-09-16 11:12:52,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (428.58) for latency 3
2025-09-16 11:12:52,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 2 minutes, 1 second)
2025-09-16 11:14:47,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:14:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 498.23163 ± 113.056
2025-09-16 11:14:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [654.6529, 557.1419, 346.88074, 412.91113, 346.9916, 551.37225, 460.83502, 517.86346, 438.77374, 694.8936]
2025-09-16 11:14:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 110.0, 67.0, 78.0, 65.0, 104.0, 87.0, 97.0, 86.0, 138.0]
2025-09-16 11:14:48,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (498.23) for latency 3
2025-09-16 11:14:48,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 2 minutes, 38 seconds)
2025-09-16 11:16:44,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:16:45,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 387.63885 ± 91.549
2025-09-16 11:16:45,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [348.39746, 433.07895, 411.523, 299.3645, 278.74884, 381.83267, 295.67032, 415.56384, 612.5192, 399.68964]
2025-09-16 11:16:45,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [66.0, 92.0, 82.0, 61.0, 57.0, 72.0, 62.0, 84.0, 110.0, 78.0]
2025-09-16 11:16:45,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 2 minutes, 26 seconds)
2025-09-16 11:18:40,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:18:41,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 435.44800 ± 111.618
2025-09-16 11:18:41,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [298.86633, 712.6077, 507.9175, 383.8707, 445.96118, 328.40494, 438.9594, 339.90527, 441.10443, 456.88257]
2025-09-16 11:18:41,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [67.0, 141.0, 100.0, 84.0, 88.0, 71.0, 85.0, 74.0, 97.0, 97.0]
2025-09-16 11:18:41,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 1 minute, 15 seconds)
2025-09-16 11:20:37,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:20:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 418.03418 ± 64.298
2025-09-16 11:20:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [366.88098, 358.57492, 357.73575, 478.46402, 392.9273, 492.19302, 356.07263, 456.4225, 380.387, 540.68353]
2025-09-16 11:20:38,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 66.0, 79.0, 89.0, 84.0, 93.0, 66.0, 85.0, 84.0, 104.0]
2025-09-16 11:20:38,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 2 minutes, 36 seconds)
2025-09-16 11:22:33,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:22:34,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 463.98798 ± 134.321
2025-09-16 11:22:34,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [392.0888, 475.08417, 333.19858, 495.016, 822.0078, 343.97696, 362.0044, 474.72058, 426.90714, 514.8756]
2025-09-16 11:22:34,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [74.0, 91.0, 65.0, 90.0, 159.0, 72.0, 71.0, 89.0, 82.0, 98.0]
2025-09-16 11:22:34,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 36 seconds)
2025-09-16 11:24:30,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:24:32,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 539.07153 ± 67.360
2025-09-16 11:24:32,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [488.74158, 491.38547, 661.02313, 486.921, 620.079, 514.38495, 626.02167, 524.41785, 450.218, 527.5228]
2025-09-16 11:24:32,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 93.0, 125.0, 90.0, 132.0, 115.0, 119.0, 101.0, 85.0, 97.0]
2025-09-16 11:24:32,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (539.07) for latency 3
2025-09-16 11:24:32,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 59 minutes)
2025-09-16 11:26:28,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:26:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 546.49792 ± 135.070
2025-09-16 11:26:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [438.77777, 422.87814, 562.7358, 688.621, 626.5854, 284.6839, 624.02075, 752.47485, 607.2013, 457.0009]
2025-09-16 11:26:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [84.0, 92.0, 109.0, 136.0, 122.0, 58.0, 129.0, 147.0, 116.0, 85.0]
2025-09-16 11:26:30,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (546.50) for latency 3
2025-09-16 11:26:30,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 57 minutes, 17 seconds)
2025-09-16 11:28:28,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:28:29,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 539.40729 ± 152.868
2025-09-16 11:28:29,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [449.15018, 315.46112, 660.6503, 349.63028, 585.38574, 584.5261, 348.6515, 658.6003, 676.4527, 765.5645]
2025-09-16 11:28:29,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 69.0, 126.0, 79.0, 128.0, 110.0, 76.0, 125.0, 142.0, 148.0]
2025-09-16 11:28:29,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 56 minutes, 18 seconds)
2025-09-16 11:30:28,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:30:29,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 563.59216 ± 94.379
2025-09-16 11:30:29,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [458.36328, 535.7055, 473.50284, 457.6278, 608.19434, 587.4519, 510.77945, 577.04016, 774.16284, 653.09375]
2025-09-16 11:30:29,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 100.0, 87.0, 85.0, 113.0, 112.0, 97.0, 110.0, 168.0, 122.0]
2025-09-16 11:30:29,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (563.59) for latency 3
2025-09-16 11:30:29,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 18 seconds)
2025-09-16 11:32:27,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:32:28,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 537.42004 ± 76.288
2025-09-16 11:32:28,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [419.77527, 569.31213, 644.30426, 586.044, 395.572, 587.01495, 585.39886, 571.9407, 536.39014, 478.4476]
2025-09-16 11:32:28,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [78.0, 108.0, 123.0, 116.0, 74.0, 116.0, 116.0, 120.0, 103.0, 93.0]
2025-09-16 11:32:28,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 54 minutes, 14 seconds)
2025-09-16 11:34:27,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:34:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 680.66547 ± 130.490
2025-09-16 11:34:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [695.42017, 892.8183, 724.9966, 874.7598, 594.4459, 581.6116, 497.85263, 666.47845, 763.82306, 514.4478]
2025-09-16 11:34:29,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [132.0, 169.0, 141.0, 170.0, 112.0, 112.0, 95.0, 144.0, 153.0, 110.0]
2025-09-16 11:34:29,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (680.67) for latency 3
2025-09-16 11:34:29,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 53 minutes, 11 seconds)
2025-09-16 11:36:27,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:36:28,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 590.09045 ± 110.157
2025-09-16 11:36:28,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [647.0142, 532.02893, 806.4949, 524.2417, 464.28326, 605.0332, 476.43094, 641.7638, 476.00122, 727.6123]
2025-09-16 11:36:28,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [125.0, 117.0, 165.0, 99.0, 91.0, 117.0, 107.0, 127.0, 100.0, 139.0]
2025-09-16 11:36:28,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 51 minutes, 39 seconds)
2025-09-16 11:38:24,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:38:26,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 714.01770 ± 233.406
2025-09-16 11:38:26,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [588.2646, 399.8113, 1096.4277, 703.9053, 640.9974, 1108.4213, 575.5845, 427.58917, 739.7139, 859.462]
2025-09-16 11:38:26,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [120.0, 86.0, 227.0, 132.0, 120.0, 205.0, 107.0, 89.0, 157.0, 161.0]
2025-09-16 11:38:26,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (714.02) for latency 3
2025-09-16 11:38:26,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 49 minutes, 10 seconds)
2025-09-16 11:40:22,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:40:23,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 550.44818 ± 103.028
2025-09-16 11:40:23,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [453.69028, 720.0849, 514.1564, 602.659, 517.21027, 613.97595, 717.8636, 467.6286, 408.63528, 488.57755]
2025-09-16 11:40:23,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [85.0, 137.0, 97.0, 114.0, 97.0, 117.0, 145.0, 87.0, 89.0, 89.0]
2025-09-16 11:40:23,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 46 minutes, 20 seconds)
2025-09-16 11:42:19,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:42:20,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 511.23022 ± 76.267
2025-09-16 11:42:20,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [526.29346, 612.2319, 533.3784, 574.51575, 570.05347, 454.76254, 445.27252, 346.17596, 572.93555, 476.6827]
2025-09-16 11:42:20,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 114.0, 103.0, 109.0, 111.0, 87.0, 85.0, 67.0, 109.0, 89.0]
2025-09-16 11:42:20,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 45 seconds)
2025-09-16 11:44:16,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:44:18,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 647.66290 ± 139.890
2025-09-16 11:44:18,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [544.57007, 850.4954, 545.0631, 539.66223, 661.8316, 725.69385, 880.0937, 511.21838, 467.91342, 750.0873]
2025-09-16 11:44:18,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 170.0, 106.0, 97.0, 129.0, 140.0, 189.0, 107.0, 87.0, 143.0]
2025-09-16 11:44:18,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 41 minutes)
2025-09-16 11:46:14,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:46:17,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 841.58252 ± 235.210
2025-09-16 11:46:17,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [890.48364, 719.73145, 769.4833, 849.7589, 917.159, 1066.3301, 771.1155, 543.69635, 1368.1902, 519.87665]
2025-09-16 11:46:17,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [178.0, 147.0, 149.0, 181.0, 176.0, 206.0, 161.0, 103.0, 275.0, 107.0]
2025-09-16 11:46:17,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (841.58) for latency 3
2025-09-16 11:46:17,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 50 seconds)
2025-09-16 11:48:12,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:48:14,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 799.42218 ± 333.396
2025-09-16 11:48:14,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [468.36893, 449.08823, 1280.5474, 538.6412, 1510.0717, 591.0787, 907.5037, 708.13513, 839.61145, 701.1759]
2025-09-16 11:48:14,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [102.0, 102.0, 275.0, 102.0, 298.0, 115.0, 179.0, 135.0, 165.0, 133.0]
2025-09-16 11:48:14,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 51 seconds)
2025-09-16 11:50:12,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:50:14,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 805.92108 ± 157.143
2025-09-16 11:50:14,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [641.382, 712.6896, 962.89075, 693.1893, 1074.7465, 743.3855, 915.49536, 994.436, 598.0214, 722.9746]
2025-09-16 11:50:14,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 133.0, 186.0, 131.0, 209.0, 144.0, 188.0, 206.0, 133.0, 138.0]
2025-09-16 11:50:14,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 35 minutes, 40 seconds)
2025-09-16 11:52:11,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:52:13,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1011.66327 ± 247.317
2025-09-16 11:52:13,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1152.8407, 1072.7661, 751.8371, 1279.1036, 1293.9385, 1204.5138, 556.6552, 667.5887, 1077.7095, 1059.6792]
2025-09-16 11:52:13,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [222.0, 207.0, 148.0, 247.0, 273.0, 235.0, 107.0, 127.0, 216.0, 204.0]
2025-09-16 11:52:13,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1011.66) for latency 3
2025-09-16 11:52:13,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 34 minutes, 9 seconds)
2025-09-16 11:54:10,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:54:13,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 906.02277 ± 339.302
2025-09-16 11:54:13,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [679.0329, 547.698, 935.76776, 1783.5092, 693.625, 609.7343, 923.2059, 926.511, 1147.5135, 813.6303]
2025-09-16 11:54:13,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [131.0, 106.0, 176.0, 347.0, 133.0, 123.0, 201.0, 179.0, 218.0, 167.0]
2025-09-16 11:54:13,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 32 minutes, 41 seconds)
2025-09-16 11:56:06,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:56:08,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 731.00409 ± 129.155
2025-09-16 11:56:08,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [816.87524, 781.9249, 654.5643, 960.1061, 638.7279, 751.14325, 897.32367, 518.9905, 678.74976, 611.6346]
2025-09-16 11:56:08,613 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [160.0, 171.0, 130.0, 196.0, 122.0, 142.0, 175.0, 97.0, 136.0, 138.0]
2025-09-16 11:56:08,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 29 minutes, 51 seconds)
2025-09-16 11:58:06,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 11:58:08,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 941.79749 ± 250.203
2025-09-16 11:58:08,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [617.78925, 1229.8206, 681.63336, 923.5961, 1073.5598, 1303.2139, 1288.2722, 847.6367, 715.656, 736.7966]
2025-09-16 11:58:08,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [118.0, 241.0, 126.0, 177.0, 206.0, 252.0, 250.0, 161.0, 134.0, 139.0]
2025-09-16 11:58:08,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 28 minutes, 31 seconds)
2025-09-16 12:00:05,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:00:08,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1014.56464 ± 340.982
2025-09-16 12:00:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [713.9784, 903.70807, 725.74036, 1291.2358, 1006.41504, 1127.9546, 1895.7672, 782.804, 881.55273, 816.48975]
2025-09-16 12:00:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [155.0, 176.0, 135.0, 244.0, 193.0, 211.0, 368.0, 148.0, 167.0, 155.0]
2025-09-16 12:00:08,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1014.56) for latency 3
2025-09-16 12:00:08,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 26 minutes, 26 seconds)
2025-09-16 12:02:05,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:02:08,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1236.51917 ± 534.916
2025-09-16 12:02:08,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [755.3835, 1075.5358, 1726.2354, 1353.6998, 1196.0621, 1311.4108, 2408.289, 297.2408, 945.9986, 1295.3348]
2025-09-16 12:02:08,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [145.0, 203.0, 340.0, 282.0, 232.0, 253.0, 483.0, 65.0, 183.0, 267.0]
2025-09-16 12:02:08,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1236.52) for latency 3
2025-09-16 12:02:08,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 24 minutes, 42 seconds)
2025-09-16 12:04:04,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:04:07,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1039.60583 ± 311.395
2025-09-16 12:04:07,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1100.1854, 863.77295, 1711.9487, 634.0844, 833.8906, 1311.7253, 1239.6084, 977.03156, 643.7923, 1080.0181]
2025-09-16 12:04:07,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 190.0, 329.0, 143.0, 160.0, 258.0, 232.0, 185.0, 124.0, 206.0]
2025-09-16 12:04:07,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 22 minutes, 34 seconds)
2025-09-16 12:06:08,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:06:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1297.77966 ± 295.493
2025-09-16 12:06:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1051.4778, 874.53186, 1558.6769, 1181.5447, 927.18726, 1064.5042, 1567.3862, 1587.0084, 1439.0728, 1726.4066]
2025-09-16 12:06:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [216.0, 172.0, 310.0, 226.0, 190.0, 207.0, 331.0, 319.0, 305.0, 357.0]
2025-09-16 12:06:11,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1297.78) for latency 3
2025-09-16 12:06:11,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 47 seconds)
2025-09-16 12:08:05,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:08:09,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1622.97363 ± 842.346
2025-09-16 12:08:09,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2688.9265, 1389.7244, 1147.3341, 822.8058, 1575.4023, 1991.7598, 917.61536, 3296.6301, 1988.1783, 411.3584]
2025-09-16 12:08:09,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [531.0, 290.0, 240.0, 157.0, 297.0, 387.0, 191.0, 631.0, 388.0, 91.0]
2025-09-16 12:08:09,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1622.97) for latency 3
2025-09-16 12:08:09,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 12 seconds)
2025-09-16 12:10:10,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:10:13,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1173.37891 ± 293.047
2025-09-16 12:10:13,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1228.5879, 1813.7638, 904.87695, 1070.9701, 928.2942, 917.2539, 824.18744, 1470.8682, 1308.6853, 1266.3013]
2025-09-16 12:10:13,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [235.0, 350.0, 171.0, 206.0, 173.0, 190.0, 162.0, 280.0, 262.0, 242.0]
2025-09-16 12:10:13,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 19 minutes, 17 seconds)
2025-09-16 12:12:12,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:12:17,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1688.40747 ± 940.410
2025-09-16 12:12:17,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2041.3489, 1843.7871, 1306.2867, 907.5388, 1155.0367, 1252.3606, 4312.574, 1205.7937, 1104.4437, 1754.9042]
2025-09-16 12:12:17,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [406.0, 356.0, 254.0, 175.0, 221.0, 245.0, 855.0, 242.0, 224.0, 337.0]
2025-09-16 12:12:17,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1688.41) for latency 3
2025-09-16 12:12:17,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 17 minutes, 55 seconds)
2025-09-16 12:14:21,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:14:26,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1574.89966 ± 525.250
2025-09-16 12:14:26,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [701.7853, 1340.6682, 1587.3447, 1076.3625, 1516.4835, 1741.6307, 1277.9486, 1930.4506, 1824.6455, 2751.6772]
2025-09-16 12:14:26,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [147.0, 257.0, 314.0, 207.0, 288.0, 345.0, 246.0, 376.0, 346.0, 544.0]
2025-09-16 12:14:26,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 18 minutes, 13 seconds)
2025-09-16 12:16:23,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:16:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1925.32166 ± 1080.033
2025-09-16 12:16:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4175.924, 940.8885, 3090.675, 991.09753, 2000.5966, 1277.8679, 820.8052, 2958.5103, 1096.6792, 1900.1726]
2025-09-16 12:16:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [823.0, 183.0, 604.0, 197.0, 397.0, 265.0, 159.0, 572.0, 233.0, 384.0]
2025-09-16 12:16:28,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1925.32) for latency 3
2025-09-16 12:16:28,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 15 minutes, 37 seconds)
2025-09-16 12:18:34,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:18:40,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1946.33240 ± 913.930
2025-09-16 12:18:40,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [3858.5295, 1031.4668, 1891.744, 1815.1876, 942.217, 2574.9375, 916.51636, 2949.356, 2119.3738, 1363.9954]
2025-09-16 12:18:40,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [752.0, 222.0, 410.0, 365.0, 177.0, 490.0, 172.0, 564.0, 415.0, 293.0]
2025-09-16 12:18:40,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1946.33) for latency 3
2025-09-16 12:18:40,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 16 minutes, 32 seconds)
2025-09-16 12:20:33,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:20:37,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1548.85742 ± 921.095
2025-09-16 12:20:37,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [3294.3738, 1940.6938, 736.0645, 3110.2786, 670.6003, 917.79034, 729.83514, 1001.90546, 1392.6205, 1694.4135]
2025-09-16 12:20:37,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [651.0, 379.0, 154.0, 614.0, 143.0, 186.0, 147.0, 200.0, 276.0, 327.0]
2025-09-16 12:20:37,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 12 minutes, 59 seconds)
2025-09-16 12:22:37,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:22:41,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1691.07458 ± 665.307
2025-09-16 12:22:41,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1232.2399, 733.1676, 1986.9056, 1660.3683, 1224.0829, 1057.5035, 1912.899, 2742.586, 2871.9873, 1489.0055]
2025-09-16 12:22:41,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [251.0, 144.0, 394.0, 326.0, 231.0, 204.0, 377.0, 529.0, 548.0, 286.0]
2025-09-16 12:22:41,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 11 minutes, 12 seconds)
2025-09-16 12:24:45,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:24:51,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2153.09717 ± 761.848
2025-09-16 12:24:51,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2690.2144, 2621.3208, 1902.1221, 2404.854, 2466.0952, 1904.4006, 3679.8074, 1252.0913, 1768.2389, 841.82733]
2025-09-16 12:24:51,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [527.0, 512.0, 387.0, 491.0, 479.0, 386.0, 741.0, 267.0, 350.0, 177.0]
2025-09-16 12:24:51,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (2153.10) for latency 3
2025-09-16 12:24:51,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 9 minutes, 15 seconds)
2025-09-16 12:26:46,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:26:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3107.63330 ± 1232.646
2025-09-16 12:26:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2471.1355, 3447.3706, 4257.385, 1041.3064, 4168.618, 4524.7085, 4634.8745, 1795.104, 1736.0033, 2999.8252]
2025-09-16 12:26:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [497.0, 726.0, 866.0, 218.0, 866.0, 924.0, 973.0, 391.0, 350.0, 620.0]
2025-09-16 12:26:55,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (3107.63) for latency 3
2025-09-16 12:26:55,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 7 minutes, 34 seconds)
2025-09-16 12:28:52,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:28:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1849.10425 ± 1289.397
2025-09-16 12:28:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [988.0857, 5048.5723, 2126.2197, 931.7156, 1125.7664, 1683.4937, 2676.2742, 541.22766, 2614.8274, 754.85925]
2025-09-16 12:28:57,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [212.0, 1000.0, 440.0, 190.0, 216.0, 343.0, 535.0, 102.0, 535.0, 161.0]
2025-09-16 12:28:57,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 3 minutes, 29 seconds)
2025-09-16 12:31:00,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:31:09,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3075.48975 ± 1184.080
2025-09-16 12:31:09,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [3718.3945, 2561.7383, 5025.158, 1761.8073, 4973.9595, 1253.2036, 2292.9006, 3361.785, 3165.0469, 2640.9036]
2025-09-16 12:31:09,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [730.0, 500.0, 1000.0, 336.0, 1000.0, 253.0, 483.0, 656.0, 643.0, 521.0]
2025-09-16 12:31:09,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 4 minutes, 18 seconds)
2025-09-16 12:33:14,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:33:26,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4318.67969 ± 918.154
2025-09-16 12:33:26,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2557.35, 3708.4133, 3897.8357, 5113.151, 5193.9424, 4007.517, 5125.1597, 3236.1687, 5178.6562, 5168.6016]
2025-09-16 12:33:26,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [494.0, 727.0, 753.0, 1000.0, 1000.0, 808.0, 1000.0, 620.0, 1000.0, 1000.0]
2025-09-16 12:33:26,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (4318.68) for latency 3
2025-09-16 12:33:26,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 4 minutes, 40 seconds)
2025-09-16 12:35:22,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:35:34,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4115.61621 ± 1421.892
2025-09-16 12:35:34,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5103.305, 1237.3707, 5090.218, 4901.2905, 5015.7837, 2550.5432, 4943.394, 5116.751, 2200.6692, 4996.837]
2025-09-16 12:35:34,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 240.0, 1000.0, 975.0, 1000.0, 522.0, 985.0, 1000.0, 436.0, 1000.0]
2025-09-16 12:35:34,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 2 minutes, 3 seconds)
2025-09-16 12:37:32,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:37:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5089.67578 ± 41.792
2025-09-16 12:37:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5123.232, 5077.166, 5153.8765, 5046.218, 5102.318, 5104.7866, 5069.9907, 5140.2534, 5009.496, 5069.421]
2025-09-16 12:37:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:37:47,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5089.68) for latency 3
2025-09-16 12:37:47,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 1 minute, 43 seconds)
2025-09-16 12:39:44,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:39:57,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4177.91357 ± 1555.122
2025-09-16 12:39:57,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5109.027, 1280.9297, 4606.496, 5022.631, 4663.183, 5070.34, 5057.9194, 5028.6675, 5041.078, 898.8646]
2025-09-16 12:39:57,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 273.0, 908.0, 1000.0, 942.0, 1000.0, 1000.0, 1000.0, 1000.0, 168.0]
2025-09-16 12:39:57,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 56 seconds)
2025-09-16 12:41:54,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:42:09,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4878.42236 ± 874.762
2025-09-16 12:42:09,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5167.889, 5159.984, 2254.8188, 5185.4014, 5193.729, 5181.4985, 5181.456, 5155.518, 5120.7407, 5183.185]
2025-09-16 12:42:09,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 432.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:42:09,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 58 minutes, 46 seconds)
2025-09-16 12:44:07,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:44:18,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4178.81494 ± 1225.701
2025-09-16 12:44:18,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2446.6912, 1669.2417, 5170.937, 5169.05, 5113.7285, 4535.992, 3236.121, 4155.129, 5143.173, 5148.086]
2025-09-16 12:44:18,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [456.0, 320.0, 1000.0, 1000.0, 1000.0, 869.0, 633.0, 804.0, 1000.0, 1000.0]
2025-09-16 12:44:18,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 55 minutes, 13 seconds)
2025-09-16 12:46:23,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:46:32,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3035.12817 ± 1636.316
2025-09-16 12:46:32,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4865.9985, 4443.895, 1778.8982, 1285.4783, 1249.2977, 3890.1843, 5050.705, 1231.839, 4927.799, 1627.187]
2025-09-16 12:46:32,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 944.0, 381.0, 269.0, 268.0, 784.0, 1000.0, 258.0, 1000.0, 349.0]
2025-09-16 12:46:32,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 54 minutes, 4 seconds)
2025-09-16 12:48:32,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:48:47,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5111.69824 ± 43.419
2025-09-16 12:48:47,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5142.034, 5136.7417, 5118.0654, 5132.262, 5079.8555, 5155.3276, 5114.659, 5118.301, 4994.6816, 5125.0557]
2025-09-16 12:48:47,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:48:47,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5111.70) for latency 3
2025-09-16 12:48:47,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 52 minutes, 6 seconds)
2025-09-16 12:50:47,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:51:01,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4710.82910 ± 1230.895
2025-09-16 12:51:01,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5145.918, 1024.4088, 5255.357, 5185.9785, 5041.096, 5004.587, 5141.753, 5080.775, 5170.806, 5057.608]
2025-09-16 12:51:01,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 223.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:51:01,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 50 minutes, 47 seconds)
2025-09-16 12:53:06,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:53:18,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4240.13867 ± 1227.461
2025-09-16 12:53:18,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4836.902, 2369.686, 5167.3706, 5125.7065, 1519.6165, 3719.2659, 4931.3413, 4832.054, 5123.2036, 4776.2427]
2025-09-16 12:53:18,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 459.0, 1000.0, 1000.0, 323.0, 716.0, 1000.0, 1000.0, 1000.0, 927.0]
2025-09-16 12:53:18,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 49 minutes, 20 seconds)
2025-09-16 12:55:17,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:55:33,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5142.15723 ± 73.305
2025-09-16 12:55:33,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5181.262, 5183.577, 5187.4834, 5074.944, 5163.792, 5177.1265, 5180.116, 5166.9453, 4942.861, 5163.461]
2025-09-16 12:55:33,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:55:33,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5142.16) for latency 3
2025-09-16 12:55:33,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 47 minutes, 52 seconds)
2025-09-16 12:57:31,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:57:47,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5022.56152 ± 78.385
2025-09-16 12:57:47,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5001.9473, 5035.9175, 5112.7407, 4990.125, 5109.0967, 4918.676, 5063.2783, 5130.3516, 4889.311, 4974.174]
2025-09-16 12:57:47,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:57:47,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 45 minutes, 43 seconds)
2025-09-16 12:59:44,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 12:59:59,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5065.93311 ± 234.119
2025-09-16 12:59:59,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5097.551, 4396.4087, 5157.2993, 5161.366, 5143.8354, 5191.478, 5195.1104, 4951.7314, 5145.058, 5219.4917]
2025-09-16 12:59:59,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 878.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:59:59,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 42 minutes, 59 seconds)
2025-09-16 13:01:57,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:02:11,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4552.35059 ± 1350.535
2025-09-16 13:02:11,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5076.0195, 4989.559, 5068.7036, 4971.6084, 513.38696, 4912.0913, 4731.8384, 5089.858, 5083.904, 5086.5347]
2025-09-16 13:02:11,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 105.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:02:11,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 40 minutes, 30 seconds)
2025-09-16 13:04:11,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:04:25,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4919.68896 ± 961.252
2025-09-16 13:04:25,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5229.279, 5251.1284, 5228.149, 5222.7383, 5218.475, 2036.2986, 5250.5723, 5234.4785, 5261.3027, 5264.469]
2025-09-16 13:04:25,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 396.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:04:25,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 37 minutes, 44 seconds)
2025-09-16 13:06:17,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:06:31,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4971.58545 ± 461.723
2025-09-16 13:06:31,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5211.523, 5213.482, 5192.316, 5201.7324, 5196.3022, 3660.6052, 5128.5757, 5111.3726, 5112.694, 4687.2515]
2025-09-16 13:06:31,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 700.0, 1000.0, 1000.0, 1000.0, 894.0]
2025-09-16 13:06:31,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 34 minutes, 23 seconds)
2025-09-16 13:08:30,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:08:44,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4839.23486 ± 621.000
2025-09-16 13:08:44,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5152.854, 5147.685, 5157.166, 5163.494, 3262.0781, 5012.8716, 5143.666, 5136.1445, 5178.947, 4037.4412]
2025-09-16 13:08:44,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 648.0, 1000.0, 1000.0, 1000.0, 1000.0, 776.0]
2025-09-16 13:08:44,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 32 minutes)
2025-09-16 13:10:43,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:10:57,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4995.42676 ± 511.765
2025-09-16 13:10:57,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5169.6826, 5209.056, 5215.655, 3518.6528, 5219.7515, 5251.823, 5154.2646, 5212.499, 4751.26, 5251.626]
2025-09-16 13:10:57,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 677.0, 1000.0, 1000.0, 1000.0, 1000.0, 909.0, 1000.0]
2025-09-16 13:10:57,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 29 minutes, 57 seconds)
2025-09-16 13:12:55,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:13:11,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5018.63574 ± 43.375
2025-09-16 13:13:11,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5002.4844, 5059.144, 4985.4, 4911.7153, 5019.1436, 5025.935, 5019.913, 5054.3506, 5037.9434, 5070.324]
2025-09-16 13:13:11,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:13:11,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 27 minutes, 53 seconds)
2025-09-16 13:15:10,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:15:25,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5101.73975 ± 45.350
2025-09-16 13:15:25,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5087.795, 5072.1953, 5011.524, 5102.6855, 5146.1914, 5073.2495, 5150.679, 5166.584, 5134.6055, 5071.8853]
2025-09-16 13:15:25,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:15:25,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 25 minutes, 48 seconds)
2025-09-16 13:17:24,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:17:39,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5260.68994 ± 17.430
2025-09-16 13:17:39,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5254.3945, 5281.2915, 5276.0054, 5236.1274, 5252.9326, 5267.0737, 5285.079, 5269.3486, 5230.267, 5254.3823]
2025-09-16 13:17:39,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:17:39,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5260.69) for latency 3
2025-09-16 13:17:39,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 24 minutes, 34 seconds)
2025-09-16 13:19:41,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:19:55,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4949.21338 ± 674.742
2025-09-16 13:19:55,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5264.8677, 5295.013, 5325.9043, 3089.9268, 5250.056, 4396.278, 5294.591, 5244.332, 5304.2314, 5026.9326]
2025-09-16 13:19:55,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 578.0, 1000.0, 837.0, 1000.0, 1000.0, 1000.0, 955.0]
2025-09-16 13:19:55,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 22 minutes, 43 seconds)
2025-09-16 13:21:47,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:22:02,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5219.34717 ± 44.991
2025-09-16 13:22:02,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5202.551, 5199.062, 5192.7095, 5214.3696, 5292.729, 5315.513, 5179.7227, 5192.6973, 5176.532, 5227.589]
2025-09-16 13:22:02,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:22:02,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 19 minutes, 51 seconds)
2025-09-16 13:24:07,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:24:22,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5246.83057 ± 19.903
2025-09-16 13:24:22,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5266.3105, 5241.907, 5233.2554, 5252.333, 5272.5635, 5221.917, 5241.5557, 5274.1865, 5252.786, 5211.494]
2025-09-16 13:24:22,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:24:23,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 18 minutes, 23 seconds)
2025-09-16 13:26:17,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:26:32,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5172.13525 ± 15.801
2025-09-16 13:26:32,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5149.8037, 5180.2974, 5151.7017, 5180.3027, 5154.716, 5174.4946, 5188.1143, 5200.497, 5176.814, 5164.61]
2025-09-16 13:26:32,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:26:32,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 15 minutes, 38 seconds)
2025-09-16 13:28:31,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:28:46,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5172.09668 ± 108.268
2025-09-16 13:28:46,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5217.995, 5219.6035, 5208.1987, 5206.1367, 5199.1333, 5201.399, 5226.352, 5179.367, 4849.44, 5213.3457]
2025-09-16 13:28:46,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:28:46,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 13 minutes, 25 seconds)
2025-09-16 13:30:44,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:30:59,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5172.00684 ± 25.605
2025-09-16 13:30:59,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5169.7925, 5174.9746, 5210.4663, 5147.291, 5157.4126, 5169.0615, 5135.83, 5225.079, 5167.2427, 5162.924]
2025-09-16 13:30:59,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:30:59,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 10 minutes, 53 seconds)
2025-09-16 13:32:57,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:33:12,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5151.40527 ± 8.529
2025-09-16 13:33:12,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5162.6577, 5158.293, 5142.4663, 5159.646, 5148.724, 5143.9272, 5155.743, 5133.9785, 5153.061, 5155.551]
2025-09-16 13:33:12,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:33:12,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 9 minutes, 13 seconds)
2025-09-16 13:35:13,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:35:20,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2800.88379 ± 920.509
2025-09-16 13:35:20,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2013.4866, 2368.0916, 2615.0557, 5266.846, 2229.4067, 2500.9043, 3225.3914, 2235.8477, 2214.16, 3339.647]
2025-09-16 13:35:20,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [397.0, 449.0, 497.0, 1000.0, 428.0, 474.0, 621.0, 436.0, 427.0, 636.0]
2025-09-16 13:35:20,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 5 minutes, 45 seconds)
2025-09-16 13:37:11,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:37:25,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4968.00879 ± 616.808
2025-09-16 13:37:25,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5177.9937, 5218.0933, 5199.2456, 5182.5566, 3121.7307, 5108.521, 5208.0317, 5118.41, 5118.7554, 5226.7446]
2025-09-16 13:37:25,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 608.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:37:25,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 3 minutes, 4 seconds)
2025-09-16 13:39:31,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:39:46,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5115.09863 ± 70.696
2025-09-16 13:39:46,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5133.51, 5132.33, 5161.5493, 4908.8735, 5159.1846, 5136.971, 5109.851, 5143.3105, 5112.8086, 5152.6]
2025-09-16 13:39:46,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:39:46,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 32 seconds)
2025-09-16 13:41:44,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:41:58,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5200.97412 ± 23.060
2025-09-16 13:41:58,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5209.8306, 5158.935, 5215.5923, 5201.233, 5176.5615, 5229.067, 5169.8774, 5217.5176, 5205.403, 5225.7275]
2025-09-16 13:41:58,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:41:58,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 59 minutes, 19 seconds)
2025-09-16 13:43:51,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:44:06,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5286.78369 ± 9.472
2025-09-16 13:44:06,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5282.8647, 5301.7974, 5288.717, 5279.5986, 5295.4976, 5291.165, 5273.609, 5297.0654, 5285.7607, 5271.762]
2025-09-16 13:44:06,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:44:06,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5286.78) for latency 3
2025-09-16 13:44:06,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 56 minutes, 38 seconds)
2025-09-16 13:46:06,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:46:13,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2558.28125 ± 1011.819
2025-09-16 13:46:13,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [3265.7612, 1611.9049, 1554.952, 3317.468, 4721.7, 2131.6653, 1220.1809, 2118.796, 2449.15, 3191.234]
2025-09-16 13:46:13,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [620.0, 303.0, 303.0, 670.0, 901.0, 418.0, 236.0, 452.0, 475.0, 613.0]
2025-09-16 13:46:13,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 54 minutes, 23 seconds)
2025-09-16 13:48:11,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:48:25,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4770.26807 ± 1233.758
2025-09-16 13:48:25,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5222.4873, 5188.3794, 5176.7476, 5190.125, 5179.0967, 5179.005, 5128.6367, 5167.093, 5201.481, 1069.6296]
2025-09-16 13:48:25,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 203.0]
2025-09-16 13:48:25,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 48 seconds)
2025-09-16 13:50:23,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:50:37,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5181.49414 ± 46.350
2025-09-16 13:50:37,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5212.368, 5193.2217, 5239.6055, 5182.6055, 5191.4585, 5065.2637, 5198.3223, 5132.3765, 5195.6787, 5204.0405]
2025-09-16 13:50:37,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:50:37,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 49 minutes, 56 seconds)
2025-09-16 13:52:40,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:52:55,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5146.80176 ± 11.595
2025-09-16 13:52:55,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5156.269, 5121.365, 5150.0, 5149.347, 5142.9404, 5162.273, 5147.494, 5159.9473, 5144.0815, 5134.3022]
2025-09-16 13:52:55,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:52:55,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 48 minutes, 9 seconds)
2025-09-16 13:54:52,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:55:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5104.01709 ± 771.362
2025-09-16 13:55:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2790.7683, 5372.0635, 5366.6294, 5359.5464, 5333.6216, 5355.1147, 5319.1006, 5379.4053, 5368.11, 5395.8135]
2025-09-16 13:55:06,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [524.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:55:06,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 12 seconds)
2025-09-16 13:57:05,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:57:19,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5031.75977 ± 565.247
2025-09-16 13:57:19,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5254.1157, 5302.811, 5285.871, 5317.758, 3594.0667, 5295.604, 5345.651, 5348.463, 5271.4917, 4301.7666]
2025-09-16 13:57:19,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 676.0, 1000.0, 1000.0, 1000.0, 1000.0, 803.0]
2025-09-16 13:57:19,027 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 44 minutes, 22 seconds)
2025-09-16 13:59:16,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 13:59:31,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5000.78516 ± 59.624
2025-09-16 13:59:31,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4940.8755, 5068.4185, 4980.6494, 4977.45, 4867.7666, 5063.1646, 4993.8613, 5040.773, 5021.379, 5053.5127]
2025-09-16 13:59:31,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 982.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 13:59:31,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes, 11 seconds)
2025-09-16 14:01:29,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:01:43,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4900.12207 ± 753.940
2025-09-16 14:01:43,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5147.447, 5165.822, 5128.6196, 5180.512, 5156.7856, 2638.9062, 5149.8545, 5115.505, 5156.8696, 5160.898]
2025-09-16 14:01:43,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 514.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:01:43,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 55 seconds)
2025-09-16 14:03:40,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:03:54,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4637.09766 ± 1252.606
2025-09-16 14:03:54,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5083.4565, 5052.7827, 5033.768, 5064.6543, 5061.7954, 879.6443, 5019.402, 5043.682, 5064.9346, 5066.8574]
2025-09-16 14:03:54,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 177.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:03:54,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 37 minutes, 22 seconds)
2025-09-16 14:05:46,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:06:00,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4928.90674 ± 550.250
2025-09-16 14:06:00,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5097.268, 3280.4822, 5084.103, 5127.4844, 5125.8545, 5122.4443, 5075.3906, 5069.06, 5166.9546, 5140.0273]
2025-09-16 14:06:00,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 668.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:06:00,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 34 minutes, 53 seconds)
2025-09-16 14:07:58,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:08:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5090.20801 ± 29.778
2025-09-16 14:08:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5088.819, 5067.8604, 5105.2183, 5098.1006, 5110.1265, 5102.703, 5117.0635, 5097.674, 5105.095, 5009.42]
2025-09-16 14:08:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:08:13,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 32 minutes, 44 seconds)
2025-09-16 14:10:11,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:10:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5104.81445 ± 31.905
2025-09-16 14:10:27,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5084.6475, 5081.899, 5086.189, 5114.993, 5054.7876, 5179.975, 5113.7217, 5100.9385, 5128.194, 5102.802]
2025-09-16 14:10:27,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:10:27,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 35 seconds)
2025-09-16 14:12:24,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:12:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5324.26270 ± 12.181
2025-09-16 14:12:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5325.4937, 5328.316, 5318.3564, 5327.398, 5349.268, 5297.6357, 5316.4595, 5328.199, 5324.028, 5327.476]
2025-09-16 14:12:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:12:39,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5324.26) for latency 3
2025-09-16 14:12:39,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 26 seconds)
2025-09-16 14:14:37,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:14:52,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5248.51611 ± 21.190
2025-09-16 14:14:52,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5243.821, 5242.528, 5260.3647, 5238.3438, 5267.515, 5271.09, 5257.0586, 5244.681, 5265.4795, 5194.2725]
2025-09-16 14:14:52,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:14:52,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 17 seconds)
2025-09-16 14:16:50,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:17:02,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4530.94824 ± 1355.817
2025-09-16 14:17:02,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [733.74146, 5199.2183, 4595.445, 5212.3433, 5213.2285, 5185.683, 5180.9106, 3604.921, 5177.0103, 5206.976]
2025-09-16 14:17:02,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [152.0, 1000.0, 873.0, 1000.0, 1000.0, 1000.0, 1000.0, 679.0, 1000.0, 1000.0]
2025-09-16 14:17:02,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 24 minutes, 16 seconds)
2025-09-16 14:19:00,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:19:14,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5126.01416 ± 169.640
2025-09-16 14:19:14,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5192.5396, 5181.5566, 5167.996, 5217.808, 5213.1807, 5181.691, 5037.757, 4640.1265, 5206.2017, 5221.28]
2025-09-16 14:19:14,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 887.0, 1000.0, 1000.0]
2025-09-16 14:19:14,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 22 minutes, 1 second)
2025-09-16 14:21:12,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:21:26,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4966.34180 ± 496.999
2025-09-16 14:21:26,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [3486.6626, 5074.436, 5158.7197, 5131.5864, 5176.73, 4995.401, 5158.335, 5230.7495, 5158.224, 5092.5757]
2025-09-16 14:21:26,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [729.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:21:26,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 47 seconds)
2025-09-16 14:23:24,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:23:39,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5244.57324 ± 18.008
2025-09-16 14:23:39,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5268.305, 5240.7603, 5256.905, 5245.1763, 5214.7026, 5241.2344, 5261.7563, 5214.0693, 5263.4565, 5239.3643]
2025-09-16 14:23:39,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:23:39,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 35 seconds)
2025-09-16 14:25:37,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:25:51,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5340.64600 ± 22.882
2025-09-16 14:25:51,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5372.181, 5324.875, 5345.795, 5356.4624, 5323.688, 5298.415, 5355.665, 5372.887, 5322.2334, 5334.2583]
2025-09-16 14:25:51,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:25:51,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5340.65) for latency 3
2025-09-16 14:25:51,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 22 seconds)
2025-09-16 14:27:49,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:28:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5340.79834 ± 33.973
2025-09-16 14:28:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5369.2764, 5350.9727, 5319.4517, 5299.6904, 5365.826, 5268.6016, 5342.1914, 5339.997, 5385.04, 5366.937]
2025-09-16 14:28:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:28:04,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5340.80) for latency 3
2025-09-16 14:28:04,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 13 seconds)
2025-09-16 14:30:02,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:30:18,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5069.81836 ± 23.089
2025-09-16 14:30:18,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5059.92, 5086.599, 5004.9004, 5080.6123, 5078.014, 5079.0312, 5089.4824, 5068.9004, 5072.2188, 5078.5034]
2025-09-16 14:30:18,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:30:18,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 3 seconds)
2025-09-16 14:32:16,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:32:31,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5236.04004 ± 17.819
2025-09-16 14:32:31,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5240.003, 5259.6616, 5213.9165, 5225.422, 5231.68, 5238.869, 5211.998, 5271.593, 5241.3267, 5225.933]
2025-09-16 14:32:31,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:32:31,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 52 seconds)
2025-09-16 14:34:29,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:34:43,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5362.92627 ± 10.583
2025-09-16 14:34:43,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5359.38, 5345.74, 5379.942, 5364.025, 5375.829, 5361.241, 5366.055, 5351.4688, 5373.0605, 5352.5205]
2025-09-16 14:34:43,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:34:43,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5362.93) for latency 3
2025-09-16 14:34:43,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 38 seconds)
2025-09-16 14:36:36,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:36:49,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4897.04541 ± 1127.956
2025-09-16 14:36:49,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5253.9214, 5244.4023, 5230.9785, 5281.3604, 1514.4972, 5274.998, 5284.551, 5350.126, 5248.1323, 5287.486]
2025-09-16 14:36:49,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 290.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:36:50,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 23 seconds)
2025-09-16 14:38:50,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:39:01,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4328.59473 ± 1783.380
2025-09-16 14:39:01,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5235.055, 5217.885, 5196.838, 5247.5674, 678.05176, 5208.6587, 5173.628, 5242.6606, 847.7521, 5237.852]
2025-09-16 14:39:01,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 151.0, 1000.0, 1000.0, 1000.0, 156.0, 1000.0]
2025-09-16 14:39:01,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 11 seconds)
2025-09-16 14:41:02,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 3...
2025-09-16 14:41:16,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5122.83887 ± 14.807
2025-09-16 14:41:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5120.4097, 5143.6787, 5125.574, 5147.122, 5117.0527, 5129.8853, 5116.057, 5092.5107, 5123.5386, 5112.5615]
2025-09-16 14:41:16,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:41:16,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1251 [DEBUG]: Training session finished
