2025-09-16 12:14:43,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1108 [DEBUG]: logdir: _logs/noise-eval-v2/humanoid/bpql-noise_0.075-delay_12
2025-09-16 12:14:43,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1109 [DEBUG]: trainer_prefix: noise-eval-v2/humanoid/bpql-noise_0.075-delay_12
2025-09-16 12:14:43,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1110 [DEBUG]: args.trainer_eval_latencies: {'12': <latency_env.delayed_mdp.ConstantDelay object at 0x14d393468890>}
2025-09-16 12:14:43,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1111 [DEBUG]: using device: cuda
2025-09-16 12:14:43,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1133 [INFO]: Creating new trainer
2025-09-16 12:14:43,377 baseline-bpql-noisepromille75-humanoid:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=580, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=17, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(17,))
  )
  (tanh_refit): NNTanhRefit(
    scale: tensor([[0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000,
             0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000, 0.8000]]), shift: tensor([[-0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000, -0.4000,
             -0.4000]])
  )
)
2025-09-16 12:14:43,377 baseline-bpql-noisepromille75-humanoid:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=393, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 12:14:45,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1194 [DEBUG]: Starting training session...
2025-09-16 12:14:45,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 1/100
2025-09-16 12:16:28,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:16:29,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 152.42664 ± 13.983
2025-09-16 12:16:29,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [145.35466, 179.2242, 165.24936, 148.5603, 140.57613, 173.79039, 144.13174, 139.84068, 147.88696, 139.6519]
2025-09-16 12:16:29,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [30.0, 38.0, 36.0, 31.0, 29.0, 37.0, 30.0, 29.0, 29.0, 29.0]
2025-09-16 12:16:29,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (152.43) for latency 12
2025-09-16 12:16:29,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 51 minutes, 33 seconds)
2025-09-16 12:18:21,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:18:22,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 366.04547 ± 81.508
2025-09-16 12:18:22,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [338.65164, 342.7139, 353.00092, 160.178, 377.2813, 479.62012, 452.70078, 384.15115, 364.5914, 407.56564]
2025-09-16 12:18:22,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 63.0, 66.0, 31.0, 78.0, 91.0, 86.0, 82.0, 69.0, 76.0]
2025-09-16 12:18:22,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (366.05) for latency 12
2025-09-16 12:18:22,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 57 minutes, 38 seconds)
2025-09-16 12:20:14,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:20:15,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 398.77985 ± 87.050
2025-09-16 12:20:15,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [335.32257, 619.1126, 409.7095, 465.73502, 296.4073, 399.59186, 415.54422, 335.3977, 349.38535, 361.5922]
2025-09-16 12:20:15,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 131.0, 77.0, 97.0, 59.0, 77.0, 83.0, 67.0, 70.0, 76.0]
2025-09-16 12:20:15,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (398.78) for latency 12
2025-09-16 12:20:15,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 58 minutes, 20 seconds)
2025-09-16 12:22:10,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:22:11,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 432.13217 ± 80.854
2025-09-16 12:22:11,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [570.20276, 334.18668, 346.6398, 432.77252, 429.52267, 326.04498, 427.63092, 563.6284, 460.9304, 429.76266]
2025-09-16 12:22:11,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [113.0, 72.0, 65.0, 81.0, 90.0, 68.0, 80.0, 107.0, 87.0, 81.0]
2025-09-16 12:22:11,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (432.13) for latency 12
2025-09-16 12:22:11,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 58 minutes, 37 seconds)
2025-09-16 12:24:04,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:24:05,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 418.89072 ± 97.639
2025-09-16 12:24:05,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [402.62378, 582.4136, 278.66797, 381.8552, 381.3008, 429.2348, 275.03897, 497.5482, 555.60535, 404.61865]
2025-09-16 12:24:05,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [76.0, 113.0, 55.0, 72.0, 74.0, 81.0, 54.0, 110.0, 110.0, 76.0]
2025-09-16 12:24:05,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 57 minutes, 36 seconds)
2025-09-16 12:26:00,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:26:00,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 351.93057 ± 100.643
2025-09-16 12:26:00,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [229.14119, 322.11743, 335.66232, 295.9988, 297.21713, 475.72336, 207.34172, 411.30676, 543.6617, 401.13535]
2025-09-16 12:26:00,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [45.0, 61.0, 63.0, 56.0, 58.0, 88.0, 39.0, 79.0, 104.0, 75.0]
2025-09-16 12:26:00,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 59 minutes, 13 seconds)
2025-09-16 12:27:53,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:27:54,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 451.46356 ± 79.263
2025-09-16 12:27:54,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [510.37585, 539.9927, 344.51474, 499.50754, 511.33545, 455.31073, 338.97623, 539.3041, 440.86258, 334.45563]
2025-09-16 12:27:54,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [96.0, 103.0, 71.0, 91.0, 108.0, 89.0, 67.0, 103.0, 94.0, 60.0]
2025-09-16 12:27:54,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (451.46) for latency 12
2025-09-16 12:27:54,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 57 minutes, 25 seconds)
2025-09-16 12:29:48,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:29:49,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 445.57318 ± 123.789
2025-09-16 12:29:49,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [373.28467, 389.37735, 503.42654, 450.88983, 591.0262, 488.42224, 468.72714, 444.83197, 141.1633, 604.58264]
2025-09-16 12:29:49,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [70.0, 74.0, 94.0, 84.0, 121.0, 93.0, 87.0, 84.0, 27.0, 115.0]
2025-09-16 12:29:49,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 55 minutes, 53 seconds)
2025-09-16 12:31:44,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:31:45,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 417.82983 ± 113.862
2025-09-16 12:31:45,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [491.79492, 329.1675, 413.66876, 552.4298, 482.6113, 558.4102, 416.20532, 156.33437, 343.25174, 434.42468]
2025-09-16 12:31:45,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 71.0, 80.0, 108.0, 103.0, 104.0, 89.0, 30.0, 75.0, 81.0]
2025-09-16 12:31:45,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 53 minutes, 58 seconds)
2025-09-16 12:33:39,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:33:40,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 442.64127 ± 158.919
2025-09-16 12:33:40,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [166.38539, 436.6099, 597.73047, 614.8086, 699.3117, 458.62143, 217.79166, 448.75037, 372.14355, 414.25955]
2025-09-16 12:33:40,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [32.0, 84.0, 128.0, 118.0, 134.0, 86.0, 42.0, 82.0, 71.0, 78.0]
2025-09-16 12:33:40,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 52 minutes, 20 seconds)
2025-09-16 12:35:33,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:35:35,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 468.78311 ± 149.968
2025-09-16 12:35:35,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [475.4193, 412.56802, 416.16312, 513.1572, 452.85336, 134.16797, 668.6157, 469.87286, 425.3831, 719.6306]
2025-09-16 12:35:35,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 87.0, 81.0, 95.0, 86.0, 26.0, 126.0, 99.0, 81.0, 137.0]
2025-09-16 12:35:35,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (468.78) for latency 12
2025-09-16 12:35:35,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 50 minutes, 18 seconds)
2025-09-16 12:37:28,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:37:29,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 487.30209 ± 95.124
2025-09-16 12:37:29,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [375.13248, 433.9402, 504.6876, 514.1485, 711.0097, 546.6414, 438.688, 529.1825, 363.2393, 456.35165]
2025-09-16 12:37:29,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [69.0, 80.0, 95.0, 97.0, 134.0, 103.0, 82.0, 102.0, 79.0, 88.0]
2025-09-16 12:37:29,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (487.30) for latency 12
2025-09-16 12:37:30,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 48 minutes, 41 seconds)
2025-09-16 12:39:23,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:39:25,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 457.27280 ± 193.232
2025-09-16 12:39:25,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [874.34937, 698.56, 325.51675, 493.50665, 425.81485, 134.05814, 409.75647, 427.67917, 339.9691, 443.51776]
2025-09-16 12:39:25,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [183.0, 148.0, 69.0, 106.0, 78.0, 26.0, 90.0, 93.0, 62.0, 93.0]
2025-09-16 12:39:25,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 46 minutes, 57 seconds)
2025-09-16 12:41:19,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:41:21,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 568.40784 ± 221.518
2025-09-16 12:41:21,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [888.4397, 320.4977, 463.2258, 438.17654, 577.2491, 420.1361, 970.64874, 516.0122, 780.51935, 309.17346]
2025-09-16 12:41:21,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [168.0, 70.0, 90.0, 80.0, 127.0, 90.0, 188.0, 110.0, 148.0, 66.0]
2025-09-16 12:41:21,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (568.41) for latency 12
2025-09-16 12:41:21,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 45 minutes, 7 seconds)
2025-09-16 12:43:15,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:43:16,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 544.63074 ± 142.249
2025-09-16 12:43:16,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [666.57996, 662.3259, 599.68365, 414.2449, 392.00906, 699.1929, 769.9766, 446.51102, 436.2176, 359.5655]
2025-09-16 12:43:16,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 130.0, 115.0, 86.0, 85.0, 126.0, 165.0, 84.0, 79.0, 67.0]
2025-09-16 12:43:16,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 43 minutes, 16 seconds)
2025-09-16 12:45:10,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:45:11,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 516.92297 ± 99.311
2025-09-16 12:45:11,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [521.7391, 463.86566, 369.04553, 625.57184, 447.1155, 523.3148, 414.50275, 512.42255, 563.78796, 727.864]
2025-09-16 12:45:11,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 85.0, 68.0, 134.0, 94.0, 98.0, 76.0, 112.0, 103.0, 141.0]
2025-09-16 12:45:11,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 41 minutes, 32 seconds)
2025-09-16 12:47:06,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:47:08,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 549.31073 ± 109.046
2025-09-16 12:47:08,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [580.1391, 430.32077, 615.0831, 373.73965, 609.4447, 593.5929, 765.1618, 529.30597, 424.51282, 571.8068]
2025-09-16 12:47:08,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 80.0, 115.0, 70.0, 113.0, 129.0, 163.0, 100.0, 79.0, 118.0]
2025-09-16 12:47:08,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 39 minutes, 57 seconds)
2025-09-16 12:49:03,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:49:04,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 476.46329 ± 94.970
2025-09-16 12:49:04,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [340.47617, 625.56384, 422.6095, 496.48306, 435.3344, 596.99915, 428.45593, 352.0154, 586.66205, 480.03333]
2025-09-16 12:49:04,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [63.0, 135.0, 78.0, 108.0, 80.0, 110.0, 81.0, 73.0, 107.0, 89.0]
2025-09-16 12:49:04,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 38 minutes, 22 seconds)
2025-09-16 12:50:59,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:51:01,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 486.38055 ± 81.479
2025-09-16 12:51:01,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [531.5403, 497.80328, 587.68286, 440.85944, 357.21484, 597.48663, 546.4184, 397.7461, 385.56726, 521.4868]
2025-09-16 12:51:01,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 101.0, 130.0, 90.0, 68.0, 109.0, 100.0, 73.0, 73.0, 97.0]
2025-09-16 12:51:01,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 35 seconds)
2025-09-16 12:52:55,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:52:57,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 465.13290 ± 61.318
2025-09-16 12:52:57,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [577.0551, 477.5756, 432.0779, 422.82272, 430.91327, 441.69098, 552.32336, 522.94745, 390.84747, 403.07532]
2025-09-16 12:52:57,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [110.0, 87.0, 91.0, 79.0, 80.0, 81.0, 107.0, 98.0, 72.0, 74.0]
2025-09-16 12:52:57,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 34 minutes, 47 seconds)
2025-09-16 12:54:52,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:54:54,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 506.56683 ± 137.404
2025-09-16 12:54:54,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [509.38846, 444.56863, 370.44775, 304.46204, 676.4572, 529.0907, 800.26294, 445.3552, 438.87143, 546.76434]
2025-09-16 12:54:54,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 94.0, 68.0, 65.0, 144.0, 111.0, 151.0, 94.0, 99.0, 103.0]
2025-09-16 12:54:54,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes, 18 seconds)
2025-09-16 12:56:48,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:56:49,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 596.37256 ± 237.426
2025-09-16 12:56:49,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1211.2991, 830.6182, 538.7245, 596.82007, 355.52216, 444.9411, 465.01044, 448.9313, 544.27057, 527.5881]
2025-09-16 12:56:49,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [240.0, 156.0, 103.0, 122.0, 67.0, 85.0, 98.0, 82.0, 118.0, 99.0]
2025-09-16 12:56:49,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (596.37) for latency 12
2025-09-16 12:56:50,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 31 minutes, 17 seconds)
2025-09-16 12:58:45,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 12:58:46,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 608.95374 ± 206.402
2025-09-16 12:58:46,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [405.74695, 356.87704, 469.9877, 632.38324, 945.2886, 970.6175, 739.63983, 574.89984, 581.0579, 413.0381]
2025-09-16 12:58:46,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [77.0, 78.0, 92.0, 124.0, 172.0, 194.0, 134.0, 111.0, 124.0, 75.0]
2025-09-16 12:58:46,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (608.95) for latency 12
2025-09-16 12:58:46,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 29 minutes, 22 seconds)
2025-09-16 13:00:42,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:00:44,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 573.31567 ± 126.224
2025-09-16 13:00:44,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [572.5115, 519.2617, 601.51215, 742.87555, 517.29877, 719.72485, 579.2855, 719.89746, 436.6268, 324.16272]
2025-09-16 13:00:44,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [116.0, 97.0, 128.0, 153.0, 93.0, 131.0, 108.0, 153.0, 95.0, 62.0]
2025-09-16 13:00:44,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 27 minutes, 39 seconds)
2025-09-16 13:02:39,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:02:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 574.14630 ± 96.920
2025-09-16 13:02:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [555.9485, 588.3389, 596.6369, 437.87326, 493.22382, 516.18164, 581.55963, 823.5081, 541.0715, 607.1209]
2025-09-16 13:02:40,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [108.0, 115.0, 122.0, 82.0, 89.0, 98.0, 106.0, 160.0, 100.0, 113.0]
2025-09-16 13:02:40,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 25 minutes, 53 seconds)
2025-09-16 13:04:36,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:04:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 578.51245 ± 126.397
2025-09-16 13:04:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [521.343, 607.9518, 329.47574, 590.5516, 603.2987, 640.42035, 398.81024, 737.6883, 601.39435, 754.1901]
2025-09-16 13:04:38,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [95.0, 125.0, 71.0, 110.0, 122.0, 131.0, 74.0, 139.0, 113.0, 144.0]
2025-09-16 13:04:38,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 minutes, 2 seconds)
2025-09-16 13:06:32,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:06:33,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 577.09235 ± 131.601
2025-09-16 13:06:33,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [457.4595, 844.9542, 413.87598, 776.18744, 483.7533, 591.10345, 503.40973, 569.4167, 509.80978, 620.9537]
2025-09-16 13:06:33,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 160.0, 86.0, 145.0, 103.0, 114.0, 101.0, 107.0, 101.0, 133.0]
2025-09-16 13:06:33,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 22 minutes)
2025-09-16 13:08:29,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:08:31,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 609.98114 ± 70.423
2025-09-16 13:08:31,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [682.44055, 504.80356, 694.5774, 729.3287, 559.1059, 526.70404, 587.55286, 598.6763, 637.6422, 578.9802]
2025-09-16 13:08:31,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 91.0, 133.0, 140.0, 104.0, 95.0, 117.0, 122.0, 121.0, 113.0]
2025-09-16 13:08:31,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (609.98) for latency 12
2025-09-16 13:08:31,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 20 minutes, 16 seconds)
2025-09-16 13:10:25,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:10:27,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 701.84045 ± 228.682
2025-09-16 13:10:27,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [587.0881, 867.3645, 602.4028, 638.4055, 655.97595, 384.867, 608.27136, 656.59503, 719.96747, 1297.4661]
2025-09-16 13:10:27,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [107.0, 161.0, 116.0, 133.0, 123.0, 85.0, 117.0, 123.0, 130.0, 269.0]
2025-09-16 13:10:27,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (701.84) for latency 12
2025-09-16 13:10:27,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 18 minutes, 10 seconds)
2025-09-16 13:12:23,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:12:25,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 684.01160 ± 283.211
2025-09-16 13:12:25,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [435.17667, 548.37054, 714.939, 900.7438, 523.0393, 1207.8146, 1135.8833, 376.33212, 443.34973, 554.46735]
2025-09-16 13:12:25,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [94.0, 108.0, 126.0, 182.0, 109.0, 238.0, 221.0, 73.0, 83.0, 104.0]
2025-09-16 13:12:25,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 16 minutes, 29 seconds)
2025-09-16 13:14:20,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:14:22,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 648.12354 ± 211.789
2025-09-16 13:14:22,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [446.31793, 869.3439, 410.6435, 465.49026, 977.6726, 603.07117, 505.00684, 458.3338, 848.74945, 896.6058]
2025-09-16 13:14:22,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [82.0, 169.0, 77.0, 87.0, 202.0, 128.0, 93.0, 91.0, 167.0, 182.0]
2025-09-16 13:14:22,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 14 minutes, 25 seconds)
2025-09-16 13:16:17,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:16:19,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 697.56702 ± 171.009
2025-09-16 13:16:19,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [754.282, 603.3781, 715.5758, 1178.0979, 629.5358, 672.2371, 563.97974, 563.6229, 686.50604, 608.45526]
2025-09-16 13:16:19,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [153.0, 124.0, 145.0, 230.0, 131.0, 125.0, 102.0, 105.0, 128.0, 129.0]
2025-09-16 13:16:19,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 12 minutes, 49 seconds)
2025-09-16 13:18:14,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:18:16,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 670.82727 ± 123.120
2025-09-16 13:18:16,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [642.01337, 536.63245, 932.33307, 443.8555, 698.2806, 727.9727, 635.02814, 724.8654, 729.99084, 637.3004]
2025-09-16 13:18:16,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [130.0, 110.0, 188.0, 81.0, 135.0, 151.0, 115.0, 144.0, 149.0, 117.0]
2025-09-16 13:18:16,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 10 minutes, 45 seconds)
2025-09-16 13:20:11,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:20:13,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 673.00250 ± 124.510
2025-09-16 13:20:13,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [790.83246, 776.8677, 637.8955, 784.59625, 878.95795, 591.73016, 571.5985, 620.9728, 442.4709, 634.10284]
2025-09-16 13:20:13,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [148.0, 146.0, 119.0, 143.0, 178.0, 110.0, 106.0, 111.0, 81.0, 119.0]
2025-09-16 13:20:13,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 8 minutes, 54 seconds)
2025-09-16 13:22:10,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:22:12,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 750.19495 ± 97.736
2025-09-16 13:22:12,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [763.13727, 769.0552, 866.3874, 761.76685, 722.1851, 826.948, 638.64087, 618.2742, 916.4787, 619.076]
2025-09-16 13:22:12,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [157.0, 168.0, 166.0, 153.0, 146.0, 159.0, 119.0, 114.0, 168.0, 116.0]
2025-09-16 13:22:12,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (750.19) for latency 12
2025-09-16 13:22:12,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 7 minutes, 9 seconds)
2025-09-16 13:24:07,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:24:09,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 722.80017 ± 308.605
2025-09-16 13:24:09,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [491.5488, 647.1798, 607.84515, 746.55164, 668.1207, 393.09604, 794.73413, 1571.2281, 525.0335, 782.6633]
2025-09-16 13:24:09,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [92.0, 123.0, 124.0, 138.0, 124.0, 72.0, 152.0, 324.0, 103.0, 161.0]
2025-09-16 13:24:09,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 11 seconds)
2025-09-16 13:26:04,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:26:06,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 626.15948 ± 155.511
2025-09-16 13:26:06,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [450.88263, 453.1572, 845.1212, 630.5746, 452.3253, 552.73285, 903.8251, 727.7663, 551.31323, 693.8967]
2025-09-16 13:26:06,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [93.0, 80.0, 157.0, 114.0, 87.0, 102.0, 203.0, 157.0, 101.0, 126.0]
2025-09-16 13:26:06,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 11 seconds)
2025-09-16 13:28:03,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:28:05,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 693.48810 ± 183.070
2025-09-16 13:28:05,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [638.9022, 811.0625, 980.1628, 741.74097, 851.4167, 639.09894, 538.80414, 475.25635, 880.33344, 378.10254]
2025-09-16 13:28:05,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 149.0, 188.0, 142.0, 156.0, 121.0, 100.0, 89.0, 158.0, 68.0]
2025-09-16 13:28:05,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 45 seconds)
2025-09-16 13:30:00,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:30:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 833.44464 ± 214.793
2025-09-16 13:30:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1071.6536, 675.1713, 1051.7058, 809.712, 791.17206, 698.54694, 836.37054, 1247.1464, 512.58075, 640.38745]
2025-09-16 13:30:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [205.0, 134.0, 200.0, 155.0, 148.0, 131.0, 155.0, 255.0, 91.0, 127.0]
2025-09-16 13:30:02,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (833.44) for latency 12
2025-09-16 13:30:02,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 41 seconds)
2025-09-16 13:31:58,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:32:01,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 830.86700 ± 234.937
2025-09-16 13:32:01,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [780.69946, 799.51056, 732.8831, 1195.7914, 466.03015, 979.9166, 800.6713, 1031.9333, 445.05917, 1076.175]
2025-09-16 13:32:01,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [146.0, 154.0, 157.0, 233.0, 85.0, 193.0, 159.0, 191.0, 89.0, 209.0]
2025-09-16 13:32:01,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 43 seconds)
2025-09-16 13:33:57,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:33:59,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 846.38116 ± 311.965
2025-09-16 13:33:59,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [452.19528, 1097.3914, 1309.8899, 665.5584, 1202.2319, 890.7501, 766.9667, 1138.6631, 513.4436, 426.72192]
2025-09-16 13:33:59,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [83.0, 222.0, 275.0, 138.0, 227.0, 172.0, 141.0, 209.0, 96.0, 78.0]
2025-09-16 13:33:59,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (846.38) for latency 12
2025-09-16 13:33:59,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 56 minutes, 6 seconds)
2025-09-16 13:35:54,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:35:56,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 831.81836 ± 162.920
2025-09-16 13:35:56,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [806.694, 1028.1428, 956.77374, 703.4026, 823.39, 528.5628, 851.8289, 840.55695, 1107.3647, 671.46655]
2025-09-16 13:35:56,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [172.0, 202.0, 204.0, 146.0, 152.0, 97.0, 158.0, 157.0, 211.0, 128.0]
2025-09-16 13:35:56,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 54 minutes, 5 seconds)
2025-09-16 13:37:52,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:37:54,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 797.77130 ± 290.918
2025-09-16 13:37:54,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [808.90845, 1115.5955, 334.3695, 971.91846, 588.6369, 1004.8579, 384.2169, 737.3002, 747.9884, 1283.9211]
2025-09-16 13:37:54,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [151.0, 206.0, 62.0, 177.0, 116.0, 180.0, 69.0, 136.0, 153.0, 253.0]
2025-09-16 13:37:54,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 54 seconds)
2025-09-16 13:39:50,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:39:52,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1073.94800 ± 380.837
2025-09-16 13:39:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [664.98627, 832.77045, 1023.59393, 558.7875, 1143.7074, 818.847, 1872.3054, 1349.3379, 992.29376, 1482.8494]
2025-09-16 13:39:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [123.0, 158.0, 191.0, 102.0, 218.0, 146.0, 345.0, 249.0, 185.0, 274.0]
2025-09-16 13:39:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1073.95) for latency 12
2025-09-16 13:39:52,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 50 minutes, 12 seconds)
2025-09-16 13:41:48,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:41:51,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 922.29620 ± 448.468
2025-09-16 13:41:51,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [858.82874, 1044.3384, 1886.9813, 818.8827, 269.6104, 1535.2324, 674.7196, 667.2307, 894.38336, 572.75385]
2025-09-16 13:41:51,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [159.0, 208.0, 350.0, 155.0, 54.0, 318.0, 120.0, 141.0, 176.0, 119.0]
2025-09-16 13:41:51,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 48 minutes, 10 seconds)
2025-09-16 13:43:49,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:43:52,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1236.09424 ± 833.943
2025-09-16 13:43:52,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [777.20056, 658.02234, 1003.3853, 1226.3468, 919.3645, 1527.9197, 709.13654, 3630.58, 966.34735, 942.6387]
2025-09-16 13:43:52,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [143.0, 121.0, 200.0, 237.0, 170.0, 296.0, 146.0, 711.0, 180.0, 178.0]
2025-09-16 13:43:52,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1236.09) for latency 12
2025-09-16 13:43:52,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 46 minutes, 46 seconds)
2025-09-16 13:45:46,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:45:49,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 963.97229 ± 250.951
2025-09-16 13:45:49,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1197.6571, 787.5642, 830.2557, 846.38855, 1563.2823, 685.6718, 1103.1781, 729.3936, 968.177, 928.15436]
2025-09-16 13:45:49,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [223.0, 151.0, 160.0, 157.0, 313.0, 126.0, 210.0, 139.0, 191.0, 169.0]
2025-09-16 13:45:49,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 44 minutes, 45 seconds)
2025-09-16 13:47:49,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:47:52,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1178.90491 ± 674.675
2025-09-16 13:47:52,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [326.0551, 1132.1904, 1320.8713, 2762.7659, 510.17163, 1035.2998, 1034.3909, 523.5909, 1389.3809, 1754.3325]
2025-09-16 13:47:52,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [61.0, 219.0, 238.0, 538.0, 89.0, 211.0, 191.0, 92.0, 251.0, 322.0]
2025-09-16 13:47:52,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 43 minutes, 37 seconds)
2025-09-16 13:49:48,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:49:52,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1392.51135 ± 800.153
2025-09-16 13:49:52,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1032.3903, 3692.5715, 1548.2393, 703.5646, 1330.6542, 1017.2944, 1102.3268, 1167.2906, 1389.9738, 940.8071]
2025-09-16 13:49:52,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [192.0, 715.0, 289.0, 128.0, 261.0, 196.0, 206.0, 211.0, 259.0, 176.0]
2025-09-16 13:49:52,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1392.51) for latency 12
2025-09-16 13:49:52,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 41 minutes, 53 seconds)
2025-09-16 13:51:45,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:51:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1376.09912 ± 791.090
2025-09-16 13:51:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1360.0573, 1520.2609, 1357.5681, 770.8448, 919.47644, 1909.8851, 1590.7131, 623.55835, 386.77768, 3321.8506]
2025-09-16 13:51:49,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [254.0, 286.0, 241.0, 153.0, 169.0, 346.0, 277.0, 109.0, 67.0, 631.0]
2025-09-16 13:51:49,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 39 minutes, 39 seconds)
2025-09-16 13:53:49,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:53:54,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1778.82947 ± 1303.655
2025-09-16 13:53:54,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1487.4312, 1030.7518, 1857.176, 746.53687, 2640.6382, 5256.244, 1681.5968, 429.74713, 1652.9181, 1005.254]
2025-09-16 13:53:54,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [292.0, 186.0, 342.0, 131.0, 501.0, 984.0, 309.0, 76.0, 296.0, 189.0]
2025-09-16 13:53:54,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (1778.83) for latency 12
2025-09-16 13:53:54,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 16 seconds)
2025-09-16 13:55:48,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:55:51,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1301.55115 ± 523.242
2025-09-16 13:55:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [762.77466, 1762.3851, 1213.1301, 608.6379, 1427.9952, 2312.8115, 586.6339, 1741.2247, 1345.6122, 1254.3063]
2025-09-16 13:55:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [140.0, 314.0, 242.0, 114.0, 271.0, 429.0, 105.0, 325.0, 245.0, 231.0]
2025-09-16 13:55:51,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 24 seconds)
2025-09-16 13:57:50,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:57:55,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 1719.33594 ± 738.342
2025-09-16 13:57:55,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2205.1619, 2518.0203, 1057.9952, 2596.0515, 536.2029, 616.867, 1466.433, 1596.005, 2267.9727, 2332.6504]
2025-09-16 13:57:55,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [431.0, 464.0, 215.0, 501.0, 102.0, 112.0, 272.0, 298.0, 442.0, 477.0]
2025-09-16 13:57:55,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 29 seconds)
2025-09-16 13:59:50,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 13:59:56,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2435.65088 ± 944.790
2025-09-16 13:59:56,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2560.228, 1465.5801, 2661.379, 969.37305, 2521.6826, 2550.9934, 1696.3948, 4489.857, 3360.6443, 2080.3745]
2025-09-16 13:59:56,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [468.0, 289.0, 487.0, 192.0, 449.0, 459.0, 328.0, 833.0, 608.0, 370.0]
2025-09-16 13:59:56,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (2435.65) for latency 12
2025-09-16 13:59:56,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 41 seconds)
2025-09-16 14:01:51,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:01:58,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2490.10547 ± 1868.022
2025-09-16 14:01:58,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1908.6356, 1869.4385, 5555.5454, 385.3253, 708.9716, 4356.3774, 2231.7424, 694.345, 5589.668, 1601.0034]
2025-09-16 14:01:58,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [393.0, 353.0, 1000.0, 71.0, 126.0, 793.0, 398.0, 152.0, 1000.0, 287.0]
2025-09-16 14:01:58,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (2490.11) for latency 12
2025-09-16 14:01:58,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 31 minutes, 23 seconds)
2025-09-16 14:03:57,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:04:05,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2771.89136 ± 1656.997
2025-09-16 14:04:05,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [4334.6626, 4734.6704, 352.90952, 4321.561, 2588.8735, 1614.1675, 929.5266, 2018.2617, 5249.6343, 1574.6469]
2025-09-16 14:04:05,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [824.0, 929.0, 68.0, 834.0, 494.0, 300.0, 168.0, 393.0, 1000.0, 324.0]
2025-09-16 14:04:05,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (2771.89) for latency 12
2025-09-16 14:04:05,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 29 minutes, 35 seconds)
2025-09-16 14:06:02,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:06:15,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4433.60400 ± 1603.934
2025-09-16 14:06:15,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5376.1343, 5413.0337, 955.74054, 1659.7411, 5367.124, 5368.9766, 5256.8047, 5222.31, 4274.1016, 5442.0723]
2025-09-16 14:06:15,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 197.0, 319.0, 1000.0, 1000.0, 1000.0, 1000.0, 796.0, 1000.0]
2025-09-16 14:06:15,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (4433.60) for latency 12
2025-09-16 14:06:15,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 29 minutes, 19 seconds)
2025-09-16 14:08:10,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:08:17,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2810.71802 ± 1720.953
2025-09-16 14:08:17,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5264.0317, 1010.4258, 1030.0901, 2285.8586, 1204.2241, 3340.2505, 1499.8954, 5309.737, 2016.3069, 5146.3594]
2025-09-16 14:08:17,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 214.0, 201.0, 419.0, 216.0, 629.0, 291.0, 998.0, 386.0, 1000.0]
2025-09-16 14:08:17,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 27 minutes, 6 seconds)
2025-09-16 14:10:18,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:10:27,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3216.26050 ± 1847.880
2025-09-16 14:10:27,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5388.97, 534.2711, 5361.333, 5161.6704, 2327.0369, 1052.6769, 2525.9612, 1445.3517, 2979.6326, 5385.7]
2025-09-16 14:10:27,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 107.0, 1000.0, 956.0, 433.0, 184.0, 458.0, 269.0, 542.0, 1000.0]
2025-09-16 14:10:27,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 26 minutes, 10 seconds)
2025-09-16 14:12:31,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:12:36,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2070.41895 ± 927.046
2025-09-16 14:12:36,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1388.3099, 2499.9268, 2547.428, 1192.7579, 1895.1879, 3620.7832, 1958.4775, 1563.2333, 3474.5813, 563.5025]
2025-09-16 14:12:36,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [263.0, 470.0, 469.0, 215.0, 340.0, 685.0, 358.0, 286.0, 632.0, 108.0]
2025-09-16 14:12:36,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 25 minutes, 8 seconds)
2025-09-16 14:14:26,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:14:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3301.18701 ± 1713.548
2025-09-16 14:14:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1850.2957, 1874.7609, 5339.898, 5236.0093, 2842.2334, 5270.2505, 5234.2705, 2157.435, 502.27765, 2704.4429]
2025-09-16 14:14:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [344.0, 365.0, 1000.0, 1000.0, 547.0, 1000.0, 1000.0, 413.0, 88.0, 510.0]
2025-09-16 14:14:35,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 21 minutes, 54 seconds)
2025-09-16 14:16:37,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:16:46,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3225.22607 ± 1685.106
2025-09-16 14:16:46,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [997.0717, 748.35767, 2096.089, 4405.8105, 5238.7275, 5272.6626, 2124.3845, 5270.846, 2369.8325, 3728.478]
2025-09-16 14:16:46,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [193.0, 159.0, 381.0, 797.0, 1000.0, 1000.0, 402.0, 1000.0, 441.0, 700.0]
2025-09-16 14:16:46,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 19 minutes, 55 seconds)
2025-09-16 14:18:41,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:18:46,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 2032.66431 ± 1472.349
2025-09-16 14:18:46,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [478.50153, 946.89325, 5304.625, 3060.6655, 1993.2527, 909.41077, 651.8886, 3233.0964, 2785.2512, 963.0565]
2025-09-16 14:18:46,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [97.0, 178.0, 1000.0, 594.0, 369.0, 187.0, 139.0, 613.0, 533.0, 194.0]
2025-09-16 14:18:46,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 17 minutes, 34 seconds)
2025-09-16 14:20:48,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:20:58,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3696.39893 ± 1639.281
2025-09-16 14:20:58,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5354.4395, 2077.3477, 5286.739, 3004.5886, 2618.9858, 3951.75, 5308.5957, 3937.653, 194.75125, 5229.1377]
2025-09-16 14:20:58,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 397.0, 1000.0, 546.0, 499.0, 721.0, 1000.0, 726.0, 37.0, 1000.0]
2025-09-16 14:20:58,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 15 minutes, 47 seconds)
2025-09-16 14:22:54,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:23:06,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4069.36719 ± 1680.150
2025-09-16 14:23:06,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5233.7583, 5229.886, 1066.9198, 5280.83, 3217.6316, 5279.431, 5235.046, 4065.9734, 875.87744, 5208.315]
2025-09-16 14:23:06,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 191.0, 1000.0, 614.0, 1000.0, 1000.0, 792.0, 174.0, 1000.0]
2025-09-16 14:23:06,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 13 minutes, 24 seconds)
2025-09-16 14:25:07,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:25:18,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3972.54175 ± 1618.134
2025-09-16 14:25:18,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1954.1744, 5342.642, 5380.9536, 1597.4366, 5368.31, 5330.57, 2823.5623, 4788.0073, 5373.491, 1766.2688]
2025-09-16 14:25:18,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [347.0, 1000.0, 1000.0, 317.0, 1000.0, 1000.0, 510.0, 885.0, 1000.0, 345.0]
2025-09-16 14:25:18,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 12 minutes, 49 seconds)
2025-09-16 14:27:12,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:27:24,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3940.68799 ± 1748.152
2025-09-16 14:27:24,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1968.0997, 5423.398, 5331.185, 2095.6548, 5403.405, 5316.3413, 1454.6276, 5401.2705, 5296.7417, 1716.1558]
2025-09-16 14:27:24,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [361.0, 1000.0, 1000.0, 391.0, 1000.0, 1000.0, 260.0, 1000.0, 1000.0, 312.0]
2025-09-16 14:27:24,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 10 minutes, 10 seconds)
2025-09-16 14:29:22,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:29:35,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4796.32129 ± 1078.258
2025-09-16 14:29:35,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5317.123, 4036.2942, 3916.532, 5400.7544, 5507.651, 5493.3467, 5402.5547, 5396.082, 2056.0874, 5436.7935]
2025-09-16 14:29:35,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 743.0, 706.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 381.0, 1000.0]
2025-09-16 14:29:35,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (4796.32) for latency 12
2025-09-16 14:29:35,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 9 minutes, 11 seconds)
2025-09-16 14:31:37,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:31:49,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4211.56152 ± 1453.088
2025-09-16 14:31:49,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1407.6395, 1762.8855, 5261.896, 3865.807, 5220.3013, 5269.0083, 5317.68, 5277.253, 3503.1826, 5229.965]
2025-09-16 14:31:49,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [287.0, 337.0, 1000.0, 737.0, 1000.0, 1000.0, 1000.0, 1000.0, 669.0, 1000.0]
2025-09-16 14:31:49,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 7 minutes, 16 seconds)
2025-09-16 14:33:42,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:33:56,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4940.53174 ± 797.511
2025-09-16 14:33:56,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2756.7083, 5340.2764, 5312.044, 5264.1523, 4217.3545, 5319.075, 5313.602, 5322.8843, 5220.472, 5338.75]
2025-09-16 14:33:56,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [524.0, 1000.0, 1000.0, 1000.0, 783.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:33:56,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (4940.53) for latency 12
2025-09-16 14:33:56,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 5 minutes, 2 seconds)
2025-09-16 14:35:52,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:36:03,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3977.16553 ± 1724.340
2025-09-16 14:36:03,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5367.849, 1634.3523, 5412.239, 5473.598, 4174.1797, 2460.4521, 3843.8743, 5476.4556, 581.5341, 5347.1226]
2025-09-16 14:36:03,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [979.0, 303.0, 1000.0, 1000.0, 775.0, 491.0, 718.0, 1000.0, 99.0, 1000.0]
2025-09-16 14:36:03,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 2 minutes, 22 seconds)
2025-09-16 14:38:05,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:38:19,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4793.69238 ± 1479.699
2025-09-16 14:38:19,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5315.057, 5327.2305, 5246.5605, 5278.3013, 5305.002, 5207.573, 5338.822, 5304.8057, 356.0773, 5257.497]
2025-09-16 14:38:19,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 65.0, 1000.0]
2025-09-16 14:38:19,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 1 minute, 8 seconds)
2025-09-16 14:40:13,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:40:26,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4932.27490 ± 1448.242
2025-09-16 14:40:26,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5381.3096, 5447.315, 5432.1465, 5365.2305, 5407.9194, 5416.7456, 5457.4927, 5363.1914, 588.7684, 5462.6284]
2025-09-16 14:40:26,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 109.0, 1000.0]
2025-09-16 14:40:26,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 74/100 (estimated time remaining: 58 minutes, 36 seconds)
2025-09-16 14:42:29,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:42:43,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5032.39795 ± 1153.400
2025-09-16 14:42:43,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [1575.1086, 5432.2573, 5395.0415, 5391.4517, 5503.4316, 5361.695, 5431.226, 5467.9424, 5331.1274, 5434.696]
2025-09-16 14:42:43,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [289.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:42:43,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5032.40) for latency 12
2025-09-16 14:42:43,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 75/100 (estimated time remaining: 56 minutes, 39 seconds)
2025-09-16 14:44:43,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:44:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4941.93262 ± 1579.354
2025-09-16 14:44:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5505.781, 5435.884, 5473.3774, 5529.8403, 5476.3394, 5446.602, 5470.4937, 5405.447, 204.89415, 5470.665]
2025-09-16 14:44:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 41.0, 1000.0]
2025-09-16 14:44:57,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 76/100 (estimated time remaining: 55 minutes, 4 seconds)
2025-09-16 14:46:52,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:47:03,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3944.26172 ± 1698.916
2025-09-16 14:47:03,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [3538.4023, 5348.3145, 5285.09, 4525.789, 5365.9883, 2570.9866, 5369.8706, 1669.5089, 514.7205, 5253.9473]
2025-09-16 14:47:03,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [689.0, 1000.0, 1000.0, 864.0, 1000.0, 510.0, 1000.0, 316.0, 93.0, 1000.0]
2025-09-16 14:47:03,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 77/100 (estimated time remaining: 52 minutes, 50 seconds)
2025-09-16 14:48:59,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:49:13,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4959.27441 ± 955.595
2025-09-16 14:49:13,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [2236.8616, 5396.6143, 5358.127, 5372.4883, 5399.1943, 5402.2886, 5334.571, 4371.196, 5349.8735, 5371.5317]
2025-09-16 14:49:13,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [413.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 811.0, 1000.0, 1000.0]
2025-09-16 14:49:13,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 78/100 (estimated time remaining: 50 minutes, 9 seconds)
2025-09-16 14:51:20,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:51:35,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5344.69824 ± 29.196
2025-09-16 14:51:35,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5378.014, 5336.4634, 5374.6465, 5281.383, 5332.7256, 5339.629, 5345.7314, 5372.241, 5315.6245, 5370.5195]
2025-09-16 14:51:35,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:51:35,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5344.70) for latency 12
2025-09-16 14:51:35,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 79/100 (estimated time remaining: 49 minutes, 3 seconds)
2025-09-16 14:53:29,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:53:43,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5055.26855 ± 1169.681
2025-09-16 14:53:43,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5438.858, 5498.6143, 5440.2427, 5429.2466, 5452.5073, 5486.846, 5450.836, 5403.2583, 1547.3005, 5404.982]
2025-09-16 14:53:43,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 303.0, 1000.0]
2025-09-16 14:53:43,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 80/100 (estimated time remaining: 46 minutes, 11 seconds)
2025-09-16 14:55:36,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:55:50,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4925.48730 ± 760.762
2025-09-16 14:55:50,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5318.154, 5364.8936, 5164.3223, 5277.8247, 4232.728, 5260.2144, 2846.4775, 5233.334, 5300.2153, 5256.7124]
2025-09-16 14:55:50,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 820.0, 1000.0, 551.0, 1000.0, 1000.0, 1000.0]
2025-09-16 14:55:50,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 81/100 (estimated time remaining: 43 minutes, 33 seconds)
2025-09-16 14:57:56,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 14:58:07,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3993.79419 ± 2016.127
2025-09-16 14:58:07,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5539.2793, 369.9403, 987.90546, 5477.267, 5481.444, 5547.4224, 5500.604, 3739.2424, 1818.66, 5476.1763]
2025-09-16 14:58:07,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 68.0, 191.0, 1000.0, 1000.0, 1000.0, 1000.0, 691.0, 369.0, 1000.0]
2025-09-16 14:58:07,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 82/100 (estimated time remaining: 42 minutes)
2025-09-16 14:59:59,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:00:13,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4865.23535 ± 1466.689
2025-09-16 15:00:13,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5369.291, 5183.0522, 5381.5264, 5386.169, 471.05313, 5244.162, 5437.6353, 5369.0054, 5375.7954, 5434.663]
2025-09-16 15:00:13,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 81.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:00:13,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 83/100 (estimated time remaining: 39 minutes, 36 seconds)
2025-09-16 15:02:12,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:02:26,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4873.79199 ± 1407.982
2025-09-16 15:02:26,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5322.4375, 5393.078, 5350.8613, 5322.0625, 5336.2163, 650.2349, 5331.1973, 5340.3584, 5344.6997, 5346.778]
2025-09-16 15:02:26,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 118.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:02:26,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 84/100 (estimated time remaining: 36 minutes, 50 seconds)
2025-09-16 15:04:33,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:04:46,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4835.30713 ± 1330.396
2025-09-16 15:04:46,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5491.308, 5475.808, 5518.905, 5476.48, 4756.1577, 5416.5977, 1010.6646, 5474.2295, 4319.495, 5413.426]
2025-09-16 15:04:46,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 867.0, 1000.0, 200.0, 1000.0, 805.0, 1000.0]
2025-09-16 15:04:46,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 21 seconds)
2025-09-16 15:06:45,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:07:01,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5362.81445 ± 45.138
2025-09-16 15:07:01,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5401.677, 5362.7334, 5372.8203, 5377.6777, 5308.541, 5397.3345, 5413.5635, 5258.258, 5347.166, 5388.3794]
2025-09-16 15:07:01,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:07:01,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5362.81) for latency 12
2025-09-16 15:07:01,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 30 seconds)
2025-09-16 15:08:48,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:08:57,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 3468.71558 ± 2248.254
2025-09-16 15:08:57,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5609.5425, 1209.1017, 5581.372, 397.10056, 1360.5181, 3226.7983, 5618.8584, 5602.5273, 499.6457, 5581.689]
2025-09-16 15:08:57,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 220.0, 1000.0, 70.0, 271.0, 581.0, 1000.0, 1000.0, 102.0, 1000.0]
2025-09-16 15:08:57,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 87/100 (estimated time remaining: 30 minutes, 20 seconds)
2025-09-16 15:10:59,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:11:12,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4958.94238 ± 1527.024
2025-09-16 15:11:12,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5442.543, 5469.9673, 5429.661, 5467.0366, 5477.3003, 379.54962, 5503.4224, 5398.323, 5561.357, 5460.26]
2025-09-16 15:11:12,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 72.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:11:12,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 88/100 (estimated time remaining: 28 minutes, 34 seconds)
2025-09-16 15:13:13,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:13:27,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4912.69287 ± 1482.227
2025-09-16 15:13:27,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5373.875, 467.324, 5410.5513, 5443.004, 5439.493, 5439.8794, 5350.201, 5452.284, 5351.524, 5398.795]
2025-09-16 15:13:27,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 83.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:13:27,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 27 seconds)
2025-09-16 15:15:26,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:15:39,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5064.63672 ± 1054.776
2025-09-16 15:15:39,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5422.974, 5446.1, 5399.0356, 5465.763, 5374.4883, 5536.6367, 1906.7788, 5265.9297, 5376.0957, 5452.562]
2025-09-16 15:15:39,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 364.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:15:39,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 90/100 (estimated time remaining: 23 minutes, 57 seconds)
2025-09-16 15:17:35,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:17:47,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4368.92529 ± 1801.399
2025-09-16 15:17:47,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5522.0264, 5431.633, 5431.8, 5448.1064, 738.25934, 5428.489, 5483.045, 1156.0475, 3581.4932, 5468.357]
2025-09-16 15:17:47,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 137.0, 1000.0, 1000.0, 221.0, 658.0, 1000.0]
2025-09-16 15:17:47,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 91/100 (estimated time remaining: 21 minutes, 32 seconds)
2025-09-16 15:19:44,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:19:57,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4827.75098 ± 1431.222
2025-09-16 15:19:57,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5193.227, 539.0533, 5314.6675, 5391.6157, 5192.886, 5371.1387, 5376.2124, 5325.21, 5332.4956, 5241.007]
2025-09-16 15:19:57,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 95.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:19:57,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 92/100 (estimated time remaining: 19 minutes, 48 seconds)
2025-09-16 15:21:53,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:22:08,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5280.78662 ± 48.817
2025-09-16 15:22:08,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5292.658, 5313.2646, 5262.1016, 5371.113, 5318.751, 5268.0195, 5270.296, 5198.2266, 5209.612, 5303.8257]
2025-09-16 15:22:08,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:22:08,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 93/100 (estimated time remaining: 17 minutes, 29 seconds)
2025-09-16 15:24:06,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:24:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4492.06738 ± 1857.274
2025-09-16 15:24:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5452.942, 5416.871, 5403.1455, 570.6325, 5341.945, 995.636, 5437.973, 5473.7095, 5453.5547, 5374.269]
2025-09-16 15:24:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 104.0, 1000.0, 174.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:24:19,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 94/100 (estimated time remaining: 15 minutes, 12 seconds)
2025-09-16 15:26:23,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:26:38,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5430.81348 ± 33.138
2025-09-16 15:26:38,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5444.0244, 5345.1357, 5450.962, 5411.2866, 5449.9834, 5420.1445, 5462.2534, 5452.0503, 5454.868, 5417.4297]
2025-09-16 15:26:38,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:26:38,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1226 [INFO]: New best (5430.81) for latency 12
2025-09-16 15:26:38,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 95/100 (estimated time remaining: 13 minutes, 10 seconds)
2025-09-16 15:28:30,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:28:43,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4953.38525 ± 1526.027
2025-09-16 15:28:43,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5402.74, 5520.56, 5381.3843, 5498.466, 377.0002, 5480.894, 5436.2554, 5495.8584, 5460.9062, 5479.7896]
2025-09-16 15:28:43,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 69.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:28:43,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 56 seconds)
2025-09-16 15:30:37,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:30:51,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4899.46533 ± 1590.892
2025-09-16 15:30:51,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5390.024, 5421.605, 5458.0874, 5451.9004, 5340.7456, 5523.07, 5430.682, 5422.64, 128.63942, 5427.259]
2025-09-16 15:30:51,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 25.0, 1000.0]
2025-09-16 15:30:51,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 42 seconds)
2025-09-16 15:32:53,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:33:06,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4416.85498 ± 1742.935
2025-09-16 15:33:06,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5219.9346, 5288.1606, 5247.049, 5305.1274, 5339.5825, 5354.8887, 481.54758, 5164.5493, 1436.4629, 5331.247]
2025-09-16 15:33:06,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 89.0, 1000.0, 287.0, 1000.0]
2025-09-16 15:33:06,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 34 seconds)
2025-09-16 15:34:58,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:35:12,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4814.44043 ± 1461.424
2025-09-16 15:35:12,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5207.692, 5335.519, 5303.947, 5306.1143, 5332.1206, 5333.354, 5269.3184, 5334.916, 431.62726, 5289.794]
2025-09-16 15:35:12,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 77.0, 1000.0]
2025-09-16 15:35:12,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 21 seconds)
2025-09-16 15:37:17,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:37:30,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 4623.80908 ± 1731.597
2025-09-16 15:37:30,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5440.902, 155.04344, 5441.2783, 5488.5723, 5424.4517, 5410.637, 5511.9634, 5452.278, 2489.386, 5423.5767]
2025-09-16 15:37:30,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 30.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 453.0, 1000.0]
2025-09-16 15:37:30,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 10 seconds)
2025-09-16 15:39:22,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1214 [DEBUG]: Evaluating for latency 12...
2025-09-16 15:39:37,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1221 [DEBUG]: Total Reward: 5259.52979 ± 58.460
2025-09-16 15:39:37,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1222 [DEBUG]: All rewards: [5325.202, 5262.2603, 5237.772, 5312.6343, 5268.787, 5245.473, 5301.1323, 5282.923, 5104.691, 5254.4233]
2025-09-16 15:39:37,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 15:39:37,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-humanoid):1251 [DEBUG]: Training session finished
