2025-09-14 08:43:01,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_3
2025-09-14 08:43:01,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.025-delay_3
2025-09-14 08:43:01,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'3': <latency_env.delayed_mdp.ConstantDelay object at 0x7fc8b8217b60>}
2025-09-14 08:43:01,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,654 baseline-bpql-noisepromille25-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=35, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,655 baseline-bpql-noisepromille25-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:45:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 08:45:59,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -507.10385 ± 1.783
2025-09-14 08:45:59,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-507.34265), np.float32(-510.18433), np.float32(-503.81653), np.float32(-507.11606), np.float32(-505.63818), np.float32(-505.00098), np.float32(-507.45374), np.float32(-508.9255), np.float32(-507.26996), np.float32(-508.29034)]
2025-09-14 08:45:59,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:45:59,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-507.10) for latency 3
2025-09-14 08:45:59,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 51 minutes, 12 seconds)
2025-09-14 08:48:52,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 08:48:57,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: -84.68288 ± 84.971
2025-09-14 08:48:57,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(72.69105), np.float32(-152.64279), np.float32(-142.0819), np.float32(-168.81003), np.float32(63.432926), np.float32(-46.82452), np.float32(-137.49774), np.float32(-101.29971), np.float32(-72.5604), np.float32(-161.23567)]
2025-09-14 08:48:57,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:48:57,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (-84.68) for latency 3
2025-09-14 08:48:57,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 49 minutes, 24 seconds)
2025-09-14 08:51:41,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 08:51:46,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 656.76691 ± 37.754
2025-09-14 08:51:46,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(689.4281), np.float32(567.7436), np.float32(657.92096), np.float32(634.96625), np.float32(682.68994), np.float32(715.36536), np.float32(663.1595), np.float32(667.2573), np.float32(633.8281), np.float32(655.30963)]
2025-09-14 08:51:46,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:51:46,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (656.77) for latency 3
2025-09-14 08:51:46,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 41 minutes, 55 seconds)
2025-09-14 08:54:26,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 08:54:32,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 1832.79663 ± 46.529
2025-09-14 08:54:32,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1856.467), np.float32(1790.1696), np.float32(1797.3943), np.float32(1955.0236), np.float32(1828.6323), np.float32(1801.2643), np.float32(1825.3671), np.float32(1789.936), np.float32(1839.4789), np.float32(1844.2341)]
2025-09-14 08:54:32,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:54:32,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (1832.80) for latency 3
2025-09-14 08:54:32,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 35 minutes, 29 seconds)
2025-09-14 08:57:26,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 08:57:32,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2080.86108 ± 158.727
2025-09-14 08:57:32,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1993.7269), np.float32(2070.3157), np.float32(2248.0906), np.float32(1782.3278), np.float32(1923.78), np.float32(1998.5708), np.float32(2348.052), np.float32(2210.1118), np.float32(2166.925), np.float32(2066.709)]
2025-09-14 08:57:32,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:57:32,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2080.86) for latency 3
2025-09-14 08:57:32,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 35 minutes, 7 seconds)
2025-09-14 09:00:33,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:00:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2561.46631 ± 113.879
2025-09-14 09:00:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2546.383), np.float32(2605.3657), np.float32(2383.7322), np.float32(2579.9194), np.float32(2701.5085), np.float32(2764.9807), np.float32(2479.9866), np.float32(2402.8708), np.float32(2540.2065), np.float32(2609.7085)]
2025-09-14 09:00:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:00:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2561.47) for latency 3
2025-09-14 09:00:38,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 35 minutes, 27 seconds)
2025-09-14 09:03:24,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:03:29,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 2925.84521 ± 197.643
2025-09-14 09:03:29,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3138.3416), np.float32(2952.5244), np.float32(2539.4126), np.float32(3033.3447), np.float32(2800.4285), np.float32(2846.7644), np.float32(3223.4336), np.float32(2716.1155), np.float32(2911.5308), np.float32(3096.5554)]
2025-09-14 09:03:29,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:03:29,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (2925.85) for latency 3
2025-09-14 09:03:29,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 30 minutes, 20 seconds)
2025-09-14 09:06:19,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:06:25,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3157.41602 ± 79.443
2025-09-14 09:06:25,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3136.2012), np.float32(3085.8413), np.float32(3112.6082), np.float32(3222.1768), np.float32(3245.4387), np.float32(3218.365), np.float32(3080.568), np.float32(3124.6553), np.float32(3301.3052), np.float32(3047.0)]
2025-09-14 09:06:25,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:06:25,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3157.42) for latency 3
2025-09-14 09:06:25,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 29 minutes, 26 seconds)
2025-09-14 09:09:01,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:09:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3617.44287 ± 125.795
2025-09-14 09:09:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3399.3992), np.float32(3512.115), np.float32(3574.3127), np.float32(3603.6057), np.float32(3694.156), np.float32(3868.762), np.float32(3654.0195), np.float32(3733.5237), np.float32(3501.457), np.float32(3633.08)]
2025-09-14 09:09:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:09:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3617.44) for latency 3
2025-09-14 09:09:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 25 minutes, 24 seconds)
2025-09-14 09:11:37,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:11:42,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 3842.84375 ± 168.411
2025-09-14 09:11:42,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3735.2822), np.float32(4032.6418), np.float32(3932.4707), np.float32(3943.177), np.float32(3576.2305), np.float32(3764.8892), np.float32(4041.2122), np.float32(3887.4019), np.float32(3963.5898), np.float32(3551.5422)]
2025-09-14 09:11:42,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:11:42,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (3842.84) for latency 3
2025-09-14 09:11:42,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 15 minutes, 9 seconds)
2025-09-14 09:14:13,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:14:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4096.82227 ± 123.715
2025-09-14 09:14:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4349.853), np.float32(4101.213), np.float32(4043.5923), np.float32(4154.838), np.float32(3931.2368), np.float32(4109.113), np.float32(4209.479), np.float32(3925.7039), np.float32(4149.045), np.float32(3994.1523)]
2025-09-14 09:14:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:14:18,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4096.82) for latency 3
2025-09-14 09:14:18,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 3 minutes, 15 seconds)
2025-09-14 09:16:54,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:17:00,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4171.13818 ± 140.624
2025-09-14 09:17:00,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4067.6697), np.float32(4221.4214), np.float32(4097.8237), np.float32(4057.7017), np.float32(3891.7256), np.float32(4209.908), np.float32(4410.761), np.float32(4182.3867), np.float32(4248.7656), np.float32(4323.217)]
2025-09-14 09:17:00,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:17:00,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4171.14) for latency 3
2025-09-14 09:17:00,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 57 minutes, 41 seconds)
2025-09-14 09:19:31,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:19:36,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4469.94629 ± 147.773
2025-09-14 09:19:36,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4458.3486), np.float32(4540.205), np.float32(4583.893), np.float32(4464.797), np.float32(4279.132), np.float32(4800.365), np.float32(4436.814), np.float32(4288.0356), np.float32(4514.1514), np.float32(4333.722)]
2025-09-14 09:19:36,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:19:36,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4469.95) for latency 3
2025-09-14 09:19:36,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 49 minutes, 33 seconds)
2025-09-14 09:22:36,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:22:42,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4705.87988 ± 135.591
2025-09-14 09:22:42,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4484.7817), np.float32(4600.52), np.float32(4645.8066), np.float32(4943.657), np.float32(4939.503), np.float32(4664.8564), np.float32(4757.2817), np.float32(4683.8027), np.float32(4640.5996), np.float32(4697.985)]
2025-09-14 09:22:42,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:22:42,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4705.88) for latency 3
2025-09-14 09:22:42,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 53 minutes, 42 seconds)
2025-09-14 09:25:42,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:25:47,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4581.61572 ± 86.428
2025-09-14 09:25:47,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4475.762), np.float32(4753.3096), np.float32(4596.501), np.float32(4684.3496), np.float32(4553.5527), np.float32(4570.884), np.float32(4633.225), np.float32(4451.287), np.float32(4532.6294), np.float32(4564.656)]
2025-09-14 09:25:47,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:25:47,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 59 minutes, 22 seconds)
2025-09-14 09:28:30,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:28:35,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4867.75293 ± 94.502
2025-09-14 09:28:35,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4871.7075), np.float32(4776.594), np.float32(4865.404), np.float32(4825.024), np.float32(5072.4346), np.float32(4814.7417), np.float32(4785.1772), np.float32(4957.305), np.float32(4756.443), np.float32(4952.6978)]
2025-09-14 09:28:35,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:28:35,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4867.75) for latency 3
2025-09-14 09:28:35,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 59 minutes, 52 seconds)
2025-09-14 09:30:52,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:30:57,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4885.17383 ± 86.113
2025-09-14 09:30:57,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4966.63), np.float32(4835.2617), np.float32(4986.6426), np.float32(4767.433), np.float32(4882.995), np.float32(4883.6), np.float32(4971.4263), np.float32(4720.57), np.float32(4963.834), np.float32(4873.34)]
2025-09-14 09:30:57,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:30:57,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4885.17) for latency 3
2025-09-14 09:30:57,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 51 minutes, 35 seconds)
2025-09-14 09:33:20,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:33:26,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4942.38428 ± 108.675
2025-09-14 09:33:26,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4881.9663), np.float32(4882.7925), np.float32(5040.359), np.float32(4949.244), np.float32(4958.5864), np.float32(5122.3867), np.float32(4701.389), np.float32(4917.458), np.float32(4928.7603), np.float32(5040.8994)]
2025-09-14 09:33:26,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:33:26,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (4942.38) for latency 3
2025-09-14 09:33:26,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 46 minutes, 41 seconds)
2025-09-14 09:35:43,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:35:48,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5117.73730 ± 77.003
2025-09-14 09:35:48,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5013.4795), np.float32(5170.9546), np.float32(5070.7607), np.float32(5162.1963), np.float32(5249.8325), np.float32(4971.039), np.float32(5108.827), np.float32(5142.614), np.float32(5130.0537), np.float32(5157.6133)]
2025-09-14 09:35:48,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:35:48,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5117.74) for latency 3
2025-09-14 09:35:48,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 32 minutes, 20 seconds)
2025-09-14 09:38:12,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:38:17,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5137.11670 ± 184.290
2025-09-14 09:38:17,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5078.8716), np.float32(5421.8105), np.float32(4968.8936), np.float32(5300.0923), np.float32(5074.662), np.float32(5068.0054), np.float32(5068.763), np.float32(5268.252), np.float32(4779.694), np.float32(5342.12)]
2025-09-14 09:38:17,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:38:17,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5137.12) for latency 3
2025-09-14 09:38:17,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 19 minutes, 59 seconds)
2025-09-14 09:40:36,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:40:41,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4811.52197 ± 1299.280
2025-09-14 09:40:41,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5092.0713), np.float32(5348.3633), np.float32(5401.3296), np.float32(5256.263), np.float32(930.86945), np.float32(5426.1113), np.float32(5025.222), np.float32(5268.9224), np.float32(5200.2227), np.float32(5165.8423)]
2025-09-14 09:40:41,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:40:41,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 11 minutes, 7 seconds)
2025-09-14 09:43:02,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:43:07,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5245.16309 ± 112.667
2025-09-14 09:43:07,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5292.807), np.float32(5199.467), np.float32(5481.123), np.float32(5290.3013), np.float32(5207.9517), np.float32(5274.459), np.float32(5051.493), np.float32(5156.2515), np.float32(5156.203), np.float32(5341.576)]
2025-09-14 09:43:07,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:43:07,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5245.16) for latency 3
2025-09-14 09:43:07,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 9 minutes, 47 seconds)
2025-09-14 09:45:48,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:45:53,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 4937.50879 ± 712.465
2025-09-14 09:45:53,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5360.6777), np.float32(3597.192), np.float32(5039.2134), np.float32(5228.6553), np.float32(5404.453), np.float32(5497.603), np.float32(5476.964), np.float32(5691.691), np.float32(4053.3464), np.float32(4025.2896)]
2025-09-14 09:45:53,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:45:53,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 11 minutes, 52 seconds)
2025-09-14 09:48:32,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:48:37,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5460.86035 ± 162.387
2025-09-14 09:48:37,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5658.2876), np.float32(5266.3496), np.float32(5436.801), np.float32(5618.2715), np.float32(5466.1226), np.float32(5291.391), np.float32(5524.1724), np.float32(5609.572), np.float32(5580.608), np.float32(5157.0234)]
2025-09-14 09:48:37,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:48:37,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5460.86) for latency 3
2025-09-14 09:48:37,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 14 minutes, 49 seconds)
2025-09-14 09:51:21,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:51:26,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5518.25781 ± 164.407
2025-09-14 09:51:26,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5700.415), np.float32(5589.4575), np.float32(5566.79), np.float32(5327.5293), np.float32(5180.554), np.float32(5622.909), np.float32(5580.3774), np.float32(5398.2886), np.float32(5477.9136), np.float32(5738.343)]
2025-09-14 09:51:26,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:51:26,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5518.26) for latency 3
2025-09-14 09:51:26,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 17 minutes, 17 seconds)
2025-09-14 09:54:08,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:54:13,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5476.21484 ± 212.354
2025-09-14 09:54:13,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5682.2607), np.float32(5259.1655), np.float32(5483.948), np.float32(5484.728), np.float32(5173.5073), np.float32(5750.6987), np.float32(5702.868), np.float32(5237.78), np.float32(5286.9023), np.float32(5700.2896)]
2025-09-14 09:54:13,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:54:13,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 20 minutes, 22 seconds)
2025-09-14 09:56:51,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:56:57,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5572.18457 ± 102.203
2025-09-14 09:56:57,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5536.906), np.float32(5477.3696), np.float32(5564.3403), np.float32(5592.507), np.float32(5538.9336), np.float32(5586.454), np.float32(5464.353), np.float32(5505.766), np.float32(5609.105), np.float32(5846.1084)]
2025-09-14 09:56:57,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:56:57,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5572.18) for latency 3
2025-09-14 09:56:57,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 21 minutes, 56 seconds)
2025-09-14 09:59:39,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 09:59:44,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5734.84521 ± 115.650
2025-09-14 09:59:44,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5968.2456), np.float32(5694.6963), np.float32(5806.8022), np.float32(5712.2534), np.float32(5842.4355), np.float32(5637.0796), np.float32(5791.5957), np.float32(5657.4253), np.float32(5533.0317), np.float32(5704.89)]
2025-09-14 09:59:44,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:59:44,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5734.85) for latency 3
2025-09-14 09:59:44,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 19 minutes, 19 seconds)
2025-09-14 10:02:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:02:33,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5753.12793 ± 104.570
2025-09-14 10:02:33,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5683.9653), np.float32(5794.3047), np.float32(5684.1665), np.float32(5765.1245), np.float32(5653.6353), np.float32(5925.6597), np.float32(5589.1294), np.float32(5840.0083), np.float32(5897.3877), np.float32(5697.902)]
2025-09-14 10:02:33,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:02:33,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5753.13) for latency 3
2025-09-14 10:02:33,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 17 minutes, 43 seconds)
2025-09-14 10:05:12,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:05:17,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5899.93652 ± 103.890
2025-09-14 10:05:17,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5947.1953), np.float32(5716.297), np.float32(5932.1187), np.float32(5901.397), np.float32(5822.5396), np.float32(5820.637), np.float32(5889.368), np.float32(5847.775), np.float32(6105.347), np.float32(6016.691)]
2025-09-14 10:05:17,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:05:17,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5899.94) for latency 3
2025-09-14 10:05:17,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 13 minutes, 51 seconds)
2025-09-14 10:07:56,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:08:01,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5484.04785 ± 1570.021
2025-09-14 10:08:01,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5901.0625), np.float32(5982.0938), np.float32(790.69794), np.float32(6037.9404), np.float32(6139.95), np.float32(6166.623), np.float32(5826.834), np.float32(6081.4688), np.float32(6151.402), np.float32(5762.4077)]
2025-09-14 10:08:01,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:08:01,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 10 minutes, 22 seconds)
2025-09-14 10:10:44,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:10:49,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5967.37646 ± 105.421
2025-09-14 10:10:49,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6054.764), np.float32(5953.3486), np.float32(5994.046), np.float32(5893.7285), np.float32(5993.6196), np.float32(6134.9604), np.float32(5876.661), np.float32(5868.2705), np.float32(5794.2593), np.float32(6110.1104)]
2025-09-14 10:10:49,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:10:49,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (5967.38) for latency 3
2025-09-14 10:10:49,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 8 minutes, 41 seconds)
2025-09-14 10:13:27,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:13:32,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6106.80176 ± 80.132
2025-09-14 10:13:32,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6012.6055), np.float32(6075.1143), np.float32(6110.1973), np.float32(6069.081), np.float32(6084.1626), np.float32(6057.812), np.float32(6033.949), np.float32(6301.006), np.float32(6182.788), np.float32(6141.298)]
2025-09-14 10:13:32,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:13:32,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6106.80) for latency 3
2025-09-14 10:13:32,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 4 minutes, 58 seconds)
2025-09-14 10:16:14,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:16:19,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6151.88379 ± 109.334
2025-09-14 10:16:19,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6239.84), np.float32(5997.172), np.float32(5930.478), np.float32(6232.8784), np.float32(6151.987), np.float32(6267.576), np.float32(6185.4897), np.float32(6272.8447), np.float32(6096.3647), np.float32(6144.2075)]
2025-09-14 10:16:19,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:16:19,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6151.88) for latency 3
2025-09-14 10:16:19,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 1 minute, 41 seconds)
2025-09-14 10:19:15,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:19:21,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6165.62256 ± 140.708
2025-09-14 10:19:21,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6036.0513), np.float32(6348.6494), np.float32(6256.6885), np.float32(6133.8433), np.float32(5971.543), np.float32(6399.6797), np.float32(6260.777), np.float32(6095.8994), np.float32(5986.6533), np.float32(6166.442)]
2025-09-14 10:19:21,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:19:21,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6165.62) for latency 3
2025-09-14 10:19:21,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 2 minutes, 48 seconds)
2025-09-14 10:22:17,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:22:23,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6051.21191 ± 95.056
2025-09-14 10:22:23,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(5953.169), np.float32(6155.244), np.float32(5902.284), np.float32(6196.2476), np.float32(5996.924), np.float32(6174.29), np.float32(6090.947), np.float32(6057.277), np.float32(6002.5415), np.float32(5983.195)]
2025-09-14 10:22:23,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:22:23,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 3 minutes, 55 seconds)
2025-09-14 10:25:12,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:25:17,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6155.74902 ± 157.963
2025-09-14 10:25:17,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6169.294), np.float32(6198.137), np.float32(6322.2266), np.float32(6191.9927), np.float32(5950.5244), np.float32(6115.949), np.float32(6194.197), np.float32(6184.2993), np.float32(5824.5166), np.float32(6406.3584)]
2025-09-14 10:25:17,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:25:17,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 2 minutes, 15 seconds)
2025-09-14 10:27:57,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:28:02,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6298.42969 ± 96.936
2025-09-14 10:28:02,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6270.901), np.float32(6300.4585), np.float32(6330.6616), np.float32(6351.704), np.float32(6337.5107), np.float32(6477.57), np.float32(6335.8657), np.float32(6093.572), np.float32(6187.3433), np.float32(6298.7114)]
2025-09-14 10:28:02,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:28:02,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6298.43) for latency 3
2025-09-14 10:28:02,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 59 minutes, 48 seconds)
2025-09-14 10:30:56,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:31:02,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6187.89551 ± 170.367
2025-09-14 10:31:02,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6297.558), np.float32(6498.823), np.float32(6161.843), np.float32(6092.284), np.float32(5817.3286), np.float32(6305.809), np.float32(6205.727), np.float32(6119.054), np.float32(6278.1167), np.float32(6102.414)]
2025-09-14 10:31:02,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:31:02,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 59 minutes, 32 seconds)
2025-09-14 10:33:58,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:34:04,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6353.44434 ± 83.387
2025-09-14 10:34:04,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6309.765), np.float32(6308.718), np.float32(6257.517), np.float32(6466.244), np.float32(6276.2163), np.float32(6426.733), np.float32(6228.6216), np.float32(6467.8066), np.float32(6393.843), np.float32(6398.98)]
2025-09-14 10:34:04,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:34:04,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6353.44) for latency 3
2025-09-14 10:34:04,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 56 minutes, 39 seconds)
2025-09-14 10:37:03,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:37:08,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6279.67578 ± 181.068
2025-09-14 10:37:08,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6443.286), np.float32(6449.6953), np.float32(6534.1904), np.float32(6270.2734), np.float32(5982.8335), np.float32(6272.649), np.float32(5958.0244), np.float32(6392.1196), np.float32(6225.7437), np.float32(6267.9473)]
2025-09-14 10:37:08,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:37:08,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 54 minutes, 2 seconds)
2025-09-14 10:40:08,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:40:13,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6509.41455 ± 124.232
2025-09-14 10:40:13,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6645.8623), np.float32(6262.047), np.float32(6442.3955), np.float32(6398.653), np.float32(6550.079), np.float32(6484.793), np.float32(6552.3022), np.float32(6482.054), np.float32(6538.3237), np.float32(6737.6313)]
2025-09-14 10:40:13,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:40:13,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6509.41) for latency 3
2025-09-14 10:40:13,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 53 minutes, 15 seconds)
2025-09-14 10:43:12,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:43:17,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 5986.92822 ± 1542.369
2025-09-14 10:43:17,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6391.403), np.float32(1375.2809), np.float32(6695.121), np.float32(6392.3076), np.float32(6531.2207), np.float32(6256.905), np.float32(6691.255), np.float32(6502.335), np.float32(6513.7007), np.float32(6519.7534)]
2025-09-14 10:43:17,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:43:17,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 53 minutes, 52 seconds)
2025-09-14 10:46:16,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:46:22,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6483.89551 ± 151.424
2025-09-14 10:46:22,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6140.11), np.float32(6741.8516), np.float32(6524.4976), np.float32(6457.2725), np.float32(6515.626), np.float32(6419.574), np.float32(6426.007), np.float32(6559.9062), np.float32(6637.0684), np.float32(6417.039)]
2025-09-14 10:46:22,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:46:22,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 51 minutes, 47 seconds)
2025-09-14 10:49:19,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:49:25,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6581.50879 ± 103.060
2025-09-14 10:49:25,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6790.9253), np.float32(6479.6245), np.float32(6688.832), np.float32(6603.5923), np.float32(6545.611), np.float32(6500.1177), np.float32(6637.143), np.float32(6468.574), np.float32(6634.788), np.float32(6465.8774)]
2025-09-14 10:49:25,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:49:25,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6581.51) for latency 3
2025-09-14 10:49:25,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 48 minutes, 46 seconds)
2025-09-14 10:52:25,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:52:30,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6684.43262 ± 133.412
2025-09-14 10:52:30,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6389.5645), np.float32(6715.473), np.float32(6816.095), np.float32(6688.4897), np.float32(6846.068), np.float32(6639.933), np.float32(6610.1753), np.float32(6717.6084), np.float32(6575.094), np.float32(6845.828)]
2025-09-14 10:52:30,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:52:30,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6684.43) for latency 3
2025-09-14 10:52:30,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 46 minutes)
2025-09-14 10:55:31,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:55:36,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6751.23975 ± 104.654
2025-09-14 10:55:36,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6576.889), np.float32(6936.064), np.float32(6833.389), np.float32(6782.557), np.float32(6701.658), np.float32(6679.9624), np.float32(6742.5586), np.float32(6625.0347), np.float32(6867.1875), np.float32(6767.1)]
2025-09-14 10:55:36,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:55:36,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6751.24) for latency 3
2025-09-14 10:55:36,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 43 minutes, 6 seconds)
2025-09-14 10:58:34,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 10:58:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6701.39844 ± 211.101
2025-09-14 10:58:40,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6912.6323), np.float32(7151.0757), np.float32(6661.2256), np.float32(6667.9507), np.float32(6669.452), np.float32(6875.3174), np.float32(6498.554), np.float32(6412.892), np.float32(6504.583), np.float32(6660.3037)]
2025-09-14 10:58:40,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:58:40,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 39 minutes, 59 seconds)
2025-09-14 11:01:39,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:01:45,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6660.37109 ± 149.404
2025-09-14 11:01:45,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6450.3374), np.float32(6789.552), np.float32(6511.3965), np.float32(6512.077), np.float32(6812.8994), np.float32(6678.982), np.float32(6746.471), np.float32(6820.2515), np.float32(6466.399), np.float32(6815.345)]
2025-09-14 11:01:45,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:01:45,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 36 minutes, 51 seconds)
2025-09-14 11:04:42,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:04:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6780.55371 ± 167.006
2025-09-14 11:04:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6714.6025), np.float32(6712.3955), np.float32(6855.12), np.float32(6382.382), np.float32(6926.153), np.float32(6810.1826), np.float32(6834.352), np.float32(6800.673), np.float32(7056.1396), np.float32(6713.537)]
2025-09-14 11:04:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:04:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6780.55) for latency 3
2025-09-14 11:04:48,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 33 minutes, 55 seconds)
2025-09-14 11:07:47,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:07:52,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6829.76318 ± 197.351
2025-09-14 11:07:52,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6857.534), np.float32(7197.4785), np.float32(6856.8315), np.float32(6761.4243), np.float32(6891.656), np.float32(6965.287), np.float32(6936.577), np.float32(6643.505), np.float32(6408.865), np.float32(6778.4717)]
2025-09-14 11:07:52,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:07:52,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6829.76) for latency 3
2025-09-14 11:07:52,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 30 minutes, 33 seconds)
2025-09-14 11:10:50,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:10:56,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6735.03125 ± 133.303
2025-09-14 11:10:56,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6664.7007), np.float32(6872.6865), np.float32(6910.5166), np.float32(6720.7056), np.float32(6740.873), np.float32(6797.817), np.float32(6779.71), np.float32(6709.5503), np.float32(6395.0044), np.float32(6758.748)]
2025-09-14 11:10:56,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:10:56,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 27 minutes, 6 seconds)
2025-09-14 11:13:56,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:14:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6886.20947 ± 112.030
2025-09-14 11:14:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6974.476), np.float32(6721.064), np.float32(6933.017), np.float32(7008.3535), np.float32(6756.2095), np.float32(7051.4136), np.float32(6881.8438), np.float32(6767.0454), np.float32(6970.691), np.float32(6797.9785)]
2025-09-14 11:14:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:14:02,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (6886.21) for latency 3
2025-09-14 11:14:02,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 24 minutes, 27 seconds)
2025-09-14 11:17:01,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:17:06,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6524.62744 ± 1279.660
2025-09-14 11:17:06,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6779.1196), np.float32(7113.972), np.float32(6676.9243), np.float32(6788.657), np.float32(7204.2344), np.float32(6922.075), np.float32(7011.2666), np.float32(2713.913), np.float32(7068.832), np.float32(6967.277)]
2025-09-14 11:17:06,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:17:06,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 21 minutes, 16 seconds)
2025-09-14 11:20:07,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:20:13,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7037.82031 ± 116.343
2025-09-14 11:20:13,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6864.3057), np.float32(7194.6865), np.float32(7031.129), np.float32(7138.149), np.float32(7133.2734), np.float32(6937.1455), np.float32(6836.8457), np.float32(7091.9404), np.float32(7126.0894), np.float32(7024.6406)]
2025-09-14 11:20:13,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:20:13,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7037.82) for latency 3
2025-09-14 11:20:13,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 18 minutes, 40 seconds)
2025-09-14 11:23:12,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:23:18,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6953.72803 ± 131.139
2025-09-14 11:23:18,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6719.149), np.float32(6988.448), np.float32(7099.7617), np.float32(7106.3687), np.float32(6996.5435), np.float32(6760.123), np.float32(7080.176), np.float32(6837.1475), np.float32(7000.602), np.float32(6948.964)]
2025-09-14 11:23:18,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:23:18,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 15 minutes, 44 seconds)
2025-09-14 11:26:15,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:26:21,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7155.71484 ± 107.115
2025-09-14 11:26:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7235.9546), np.float32(7162.3105), np.float32(6998.205), np.float32(7348.4097), np.float32(7250.9946), np.float32(7101.625), np.float32(7049.3755), np.float32(7159.948), np.float32(7227.9194), np.float32(7022.409)]
2025-09-14 11:26:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:26:21,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7155.71) for latency 3
2025-09-14 11:26:21,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 12 minutes, 35 seconds)
2025-09-14 11:29:21,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:29:26,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7053.48926 ± 151.361
2025-09-14 11:29:26,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7206.2476), np.float32(7071.9644), np.float32(7183.0767), np.float32(7083.9556), np.float32(7008.6406), np.float32(6892.937), np.float32(6737.148), np.float32(7077.2773), np.float32(6986.1665), np.float32(7287.4688)]
2025-09-14 11:29:26,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:29:26,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 9 minutes, 22 seconds)
2025-09-14 11:32:24,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:32:30,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7032.53125 ± 142.633
2025-09-14 11:32:30,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(6918.216), np.float32(6822.5425), np.float32(7071.344), np.float32(6919.5083), np.float32(6921.2354), np.float32(7046.877), np.float32(7236.4165), np.float32(7040.965), np.float32(7036.8486), np.float32(7311.362)]
2025-09-14 11:32:30,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:32:30,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 6 minutes, 16 seconds)
2025-09-14 11:35:30,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:35:36,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7121.68506 ± 180.084
2025-09-14 11:35:36,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7190.6533), np.float32(6889.969), np.float32(7058.1025), np.float32(7274.7925), np.float32(6827.09), np.float32(7153.1567), np.float32(7035.96), np.float32(7047.845), np.float32(7283.5435), np.float32(7455.7383)]
2025-09-14 11:35:36,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:35:36,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 3 minutes, 2 seconds)
2025-09-14 11:38:36,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:38:41,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7038.36035 ± 569.517
2025-09-14 11:38:41,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7373.31), np.float32(7312.343), np.float32(5374.2275), np.float32(7408.8115), np.float32(7175.4185), np.float32(7015.875), np.float32(7301.762), np.float32(7169.523), np.float32(7245.2646), np.float32(7007.0635)]
2025-09-14 11:38:41,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:38:41,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 3 seconds)
2025-09-14 11:41:40,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:41:46,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7195.01270 ± 83.150
2025-09-14 11:41:46,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7238.289), np.float32(7196.591), np.float32(7272.6533), np.float32(7087.375), np.float32(7125.722), np.float32(7020.07), np.float32(7224.2017), np.float32(7261.21), np.float32(7257.897), np.float32(7266.1147)]
2025-09-14 11:41:46,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:41:46,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7195.01) for latency 3
2025-09-14 11:41:46,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 57 minutes, 8 seconds)
2025-09-14 11:44:44,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:44:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6869.94385 ± 1337.113
2025-09-14 11:44:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7362.6885), np.float32(2872.7893), np.float32(7400.423), np.float32(7221.0513), np.float32(7487.204), np.float32(7287.583), np.float32(7123.2314), np.float32(7171.0645), np.float32(7317.2505), np.float32(7456.1553)]
2025-09-14 11:44:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:44:50,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 53 minutes, 53 seconds)
2025-09-14 11:47:48,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:47:54,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7271.01562 ± 96.390
2025-09-14 11:47:54,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7204.01), np.float32(7319.887), np.float32(7237.3965), np.float32(7483.457), np.float32(7096.748), np.float32(7293.4556), np.float32(7336.2695), np.float32(7203.18), np.float32(7270.8394), np.float32(7264.9175)]
2025-09-14 11:47:54,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:47:54,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7271.02) for latency 3
2025-09-14 11:47:54,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 50 minutes, 50 seconds)
2025-09-14 11:50:51,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:50:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7279.87109 ± 82.294
2025-09-14 11:50:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7253.6743), np.float32(7253.6895), np.float32(7203.803), np.float32(7325.839), np.float32(7382.39), np.float32(7302.5425), np.float32(7152.314), np.float32(7445.301), np.float32(7223.1045), np.float32(7256.0522)]
2025-09-14 11:50:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:50:57,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7279.87) for latency 3
2025-09-14 11:50:57,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 47 minutes, 30 seconds)
2025-09-14 11:53:58,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:54:03,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7431.31641 ± 100.675
2025-09-14 11:54:03,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7561.4604), np.float32(7405.805), np.float32(7440.954), np.float32(7575.42), np.float32(7488.64), np.float32(7445.207), np.float32(7493.193), np.float32(7302.9956), np.float32(7252.2485), np.float32(7347.2417)]
2025-09-14 11:54:03,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:54:03,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7431.32) for latency 3
2025-09-14 11:54:03,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 44 minutes, 30 seconds)
2025-09-14 11:57:02,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 11:57:07,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7406.11084 ± 97.510
2025-09-14 11:57:07,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7301.381), np.float32(7510.41), np.float32(7244.194), np.float32(7388.9526), np.float32(7446.965), np.float32(7484.0093), np.float32(7560.3013), np.float32(7322.5044), np.float32(7337.2715), np.float32(7465.1196)]
2025-09-14 11:57:07,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:57:07,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 41 minutes, 23 seconds)
2025-09-14 12:00:05,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:00:11,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7482.60547 ± 90.650
2025-09-14 12:00:11,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7411.2524), np.float32(7548.5977), np.float32(7473.828), np.float32(7422.6616), np.float32(7478.715), np.float32(7358.5757), np.float32(7430.0625), np.float32(7431.7236), np.float32(7606.4404), np.float32(7664.193)]
2025-09-14 12:00:11,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:00:11,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7482.61) for latency 3
2025-09-14 12:00:11,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 38 minutes, 15 seconds)
2025-09-14 12:03:10,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:03:15,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7359.29297 ± 152.603
2025-09-14 12:03:15,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7452.0864), np.float32(7484.2666), np.float32(7644.5107), np.float32(7252.9614), np.float32(7415.5845), np.float32(7102.9233), np.float32(7280.859), np.float32(7190.8213), np.float32(7309.2944), np.float32(7459.6157)]
2025-09-14 12:03:15,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:03:15,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 35 minutes, 12 seconds)
2025-09-14 12:06:15,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:06:21,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7599.52344 ± 188.745
2025-09-14 12:06:21,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7266.916), np.float32(7924.238), np.float32(7710.455), np.float32(7549.3926), np.float32(7320.4385), np.float32(7642.9487), np.float32(7706.066), np.float32(7597.602), np.float32(7513.7295), np.float32(7763.45)]
2025-09-14 12:06:21,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:06:21,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7599.52) for latency 3
2025-09-14 12:06:21,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 32 minutes, 22 seconds)
2025-09-14 12:09:21,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:09:27,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7557.68750 ± 139.939
2025-09-14 12:09:27,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7498.464), np.float32(7404.2515), np.float32(7307.0864), np.float32(7632.6724), np.float32(7419.398), np.float32(7536.3457), np.float32(7753.091), np.float32(7648.913), np.float32(7679.951), np.float32(7696.6997)]
2025-09-14 12:09:27,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:09:27,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 29 minutes, 15 seconds)
2025-09-14 12:12:26,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:12:32,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7583.35840 ± 158.260
2025-09-14 12:12:32,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7555.41), np.float32(7918.044), np.float32(7329.485), np.float32(7650.0396), np.float32(7729.449), np.float32(7555.4224), np.float32(7544.71), np.float32(7497.379), np.float32(7401.5913), np.float32(7652.053)]
2025-09-14 12:12:32,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:12:32,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 26 minutes, 17 seconds)
2025-09-14 12:15:29,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:15:35,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7413.75293 ± 103.590
2025-09-14 12:15:35,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7564.91), np.float32(7414.2256), np.float32(7436.688), np.float32(7318.788), np.float32(7396.8096), np.float32(7530.6245), np.float32(7230.8335), np.float32(7282.66), np.float32(7506.3086), np.float32(7455.684)]
2025-09-14 12:15:35,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:15:35,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 23 minutes, 11 seconds)
2025-09-14 12:18:34,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:18:40,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7070.71094 ± 1805.750
2025-09-14 12:18:40,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7695.1333), np.float32(7649.9473), np.float32(1668.6157), np.float32(7409.1914), np.float32(7662.4873), np.float32(7907.693), np.float32(7591.617), np.float32(7598.556), np.float32(7881.129), np.float32(7642.7373)]
2025-09-14 12:18:40,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:18:40,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 20 minutes, 7 seconds)
2025-09-14 12:21:38,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:21:44,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7674.39160 ± 153.946
2025-09-14 12:21:44,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7902.6924), np.float32(7468.3223), np.float32(7497.7153), np.float32(7745.3916), np.float32(7580.575), np.float32(7569.8887), np.float32(7934.828), np.float32(7584.0386), np.float32(7776.078), np.float32(7684.3853)]
2025-09-14 12:21:44,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:21:44,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7674.39) for latency 3
2025-09-14 12:21:44,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 16 minutes, 53 seconds)
2025-09-14 12:24:41,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:24:47,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7816.15527 ± 109.813
2025-09-14 12:24:47,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7853.79), np.float32(7851.313), np.float32(7873.1445), np.float32(7940.6777), np.float32(7920.987), np.float32(7551.7256), np.float32(7774.7905), np.float32(7772.4175), np.float32(7725.9116), np.float32(7896.798)]
2025-09-14 12:24:47,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:24:47,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7816.16) for latency 3
2025-09-14 12:24:47,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 13 minutes, 35 seconds)
2025-09-14 12:27:28,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:27:33,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7706.10547 ± 181.267
2025-09-14 12:27:33,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7533.627), np.float32(7707.449), np.float32(7838.6064), np.float32(7595.7964), np.float32(7460.1997), np.float32(7951.7065), np.float32(7844.8887), np.float32(7473.7505), np.float32(7978.482), np.float32(7676.5493)]
2025-09-14 12:27:33,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:27:33,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 9 minutes, 4 seconds)
2025-09-14 12:30:14,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:30:19,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6748.10156 ± 1986.174
2025-09-14 12:30:19,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7633.929), np.float32(7540.2627), np.float32(983.4027), np.float32(7728.5273), np.float32(7599.76), np.float32(5922.799), np.float32(7600.446), np.float32(7621.771), np.float32(7300.7935), np.float32(7549.3237)]
2025-09-14 12:30:19,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:30:19,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 4 minutes, 49 seconds)
2025-09-14 12:33:03,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:33:08,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7833.44531 ± 122.746
2025-09-14 12:33:08,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7651.644), np.float32(7924.4556), np.float32(7890.1455), np.float32(7836.56), np.float32(7915.8755), np.float32(7611.155), np.float32(7791.677), np.float32(7856.6343), np.float32(8049.968), np.float32(7806.342)]
2025-09-14 12:33:08,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:33:08,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7833.45) for latency 3
2025-09-14 12:33:08,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 47 seconds)
2025-09-14 12:35:49,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:35:54,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7898.32178 ± 104.294
2025-09-14 12:35:54,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7846.393), np.float32(7864.879), np.float32(7867.1294), np.float32(7762.4526), np.float32(7804.6875), np.float32(7802.132), np.float32(7909.4756), np.float32(8003.7617), np.float32(8020.957), np.float32(8101.355)]
2025-09-14 12:35:54,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:35:54,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7898.32) for latency 3
2025-09-14 12:35:54,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 56 minutes, 43 seconds)
2025-09-14 12:38:32,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:38:38,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7838.29834 ± 133.345
2025-09-14 12:38:38,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7867.9307), np.float32(7604.9478), np.float32(7748.7017), np.float32(7844.881), np.float32(7787.826), np.float32(7854.4688), np.float32(7750.168), np.float32(7827.5894), np.float32(7960.01), np.float32(8136.463)]
2025-09-14 12:38:38,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:38:38,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 52 minutes, 37 seconds)
2025-09-14 12:41:20,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:41:25,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7115.32178 ± 1601.665
2025-09-14 12:41:25,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7777.762), np.float32(7677.285), np.float32(7899.5547), np.float32(2358.5635), np.float32(7124.678), np.float32(7530.1333), np.float32(7616.215), np.float32(7988.4526), np.float32(7489.533), np.float32(7691.0415)]
2025-09-14 12:41:25,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:41:25,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 49 minutes, 54 seconds)
2025-09-14 12:44:08,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:44:13,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7649.77051 ± 146.725
2025-09-14 12:44:13,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7934.457), np.float32(7504.711), np.float32(7477.3765), np.float32(7659.2925), np.float32(7686.4907), np.float32(7476.2827), np.float32(7617.7485), np.float32(7570.9087), np.float32(7721.985), np.float32(7848.457)]
2025-09-14 12:44:13,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:44:13,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 47 minutes, 14 seconds)
2025-09-14 12:46:52,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:46:57,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7939.54199 ± 213.455
2025-09-14 12:46:57,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7577.266), np.float32(7628.0117), np.float32(7734.6006), np.float32(7980.2856), np.float32(8067.6035), np.float32(7945.0996), np.float32(8277.946), np.float32(7988.2075), np.float32(8100.5615), np.float32(8095.8394)]
2025-09-14 12:46:57,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:46:57,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7939.54) for latency 3
2025-09-14 12:46:57,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 44 minutes, 12 seconds)
2025-09-14 12:49:37,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:49:42,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7846.59131 ± 286.835
2025-09-14 12:49:42,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8073.89), np.float32(8061.404), np.float32(8008.279), np.float32(7783.966), np.float32(7112.5767), np.float32(7931.0225), np.float32(7996.785), np.float32(7574.81), np.float32(8075.8037), np.float32(7847.3774)]
2025-09-14 12:49:42,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:49:42,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 41 minutes, 23 seconds)
2025-09-14 12:52:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:52:24,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7992.78613 ± 163.701
2025-09-14 12:52:24,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8035.375), np.float32(8098.6064), np.float32(8123.437), np.float32(8258.175), np.float32(7894.9106), np.float32(7753.077), np.float32(7784.911), np.float32(7851.316), np.float32(7947.7256), np.float32(8180.33)]
2025-09-14 12:52:24,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:52:24,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (7992.79) for latency 3
2025-09-14 12:52:24,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 38 minutes, 34 seconds)
2025-09-14 12:54:40,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:54:45,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8028.55566 ± 124.698
2025-09-14 12:54:45,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7966.9517), np.float32(8179.0273), np.float32(8057.036), np.float32(8098.722), np.float32(7895.987), np.float32(8129.555), np.float32(7918.8364), np.float32(8166.461), np.float32(7782.315), np.float32(8090.667)]
2025-09-14 12:54:45,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:54:45,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (8028.56) for latency 3
2025-09-14 12:54:46,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 34 minutes, 41 seconds)
2025-09-14 12:57:09,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:57:14,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8142.91113 ± 181.405
2025-09-14 12:57:14,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8009.9736), np.float32(8007.6777), np.float32(8578.949), np.float32(8052.61), np.float32(8182.1323), np.float32(7879.481), np.float32(8213.813), np.float32(8122.4297), np.float32(8107.882), np.float32(8274.162)]
2025-09-14 12:57:14,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:57:14,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (8142.91) for latency 3
2025-09-14 12:57:14,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 31 minutes, 15 seconds)
2025-09-14 12:59:33,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 12:59:38,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8002.71387 ± 134.583
2025-09-14 12:59:38,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7908.368), np.float32(7877.903), np.float32(7907.335), np.float32(7988.914), np.float32(8216.742), np.float32(8169.6523), np.float32(8005.8384), np.float32(8095.5815), np.float32(7764.078), np.float32(8092.725)]
2025-09-14 12:59:38,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:59:38,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 27 minutes, 53 seconds)
2025-09-14 13:01:58,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:02:03,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8030.26953 ± 144.456
2025-09-14 13:02:03,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(7909.9844), np.float32(7912.5117), np.float32(8123.3105), np.float32(8082.5894), np.float32(8244.247), np.float32(7804.925), np.float32(8199.692), np.float32(7895.661), np.float32(8172.4966), np.float32(7957.2764)]
2025-09-14 13:02:03,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:02:03,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 24 minutes, 40 seconds)
2025-09-14 13:04:25,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:04:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8007.25879 ± 95.941
2025-09-14 13:04:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8071.8965), np.float32(7922.439), np.float32(8082.2856), np.float32(8044.519), np.float32(7931.182), np.float32(7955.1406), np.float32(7821.3096), np.float32(8104.225), np.float32(8151.499), np.float32(7988.089)]
2025-09-14 13:04:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:04:30,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 46 seconds)
2025-09-14 13:06:47,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:06:52,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7999.11963 ± 195.454
2025-09-14 13:06:52,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8025.1807), np.float32(8239.162), np.float32(7753.352), np.float32(8271.629), np.float32(7802.401), np.float32(7814.5674), np.float32(8039.933), np.float32(7827.967), np.float32(7933.267), np.float32(8283.737)]
2025-09-14 13:06:52,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:06:52,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 19 minutes, 21 seconds)
2025-09-14 13:09:16,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:09:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7743.97656 ± 1067.632
2025-09-14 13:09:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8021.2817), np.float32(8008.177), np.float32(8370.063), np.float32(7988.1167), np.float32(8163.9194), np.float32(4562.9414), np.float32(8256.224), np.float32(8085.647), np.float32(7953.6943), np.float32(8029.7)]
2025-09-14 13:09:22,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:09:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 58 seconds)
2025-09-14 13:11:46,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:11:51,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8064.90527 ± 146.610
2025-09-14 13:11:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8015.306), np.float32(7848.124), np.float32(8241.538), np.float32(7926.1865), np.float32(8204.847), np.float32(8000.304), np.float32(7999.535), np.float32(8350.874), np.float32(8069.663), np.float32(7992.681)]
2025-09-14 13:11:51,278 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:11:51,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 39 seconds)
2025-09-14 13:14:17,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:14:22,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 7473.25879 ± 1934.972
2025-09-14 13:14:22,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8022.2534), np.float32(8229.577), np.float32(1682.7844), np.float32(8332.006), np.float32(8160.1846), np.float32(8118.885), np.float32(7807.7056), np.float32(8041.4966), np.float32(8099.645), np.float32(8238.048)]
2025-09-14 13:14:22,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:14:22,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 19 seconds)
2025-09-14 13:16:47,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:16:52,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 6793.44141 ± 2464.439
2025-09-14 13:16:52,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8035.1562), np.float32(7889.353), np.float32(7810.9766), np.float32(7638.838), np.float32(8151.6255), np.float32(3065.4712), np.float32(888.9096), np.float32(8184.408), np.float32(7908.771), np.float32(8360.909)]
2025-09-14 13:16:52,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:16:52,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 53 seconds)
2025-09-14 13:19:12,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:19:17,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8234.97461 ± 187.920
2025-09-14 13:19:17,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8127.8896), np.float32(8116.764), np.float32(8605.574), np.float32(8325.776), np.float32(8038.6216), np.float32(8020.9116), np.float32(8172.8853), np.float32(8263.887), np.float32(8527.813), np.float32(8149.6226)]
2025-09-14 13:19:17,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:19:17,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (8234.97) for latency 3
2025-09-14 13:19:17,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 27 seconds)
2025-09-14 13:21:40,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:21:45,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8024.73975 ± 119.104
2025-09-14 13:21:45,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8140.724), np.float32(8163.8687), np.float32(7858.089), np.float32(7984.421), np.float32(8100.909), np.float32(7942.9805), np.float32(7876.438), np.float32(8154.1157), np.float32(8131.571), np.float32(7894.2827)]
2025-09-14 13:21:45,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:21:45,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 57 seconds)
2025-09-14 13:24:05,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:24:10,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8281.45508 ± 143.900
2025-09-14 13:24:10,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8341.279), np.float32(8148.927), np.float32(8268.217), np.float32(8292.532), np.float32(8018.817), np.float32(8225.944), np.float32(8273.784), np.float32(8501.4795), np.float32(8527.96), np.float32(8215.598)]
2025-09-14 13:24:10,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:24:10,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1226 [INFO]: New best (8281.46) for latency 3
2025-09-14 13:24:10,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 27 seconds)
2025-09-14 13:26:32,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1214 [DEBUG]: Evaluating for latency 3...
2025-09-14 13:26:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1221 [DEBUG]: Total Reward: 8179.16260 ± 136.854
2025-09-14 13:26:37,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(8003.082), np.float32(8146.632), np.float32(7997.243), np.float32(8261.15), np.float32(8052.6543), np.float32(8144.886), np.float32(8344.21), np.float32(8334.243), np.float32(8119.9536), np.float32(8387.574)]
2025-09-14 13:26:37,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:26:37,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille25-halfcheetah):1251 [DEBUG]: Training session finished
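The recurring "Total Reward: mean ± std" / "All rewards: [...]" pairs above can be recovered and re-checked from the raw log with a short sketch like the following. `parse_rewards` and `summarize` are hypothetical helpers (not part of `latency_env`); the `np.float32(...)` wrappers are just NumPy's repr and are stripped by the regex. On the entries sampled here, the logged std agrees with the population standard deviation.

```python
import re
import statistics

# Matches the bracketed list in an "All rewards:" log line.
REWARDS_RE = re.compile(r"All rewards: \[(.*)\]")
# Extracts the numeric payload of each np.float32(...) repr.
FLOAT_RE = re.compile(r"np\.float32\(([-+\d.eE]+)\)")

def parse_rewards(line: str):
    """Return the list of episode rewards from an 'All rewards:' log line,
    or None if the line is not a rewards line."""
    m = REWARDS_RE.search(line)
    if m is None:
        return None
    return [float(v) for v in FLOAT_RE.findall(m.group(1))]

def summarize(rewards):
    """Mean and population std over the episode rewards, which appears to be
    what the 'Total Reward: mean ± std' lines report."""
    return statistics.mean(rewards), statistics.pstdev(rewards)
```

For example, feeding the iteration-72 rewards line through `parse_rewards` and `summarize` reproduces the logged 7583.358 ± 158.260.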
