2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_9
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_9
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'9': <latency_env.delayed_mdp.ConstantDelay object at 0x7fda04c4a6f0>}
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,629 baseline-bpql-noisepromille200-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=71, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,629 baseline-bpql-noisepromille200-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:45:37,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:45:44,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -354.32599 ± 44.855
2025-09-14 08:45:44,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-321.50488), np.float32(-266.3306), np.float32(-350.51013), np.float32(-361.1062), np.float32(-316.9024), np.float32(-446.0251), np.float32(-365.61148), np.float32(-376.81186), np.float32(-355.95737), np.float32(-382.5001)]
2025-09-14 08:45:44,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:45:44,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-354.33) for latency 9
2025-09-14 08:45:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 25 minutes, 12 seconds)
2025-09-14 08:48:20,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:48:27,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -241.80017 ± 32.064
2025-09-14 08:48:27,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-255.81873), np.float32(-198.18631), np.float32(-285.38162), np.float32(-244.9398), np.float32(-258.34067), np.float32(-259.5471), np.float32(-197.28413), np.float32(-287.89825), np.float32(-224.62521), np.float32(-205.98001)]
2025-09-14 08:48:27,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:48:27,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-241.80) for latency 9
2025-09-14 08:48:27,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 24 minutes, 44 seconds)
2025-09-14 08:51:12,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:51:19,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -177.85091 ± 87.765
2025-09-14 08:51:19,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-197.39813), np.float32(-82.07032), np.float32(-188.56755), np.float32(-151.90463), np.float32(-407.45966), np.float32(-120.12518), np.float32(-99.008804), np.float32(-170.2944), np.float32(-134.26314), np.float32(-227.41722)]
2025-09-14 08:51:19,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:51:19,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-177.85) for latency 9
2025-09-14 08:51:19,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 27 minutes, 12 seconds)
2025-09-14 08:54:03,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:54:11,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 4.85424 ± 106.076
2025-09-14 08:54:11,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4.430981), np.float32(239.57996), np.float32(-22.043804), np.float32(78.86317), np.float32(-13.48165), np.float32(12.6176), np.float32(-36.539387), np.float32(-102.70768), np.float32(-178.80247), np.float32(66.62568)]
2025-09-14 08:54:11,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:54:11,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (4.85) for latency 9
2025-09-14 08:54:11,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 27 minutes, 7 seconds)
2025-09-14 08:57:05,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 08:57:14,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 150.58054 ± 167.493
2025-09-14 08:57:14,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(112.8758), np.float32(30.484539), np.float32(349.4621), np.float32(-54.301815), np.float32(363.5678), np.float32(164.80217), np.float32(155.48549), np.float32(423.65546), np.float32(28.529715), np.float32(-68.755875)]
2025-09-14 08:57:14,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:57:14,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (150.58) for latency 9
2025-09-14 08:57:14,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 29 minutes, 23 seconds)
2025-09-14 09:00:30,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:00:39,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 253.45955 ± 183.532
2025-09-14 09:00:39,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(172.36835), np.float32(564.73486), np.float32(161.6278), np.float32(375.59158), np.float32(295.86673), np.float32(-51.481224), np.float32(291.47525), np.float32(-14.932797), np.float32(283.81274), np.float32(455.53214)]
2025-09-14 09:00:39,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:00:39,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (253.46) for latency 9
2025-09-14 09:00:39,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 40 minutes, 34 seconds)
2025-09-14 09:03:59,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:04:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 380.43188 ± 235.435
2025-09-14 09:04:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(647.27515), np.float32(153.3215), np.float32(117.00281), np.float32(400.8783), np.float32(207.75111), np.float32(835.3533), np.float32(246.80164), np.float32(309.58282), np.float32(660.17316), np.float32(226.17929)]
2025-09-14 09:04:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:04:08,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (380.43) for latency 9
2025-09-14 09:04:08,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 51 minutes, 45 seconds)
2025-09-14 09:07:23,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:07:31,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 598.17548 ± 198.187
2025-09-14 09:07:31,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(401.52292), np.float32(831.49725), np.float32(511.07925), np.float32(510.34473), np.float32(532.12994), np.float32(549.3298), np.float32(815.03876), np.float32(487.37558), np.float32(349.61722), np.float32(993.8194)]
2025-09-14 09:07:31,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:07:31,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (598.18) for latency 9
2025-09-14 09:07:31,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 58 minutes, 16 seconds)
2025-09-14 09:10:44,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:10:53,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 789.84515 ± 98.323
2025-09-14 09:10:53,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(853.97015), np.float32(881.8303), np.float32(815.22626), np.float32(782.84155), np.float32(725.6645), np.float32(600.3772), np.float32(920.88965), np.float32(847.4077), np.float32(643.1566), np.float32(827.08746)]
2025-09-14 09:10:53,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:10:53,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (789.85) for latency 9
2025-09-14 09:10:53,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 3 minutes, 58 seconds)
2025-09-14 09:14:05,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:14:14,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 723.10925 ± 99.184
2025-09-14 09:14:14,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(810.0176), np.float32(672.8242), np.float32(834.97974), np.float32(795.8044), np.float32(625.5393), np.float32(627.5847), np.float32(783.16003), np.float32(791.41876), np.float32(770.0537), np.float32(519.7093)]
2025-09-14 09:14:14,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:14:14,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 6 minutes, 7 seconds)
2025-09-14 09:17:26,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:17:35,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 875.11884 ± 82.268
2025-09-14 09:17:35,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(924.4737), np.float32(899.90314), np.float32(742.7043), np.float32(905.4748), np.float32(1051.1211), np.float32(868.35913), np.float32(770.49713), np.float32(910.1864), np.float32(853.4145), np.float32(825.0545)]
2025-09-14 09:17:35,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:17:35,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (875.12) for latency 9
2025-09-14 09:17:35,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 1 minute, 14 seconds)
2025-09-14 09:20:50,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:21:00,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1004.04431 ± 201.389
2025-09-14 09:21:00,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(984.3754), np.float32(1037.4778), np.float32(765.26685), np.float32(855.1206), np.float32(1365.952), np.float32(1237.1046), np.float32(667.5936), np.float32(1095.7633), np.float32(1109.9418), np.float32(921.84717)]
2025-09-14 09:21:00,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:21:00,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1004.04) for latency 9
2025-09-14 09:21:00,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 56 minutes, 42 seconds)
2025-09-14 09:24:24,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:24:34,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 978.18933 ± 110.329
2025-09-14 09:24:34,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(855.35834), np.float32(1077.3735), np.float32(1006.98975), np.float32(1001.02216), np.float32(848.7052), np.float32(1101.091), np.float32(801.7117), np.float32(1060.2534), np.float32(1122.0094), np.float32(907.37854)]
2025-09-14 09:24:34,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:24:34,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 56 minutes, 34 seconds)
2025-09-14 09:27:59,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:28:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1307.34680 ± 264.428
2025-09-14 09:28:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1587.3002), np.float32(1534.8585), np.float32(1345.1956), np.float32(1242.5005), np.float32(802.97174), np.float32(963.9914), np.float32(1191.3558), np.float32(1716.8264), np.float32(1283.7556), np.float32(1404.7123)]
2025-09-14 09:28:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:28:09,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1307.35) for latency 9
2025-09-14 09:28:09,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 56 minutes, 54 seconds)
2025-09-14 09:31:33,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:31:43,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1206.24976 ± 296.788
2025-09-14 09:31:43,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1427.0282), np.float32(842.45807), np.float32(1652.8767), np.float32(761.7771), np.float32(870.3752), np.float32(1461.6161), np.float32(1160.2693), np.float32(1455.1938), np.float32(1041.6277), np.float32(1389.2761)]
2025-09-14 09:31:43,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:31:43,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 57 minutes, 8 seconds)
2025-09-14 09:34:47,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:34:55,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1241.23596 ± 431.336
2025-09-14 09:34:55,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1067.2891), np.float32(918.68), np.float32(827.0691), np.float32(1901.1909), np.float32(1754.764), np.float32(824.53674), np.float32(1286.8108), np.float32(771.1121), np.float32(1148.7512), np.float32(1912.1554)]
2025-09-14 09:34:55,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:34:55,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 51 minutes, 15 seconds)
2025-09-14 09:37:38,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:37:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1315.48401 ± 295.991
2025-09-14 09:37:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1565.6316), np.float32(1394.612), np.float32(946.743), np.float32(1861.8005), np.float32(1620.1194), np.float32(1282.2368), np.float32(1314.187), np.float32(1080.0779), np.float32(848.65247), np.float32(1240.7788)]
2025-09-14 09:37:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:37:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1315.48) for latency 9
2025-09-14 09:37:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 38 minutes, 7 seconds)
2025-09-14 09:40:21,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:40:27,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1019.15686 ± 184.902
2025-09-14 09:40:27,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(836.99854), np.float32(730.7411), np.float32(1224.8506), np.float32(927.71967), np.float32(1246.3378), np.float32(992.05237), np.float32(943.806), np.float32(1313.2814), np.float32(1106.9742), np.float32(868.80695)]
2025-09-14 09:40:27,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:40:27,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 20 minutes, 35 seconds)
2025-09-14 09:43:03,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:43:10,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1522.39624 ± 400.697
2025-09-14 09:43:10,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1183.6311), np.float32(1573.5409), np.float32(2007.9731), np.float32(818.17786), np.float32(1697.0995), np.float32(1834.3584), np.float32(2192.2996), np.float32(1262.6719), np.float32(1461.543), np.float32(1192.6676)]
2025-09-14 09:43:10,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:43:10,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1522.40) for latency 9
2025-09-14 09:43:10,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 3 minutes, 19 seconds)
2025-09-14 09:46:26,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:46:35,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1319.51208 ± 369.530
2025-09-14 09:46:35,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1363.5137), np.float32(993.83466), np.float32(2087.1287), np.float32(1016.4891), np.float32(950.07434), np.float32(1094.0872), np.float32(1151.3358), np.float32(1584.0391), np.float32(1824.5173), np.float32(1130.1013)]
2025-09-14 09:46:35,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:46:35,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 58 minutes, 1 second)
2025-09-14 09:50:03,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:50:13,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1249.04175 ± 290.655
2025-09-14 09:50:13,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1228.1211), np.float32(1302.7722), np.float32(1416.8551), np.float32(1036.009), np.float32(1554.2765), np.float32(822.5415), np.float32(1682.0524), np.float32(1561.3505), np.float32(1045.1406), np.float32(841.29816)]
2025-09-14 09:50:13,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:50:13,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 1 minute, 50 seconds)
2025-09-14 09:53:41,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:53:51,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1338.41650 ± 304.094
2025-09-14 09:53:51,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1648.0085), np.float32(1386.6816), np.float32(1709.2253), np.float32(1032.7712), np.float32(1611.057), np.float32(892.4998), np.float32(1298.9523), np.float32(1685.58), np.float32(1209.7078), np.float32(909.68195)]
2025-09-14 09:53:51,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:53:51,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 11 minutes, 7 seconds)
2025-09-14 09:57:19,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 09:57:29,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1774.67029 ± 143.041
2025-09-14 09:57:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1534.7745), np.float32(1831.7745), np.float32(1643.5228), np.float32(1897.7086), np.float32(1731.4423), np.float32(1552.4308), np.float32(1830.0629), np.float32(1937.949), np.float32(1926.8412), np.float32(1860.1968)]
2025-09-14 09:57:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:57:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1774.67) for latency 9
2025-09-14 09:57:29,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 22 minutes, 12 seconds)
2025-09-14 10:00:58,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:01:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1183.12085 ± 437.916
2025-09-14 10:01:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1496.1035), np.float32(1015.26587), np.float32(986.6801), np.float32(808.52856), np.float32(1426.1296), np.float32(859.4236), np.float32(1219.8024), np.float32(2292.4827), np.float32(784.8009), np.float32(941.99115)]
2025-09-14 10:01:07,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:01:07,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 32 minutes, 59 seconds)
2025-09-14 10:04:35,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:04:45,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1306.44983 ± 414.950
2025-09-14 10:04:45,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(957.38617), np.float32(1109.2394), np.float32(1561.8933), np.float32(1678.4983), np.float32(1207.0822), np.float32(2125.4844), np.float32(897.36755), np.float32(830.26764), np.float32(975.08887), np.float32(1722.1908)]
2025-09-14 10:04:45,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:04:45,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 32 minutes, 25 seconds)
2025-09-14 10:08:13,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:08:23,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1763.55115 ± 598.907
2025-09-14 10:08:23,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2313.737), np.float32(1218.3698), np.float32(856.814), np.float32(1006.505), np.float32(2174.5151), np.float32(1971.3912), np.float32(2431.8613), np.float32(1161.5247), np.float32(2470.0327), np.float32(2030.7615)]
2025-09-14 10:08:23,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:08:23,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 28 minutes, 45 seconds)
2025-09-14 10:11:51,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:12:01,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1706.01245 ± 295.476
2025-09-14 10:12:01,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1813.848), np.float32(2080.183), np.float32(1815.9741), np.float32(1752.9261), np.float32(1372.5286), np.float32(1565.3202), np.float32(1193.024), np.float32(2016.474), np.float32(1396.3893), np.float32(2053.4575)]
2025-09-14 10:12:01,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:12:01,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 25 minutes, 7 seconds)
2025-09-14 10:15:28,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:15:38,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1589.73352 ± 420.581
2025-09-14 10:15:38,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1998.7988), np.float32(2048.7659), np.float32(1331.7026), np.float32(1181.0653), np.float32(2149.2017), np.float32(1537.284), np.float32(2008.2682), np.float32(1434.1799), np.float32(1395.853), np.float32(812.2138)]
2025-09-14 10:15:38,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:15:38,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 21 minutes, 20 seconds)
2025-09-14 10:19:06,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:19:16,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1456.68384 ± 370.278
2025-09-14 10:19:16,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1426.1486), np.float32(1810.3219), np.float32(1742.1979), np.float32(1146.6193), np.float32(1028.4501), np.float32(1344.5831), np.float32(1309.1813), np.float32(900.7976), np.float32(1712.4553), np.float32(2146.0833)]
2025-09-14 10:19:16,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:19:16,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 17 minutes, 33 seconds)
2025-09-14 10:22:43,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:22:53,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1602.80042 ± 524.919
2025-09-14 10:22:53,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1929.7023), np.float32(928.76), np.float32(1347.873), np.float32(1133.2968), np.float32(2435.6423), np.float32(872.4009), np.float32(2069.103), np.float32(1268.7119), np.float32(1986.541), np.float32(2055.9731)]
2025-09-14 10:22:53,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:22:53,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 13 minutes, 50 seconds)
2025-09-14 10:26:21,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:26:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1528.83789 ± 471.847
2025-09-14 10:26:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1867.8689), np.float32(1954.956), np.float32(2191.0754), np.float32(1597.9249), np.float32(2031.6986), np.float32(1644.2792), np.float32(773.7908), np.float32(1211.7577), np.float32(952.0268), np.float32(1063.0005)]
2025-09-14 10:26:30,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:26:30,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 10 minutes, 10 seconds)
2025-09-14 10:29:57,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:30:06,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1661.94922 ± 494.236
2025-09-14 10:30:06,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2328.7502), np.float32(2326.7205), np.float32(2017.5259), np.float32(928.1491), np.float32(1322.4353), np.float32(1160.3969), np.float32(1989.4685), np.float32(1504.8914), np.float32(1940.4066), np.float32(1100.7476)]
2025-09-14 10:30:06,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:30:06,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 6 minutes, 6 seconds)
2025-09-14 10:33:33,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:33:43,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1616.30530 ± 329.769
2025-09-14 10:33:43,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1707.5438), np.float32(2163.6084), np.float32(1314.6486), np.float32(1165.0997), np.float32(1307.8895), np.float32(1552.5579), np.float32(1589.0007), np.float32(1704.5656), np.float32(1450.4053), np.float32(2207.7336)]
2025-09-14 10:33:43,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:33:43,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 2 minutes, 15 seconds)
2025-09-14 10:37:10,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:37:20,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1684.89124 ± 486.318
2025-09-14 10:37:20,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(899.29596), np.float32(2015.1667), np.float32(840.93445), np.float32(1876.4176), np.float32(2130.2307), np.float32(1590.3116), np.float32(1324.2952), np.float32(2317.801), np.float32(1795.8251), np.float32(2058.6335)]
2025-09-14 10:37:20,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:37:20,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 58 minutes, 28 seconds)
2025-09-14 10:40:47,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:40:56,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1432.95154 ± 623.744
2025-09-14 10:40:56,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1293.4329), np.float32(764.868), np.float32(2512.9785), np.float32(1108.9183), np.float32(1255.6067), np.float32(2471.8025), np.float32(810.28156), np.float32(946.122), np.float32(2014.6372), np.float32(1150.8695)]
2025-09-14 10:40:56,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:40:56,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 54 minutes, 43 seconds)
2025-09-14 10:44:23,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:44:33,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1620.66345 ± 525.422
2025-09-14 10:44:33,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2331.2031), np.float32(2137.8394), np.float32(1810.9843), np.float32(990.1224), np.float32(1735.2545), np.float32(817.77673), np.float32(1185.122), np.float32(1179.1116), np.float32(2333.3037), np.float32(1685.9169)]
2025-09-14 10:44:33,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:44:33,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 50 minutes, 55 seconds)
2025-09-14 10:47:59,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:48:09,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1840.03320 ± 449.010
2025-09-14 10:48:09,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1469.8932), np.float32(2084.7598), np.float32(1170.3551), np.float32(1847.7195), np.float32(2557.7), np.float32(2097.111), np.float32(2321.061), np.float32(1915.4221), np.float32(1096.7518), np.float32(1839.5582)]
2025-09-14 10:48:09,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:48:09,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1840.03) for latency 9
2025-09-14 10:48:09,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 47 minutes, 21 seconds)
2025-09-14 10:51:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:51:45,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1534.73999 ± 544.866
2025-09-14 10:51:45,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1126.9531), np.float32(2285.9592), np.float32(1931.6721), np.float32(1919.1233), np.float32(2438.6968), np.float32(979.4601), np.float32(855.348), np.float32(1319.7446), np.float32(1502.077), np.float32(988.36615)]
2025-09-14 10:51:45,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:51:45,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 43 minutes, 39 seconds)
2025-09-14 10:55:08,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:55:17,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1727.74219 ± 421.967
2025-09-14 10:55:17,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1531.4249), np.float32(1994.694), np.float32(2404.4746), np.float32(1749.3701), np.float32(1753.1439), np.float32(1616.165), np.float32(1989.4335), np.float32(1122.6624), np.float32(2152.5918), np.float32(963.4607)]
2025-09-14 10:55:17,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:55:17,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 39 minutes, 2 seconds)
2025-09-14 10:58:38,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 10:58:48,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1560.56665 ± 468.608
2025-09-14 10:58:48,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2092.8525), np.float32(2012.2584), np.float32(1826.4021), np.float32(946.25635), np.float32(1519.596), np.float32(2409.749), np.float32(1192.3333), np.float32(1123.0233), np.float32(1181.4951), np.float32(1301.7)]
2025-09-14 10:58:48,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:58:48,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 34 minutes, 15 seconds)
2025-09-14 11:01:59,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:02:07,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1730.48242 ± 551.596
2025-09-14 11:02:07,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1683.0094), np.float32(1245.9866), np.float32(2432.813), np.float32(2135.7434), np.float32(1601.5819), np.float32(1382.8223), np.float32(1007.19635), np.float32(1966.7991), np.float32(1100.9149), np.float32(2747.9573)]
2025-09-14 11:02:07,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:02:07,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 27 minutes, 20 seconds)
2025-09-14 11:05:15,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:05:23,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1910.43286 ± 574.957
2025-09-14 11:05:23,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1170.4242), np.float32(1475.6478), np.float32(2087.5547), np.float32(2195.3577), np.float32(2226.5923), np.float32(1784.5978), np.float32(796.13617), np.float32(2086.868), np.float32(2610.9033), np.float32(2670.2463)]
2025-09-14 11:05:23,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:05:23,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1910.43) for latency 9
2025-09-14 11:05:23,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 19 minutes, 57 seconds)
2025-09-14 11:08:21,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:08:29,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2203.16553 ± 593.789
2025-09-14 11:08:29,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2950.0737), np.float32(2701.2239), np.float32(2230.7925), np.float32(1867.5844), np.float32(2735.8552), np.float32(1729.794), np.float32(2581.0532), np.float32(2683.264), np.float32(1331.537), np.float32(1220.4792)]
2025-09-14 11:08:29,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:08:29,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2203.17) for latency 9
2025-09-14 11:08:29,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 10 minutes, 47 seconds)
2025-09-14 11:11:09,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:11:16,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1912.20056 ± 518.601
2025-09-14 11:11:16,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2130.2388), np.float32(2802.2725), np.float32(1192.9392), np.float32(1465.625), np.float32(2172.069), np.float32(2314.7412), np.float32(1110.7837), np.float32(2382.9272), np.float32(1806.6238), np.float32(1743.785)]
2025-09-14 11:11:16,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:11:16,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 59 minutes, 1 second)
2025-09-14 11:13:49,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:13:56,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1963.90723 ± 704.787
2025-09-14 11:13:56,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1036.1484), np.float32(2817.206), np.float32(1771.3856), np.float32(1298.2794), np.float32(2845.9302), np.float32(1865.4779), np.float32(819.5576), np.float32(1992.7798), np.float32(2576.4404), np.float32(2615.8647)]
2025-09-14 11:13:56,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:13:56,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 46 minutes, 29 seconds)
2025-09-14 11:16:29,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:16:35,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1883.78394 ± 800.659
2025-09-14 11:16:35,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1125.1545), np.float32(2822.0452), np.float32(2549.606), np.float32(2706.8423), np.float32(880.8843), np.float32(1066.1919), np.float32(2584.4236), np.float32(1009.49176), np.float32(2699.0513), np.float32(1394.1475)]
2025-09-14 11:16:35,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:16:35,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 36 minutes, 16 seconds)
2025-09-14 11:19:09,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:19:15,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2048.07690 ± 626.333
2025-09-14 11:19:15,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1322.1399), np.float32(2553.422), np.float32(1402.4171), np.float32(2806.641), np.float32(2963.7202), np.float32(1479.9744), np.float32(1282.2883), np.float32(1888.7561), np.float32(2655.778), np.float32(2125.6333)]
2025-09-14 11:19:15,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:19:15,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2025-09-14 11:21:48,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:21:55,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1980.03943 ± 887.320
2025-09-14 11:21:55,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3260.9941), np.float32(2760.0825), np.float32(1455.9181), np.float32(2891.4812), np.float32(1149.5282), np.float32(860.5787), np.float32(2557.1528), np.float32(1099.457), np.float32(1031.6565), np.float32(2733.5452)]
2025-09-14 11:21:55,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:21:55,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 19 minutes, 39 seconds)
2025-09-14 11:24:28,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:24:35,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2206.90479 ± 834.175
2025-09-14 11:24:35,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(998.5166), np.float32(2991.5198), np.float32(2694.2778), np.float32(2939.2344), np.float32(1067.6632), np.float32(2198.6672), np.float32(2936.6743), np.float32(2529.1047), np.float32(2838.8367), np.float32(874.5523)]
2025-09-14 11:24:35,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:24:35,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2206.90) for latency 9
2025-09-14 11:24:35,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 15 minutes, 46 seconds)
2025-09-14 11:27:08,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:27:14,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1898.43530 ± 695.496
2025-09-14 11:27:14,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2918.698), np.float32(1341.7823), np.float32(2009.7052), np.float32(1883.519), np.float32(2003.5798), np.float32(2908.94), np.float32(975.07104), np.float32(1148.0518), np.float32(1181.3004), np.float32(2613.7048)]
2025-09-14 11:27:14,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:27:14,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 13 minutes, 5 seconds)
2025-09-14 11:29:48,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:29:55,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2089.07617 ± 600.830
2025-09-14 11:29:55,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1414.8771), np.float32(1108.1584), np.float32(2488.4675), np.float32(2289.0515), np.float32(3094.4436), np.float32(1680.4465), np.float32(2210.9277), np.float32(1853.6167), np.float32(2914.7102), np.float32(1836.0629)]
2025-09-14 11:29:55,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:29:55,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 10 minutes, 33 seconds)
2025-09-14 11:32:27,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:32:34,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1824.39038 ± 667.799
2025-09-14 11:32:34,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1245.3372), np.float32(1984.2944), np.float32(2922.7798), np.float32(2619.319), np.float32(1327.1484), np.float32(2715.1218), np.float32(1805.4508), np.float32(1121.3429), np.float32(1066.155), np.float32(1436.9546)]
2025-09-14 11:32:34,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:32:34,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 7 minutes, 50 seconds)
2025-09-14 11:35:07,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:35:14,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1815.61841 ± 604.102
2025-09-14 11:35:14,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1385.4375), np.float32(1809.521), np.float32(2265.365), np.float32(2979.3381), np.float32(2068.753), np.float32(1026.0305), np.float32(1412.686), np.float32(1737.744), np.float32(2456.4329), np.float32(1014.87445)]
2025-09-14 11:35:14,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:35:14,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 5 minutes, 12 seconds)
2025-09-14 11:37:47,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:37:54,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2097.24829 ± 751.492
2025-09-14 11:37:54,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2510.8286), np.float32(2911.0344), np.float32(1886.8628), np.float32(2340.724), np.float32(2159.8235), np.float32(2453.045), np.float32(2370.0361), np.float32(2882.748), np.float32(461.53882), np.float32(995.842)]
2025-09-14 11:37:54,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:37:54,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 2 minutes, 32 seconds)
2025-09-14 11:40:27,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:40:34,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2309.35889 ± 663.113
2025-09-14 11:40:34,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1344.0543), np.float32(3011.2322), np.float32(2752.975), np.float32(1613.1072), np.float32(2892.6123), np.float32(1324.4814), np.float32(2053.5676), np.float32(2876.45), np.float32(3065.4026), np.float32(2159.7056)]
2025-09-14 11:40:34,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:40:34,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2309.36) for latency 9
2025-09-14 11:40:34,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 59 minutes, 55 seconds)
2025-09-14 11:43:07,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:43:13,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2115.32153 ± 782.318
2025-09-14 11:43:13,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2767.734), np.float32(1900.7299), np.float32(2875.5112), np.float32(1022.4069), np.float32(3032.727), np.float32(2775.0542), np.float32(967.4083), np.float32(1636.5526), np.float32(1362.6937), np.float32(2812.3972)]
2025-09-14 11:43:13,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:43:13,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 57 minutes, 9 seconds)
2025-09-14 11:45:46,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:45:53,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1997.13025 ± 685.155
2025-09-14 11:45:53,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1700.9229), np.float32(2601.2537), np.float32(2462.7908), np.float32(2908.5984), np.float32(1699.5753), np.float32(2602.8982), np.float32(2349.3804), np.float32(821.85986), np.float32(1935.6954), np.float32(888.32654)]
2025-09-14 11:45:53,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:45:53,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 54 minutes, 27 seconds)
2025-09-14 11:48:26,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:48:33,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2477.59424 ± 571.859
2025-09-14 11:48:33,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2790.944), np.float32(3013.8997), np.float32(1921.1588), np.float32(2805.5576), np.float32(2809.625), np.float32(2416.2551), np.float32(3032.9634), np.float32(2931.6533), np.float32(1651.7343), np.float32(1402.1512)]
2025-09-14 11:48:33,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:48:33,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2477.59) for latency 9
2025-09-14 11:48:33,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 51 minutes, 49 seconds)
2025-09-14 11:51:06,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:51:13,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1913.43945 ± 963.489
2025-09-14 11:51:13,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2809.319), np.float32(2713.7676), np.float32(3054.6165), np.float32(1508.0616), np.float32(344.25662), np.float32(822.9017), np.float32(1408.4695), np.float32(922.93604), np.float32(2797.2512), np.float32(2752.8137)]
2025-09-14 11:51:13,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:51:13,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 49 minutes, 11 seconds)
2025-09-14 11:53:46,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:53:53,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2130.65894 ± 785.595
2025-09-14 11:53:53,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2461.3582), np.float32(2804.6082), np.float32(1042.5555), np.float32(2740.3557), np.float32(2871.286), np.float32(982.45575), np.float32(2643.7676), np.float32(2919.716), np.float32(1801.1362), np.float32(1039.3508)]
2025-09-14 11:53:53,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:53:53,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 46 minutes, 31 seconds)
2025-09-14 11:56:26,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:56:33,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2516.04565 ± 490.039
2025-09-14 11:56:33,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2789.2717), np.float32(2806.1504), np.float32(2625.3416), np.float32(2588.4402), np.float32(2787.3157), np.float32(2659.1074), np.float32(1297.9305), np.float32(2942.3333), np.float32(2778.5244), np.float32(1886.0428)]
2025-09-14 11:56:33,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:56:33,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2516.05) for latency 9
2025-09-14 11:56:33,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 43 minutes, 55 seconds)
2025-09-14 11:59:06,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 11:59:13,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2269.45850 ± 578.411
2025-09-14 11:59:13,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2933.923), np.float32(1434.1848), np.float32(1647.1228), np.float32(2854.0042), np.float32(2850.6528), np.float32(1828.756), np.float32(2980.1213), np.float32(2063.6345), np.float32(1662.102), np.float32(2440.0835)]
2025-09-14 11:59:13,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:59:13,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 41 minutes, 18 seconds)
2025-09-14 12:01:46,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:01:52,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2392.95850 ± 648.920
2025-09-14 12:01:52,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1623.8307), np.float32(3169.4727), np.float32(2582.3743), np.float32(1052.7135), np.float32(2545.423), np.float32(1834.8698), np.float32(2908.8218), np.float32(2457.1118), np.float32(3111.8257), np.float32(2643.1436)]
2025-09-14 12:01:52,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:01:52,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 38 minutes, 36 seconds)
2025-09-14 12:04:25,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:04:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2249.98047 ± 707.796
2025-09-14 12:04:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2597.5693), np.float32(1092.2545), np.float32(1150.5339), np.float32(3099.7798), np.float32(3137.8623), np.float32(2811.9675), np.float32(1645.552), np.float32(2140.2402), np.float32(2207.5593), np.float32(2616.486)]
2025-09-14 12:04:32,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:04:32,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 35 minutes, 53 seconds)
2025-09-14 12:07:05,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:07:12,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2272.16870 ± 634.184
2025-09-14 12:07:12,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3226.4329), np.float32(1361.2405), np.float32(2393.34), np.float32(1428.6188), np.float32(2464.3281), np.float32(1529.4176), np.float32(3058.1106), np.float32(2028.8755), np.float32(2779.6023), np.float32(2451.7214)]
2025-09-14 12:07:12,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:07:12,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 33 minutes, 15 seconds)
2025-09-14 12:09:45,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:09:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2243.37573 ± 842.781
2025-09-14 12:09:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2857.3943), np.float32(3051.437), np.float32(1389.6263), np.float32(932.74304), np.float32(1476.1918), np.float32(2938.487), np.float32(3039.6926), np.float32(2644.2642), np.float32(1133.725), np.float32(2970.1973)]
2025-09-14 12:09:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:09:52,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 30 minutes, 34 seconds)
2025-09-14 12:12:25,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:12:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2531.60400 ± 606.711
2025-09-14 12:12:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2874.4875), np.float32(3120.5178), np.float32(1941.3337), np.float32(1743.5604), np.float32(2833.5332), np.float32(3055.3052), np.float32(2932.4521), np.float32(2605.693), np.float32(1282.7302), np.float32(2926.4275)]
2025-09-14 12:12:32,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:12:32,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2531.60) for latency 9
2025-09-14 12:12:32,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 27 minutes, 53 seconds)
2025-09-14 12:15:04,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:15:11,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2379.45020 ± 779.772
2025-09-14 12:15:11,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2607.2795), np.float32(972.58136), np.float32(1419.5571), np.float32(2911.9167), np.float32(1269.1691), np.float32(2846.312), np.float32(2904.8748), np.float32(3170.8257), np.float32(3013.3064), np.float32(2678.68)]
2025-09-14 12:15:11,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:15:11,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 25 minutes, 12 seconds)
2025-09-14 12:17:44,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:17:51,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2200.37061 ± 681.872
2025-09-14 12:17:51,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2857.6633), np.float32(1210.3203), np.float32(2875.225), np.float32(1308.4222), np.float32(2944.6252), np.float32(2747.9805), np.float32(1212.3223), np.float32(2173.7603), np.float32(2579.9612), np.float32(2093.426)]
2025-09-14 12:17:51,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:17:51,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 22 minutes, 33 seconds)
2025-09-14 12:20:24,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:20:31,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2422.52222 ± 696.122
2025-09-14 12:20:31,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3040.5178), np.float32(2085.0928), np.float32(1002.4076), np.float32(2996.903), np.float32(1928.5292), np.float32(2792.79), np.float32(2799.0989), np.float32(2930.6035), np.float32(3100.2664), np.float32(1549.0146)]
2025-09-14 12:20:31,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:20:31,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 19 minutes, 54 seconds)
2025-09-14 12:23:04,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:23:11,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2347.41064 ± 677.650
2025-09-14 12:23:11,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2851.4429), np.float32(2931.7122), np.float32(2834.079), np.float32(1223.1523), np.float32(3216.5837), np.float32(2188.3186), np.float32(2516.11), np.float32(1401.3503), np.float32(1556.4916), np.float32(2754.8625)]
2025-09-14 12:23:11,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:23:11,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 17 minutes, 15 seconds)
2025-09-14 12:25:44,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:25:51,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2566.52319 ± 495.646
2025-09-14 12:25:51,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2046.1638), np.float32(2605.0767), np.float32(2997.3975), np.float32(3081.7012), np.float32(2543.2393), np.float32(1445.9171), np.float32(2413.7166), np.float32(2835.727), np.float32(3181.7478), np.float32(2514.5474)]
2025-09-14 12:25:51,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:25:51,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2566.52) for latency 9
2025-09-14 12:25:51,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 14 minutes, 34 seconds)
2025-09-14 12:28:24,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:28:31,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2181.53735 ± 620.330
2025-09-14 12:28:31,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1259.0188), np.float32(2584.2188), np.float32(2932.3887), np.float32(2210.5513), np.float32(2153.2393), np.float32(2067.8032), np.float32(2818.359), np.float32(2970.1223), np.float32(1271.8212), np.float32(1547.8518)]
2025-09-14 12:28:31,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:28:31,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 11 minutes, 57 seconds)
2025-09-14 12:31:04,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:31:11,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2405.34619 ± 687.032
2025-09-14 12:31:11,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1939.0183), np.float32(2717.6838), np.float32(3058.7678), np.float32(3158.2742), np.float32(927.49963), np.float32(3109.94), np.float32(2233.3083), np.float32(1635.0928), np.float32(2654.3735), np.float32(2619.5034)]
2025-09-14 12:31:11,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:31:11,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 9 minutes, 18 seconds)
2025-09-14 12:33:44,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:33:50,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2779.95630 ± 431.801
2025-09-14 12:33:50,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2529.5269), np.float32(2867.6252), np.float32(2981.6794), np.float32(3061.4006), np.float32(3058.6553), np.float32(2806.033), np.float32(3128.567), np.float32(3128.307), np.float32(1629.5227), np.float32(2608.2441)]
2025-09-14 12:33:50,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:33:50,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2779.96) for latency 9
2025-09-14 12:33:50,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 6 minutes, 36 seconds)
2025-09-14 12:36:24,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:36:30,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2336.21729 ± 610.094
2025-09-14 12:36:30,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3002.3215), np.float32(2705.5337), np.float32(1270.3517), np.float32(1340.7776), np.float32(2695.285), np.float32(2627.5151), np.float32(2799.1228), np.float32(1703.2709), np.float32(2746.082), np.float32(2471.9116)]
2025-09-14 12:36:30,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:36:30,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 3 minutes, 56 seconds)
2025-09-14 12:39:03,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:39:10,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2708.32568 ± 559.586
2025-09-14 12:39:10,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3034.0735), np.float32(2554.0144), np.float32(2877.4514), np.float32(2996.234), np.float32(2906.9443), np.float32(3142.344), np.float32(1107.8591), np.float32(2627.883), np.float32(2973.186), np.float32(2863.2678)]
2025-09-14 12:39:10,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:39:10,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 1 minute, 15 seconds)
2025-09-14 12:41:43,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:41:50,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2153.16650 ± 778.238
2025-09-14 12:41:50,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1120.9457), np.float32(2874.8242), np.float32(2775.2214), np.float32(1920.7799), np.float32(2456.1223), np.float32(3142.9888), np.float32(1078.0383), np.float32(2612.0264), np.float32(2583.3965), np.float32(967.32025)]
2025-09-14 12:41:50,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:41:50,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 58 minutes, 36 seconds)
2025-09-14 12:44:23,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:44:30,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2616.49194 ± 827.171
2025-09-14 12:44:30,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2606.2458), np.float32(3070.3406), np.float32(2963.6646), np.float32(3039.004), np.float32(1527.8739), np.float32(570.856), np.float32(3233.075), np.float32(3018.0825), np.float32(2972.2114), np.float32(3163.5688)]
2025-09-14 12:44:30,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:44:30,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 55 minutes, 56 seconds)
2025-09-14 12:47:03,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:47:10,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2142.49683 ± 724.915
2025-09-14 12:47:10,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3059.3918), np.float32(1983.4617), np.float32(2931.515), np.float32(2732.664), np.float32(1422.1886), np.float32(3087.8093), np.float32(1134.6715), np.float32(1358.0616), np.float32(1531.76), np.float32(2183.4463)]
2025-09-14 12:47:10,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:47:10,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 53 minutes, 17 seconds)
2025-09-14 12:49:43,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:49:50,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2502.03955 ± 591.746
2025-09-14 12:49:50,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2820.674), np.float32(2673.831), np.float32(2495.3557), np.float32(3093.1572), np.float32(2900.0283), np.float32(2272.9648), np.float32(2835.7058), np.float32(2554.4697), np.float32(2514.1265), np.float32(860.0795)]
2025-09-14 12:49:50,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:49:50,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 50 minutes, 37 seconds)
2025-09-14 12:52:23,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:52:29,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2650.85938 ± 546.594
2025-09-14 12:52:29,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2934.4312), np.float32(3035.8093), np.float32(1637.9506), np.float32(2842.7546), np.float32(2793.9478), np.float32(2992.4749), np.float32(2860.0068), np.float32(1561.8953), np.float32(3224.2869), np.float32(2625.0364)]
2025-09-14 12:52:29,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:52:29,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 47 minutes, 58 seconds)
2025-09-14 12:55:02,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:55:09,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2876.44238 ± 291.290
2025-09-14 12:55:09,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3149.606), np.float32(3121.425), np.float32(2128.4426), np.float32(2676.7498), np.float32(3001.2095), np.float32(3071.2024), np.float32(3036.9521), np.float32(3001.773), np.float32(2833.5566), np.float32(2743.5076)]
2025-09-14 12:55:09,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:55:09,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2876.44) for latency 9
2025-09-14 12:55:09,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 45 minutes, 17 seconds)
2025-09-14 12:57:42,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 12:57:49,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2523.25439 ± 727.577
2025-09-14 12:57:49,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2908.4167), np.float32(2929.8062), np.float32(971.9432), np.float32(2610.474), np.float32(1203.2416), np.float32(2948.7625), np.float32(2815.8074), np.float32(3038.5208), np.float32(2862.6873), np.float32(2942.8826)]
2025-09-14 12:57:49,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:57:49,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 42 minutes, 37 seconds)
2025-09-14 13:00:23,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:00:29,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2055.60352 ± 696.548
2025-09-14 13:00:29,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1356.7728), np.float32(1057.0746), np.float32(2880.087), np.float32(2549.732), np.float32(2315.4531), np.float32(1952.4414), np.float32(2595.8833), np.float32(1529.3964), np.float32(3101.361), np.float32(1217.834)]
2025-09-14 13:00:29,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:00:29,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 39 minutes, 58 seconds)
2025-09-14 13:03:03,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:03:10,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2543.77197 ± 588.400
2025-09-14 13:03:10,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2545.787), np.float32(2139.5305), np.float32(1775.2008), np.float32(1762.2274), np.float32(2541.9104), np.float32(1923.1232), np.float32(3020.2576), np.float32(3180.3271), np.float32(3199.4204), np.float32(3349.933)]
2025-09-14 13:03:10,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:03:10,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 37 minutes, 19 seconds)
2025-09-14 13:05:43,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:05:50,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2140.00073 ± 762.346
2025-09-14 13:05:50,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3034.5112), np.float32(1697.4716), np.float32(1521.6492), np.float32(2799.5764), np.float32(2922.7043), np.float32(2777.626), np.float32(1719.9064), np.float32(2813.85), np.float32(1052.7494), np.float32(1059.9623)]
2025-09-14 13:05:50,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:05:50,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 34 minutes, 40 seconds)
2025-09-14 13:08:23,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:08:30,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2724.38428 ± 417.046
2025-09-14 13:08:30,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2735.523), np.float32(1659.675), np.float32(2371.9321), np.float32(3125.5933), np.float32(3042.8748), np.float32(2755.763), np.float32(3149.4868), np.float32(2687.4258), np.float32(2846.333), np.float32(2869.2358)]
2025-09-14 13:08:30,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:08:30,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 32 minutes, 1 second)
2025-09-14 13:11:03,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:11:10,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2276.13232 ± 912.055
2025-09-14 13:11:10,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(848.6751), np.float32(3263.3838), np.float32(3194.762), np.float32(2776.0657), np.float32(1207.626), np.float32(2961.4612), np.float32(2812.7993), np.float32(2864.3088), np.float32(1909.954), np.float32(922.2871)]
2025-09-14 13:11:10,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:11:10,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 29 minutes, 21 seconds)
2025-09-14 13:13:43,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:13:49,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2542.72827 ± 624.362
2025-09-14 13:13:49,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1561.6525), np.float32(3000.841), np.float32(1622.5989), np.float32(2807.8962), np.float32(1721.5397), np.float32(3056.8833), np.float32(3144.9426), np.float32(3133.4426), np.float32(2447.912), np.float32(2929.5747)]
2025-09-14 13:13:49,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:13:49,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes, 40 seconds)
2025-09-14 13:16:22,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:16:29,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2466.22803 ± 954.222
2025-09-14 13:16:29,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3093.3916), np.float32(967.07825), np.float32(967.4214), np.float32(3124.0957), np.float32(3186.3284), np.float32(2012.9739), np.float32(1449.4529), np.float32(3271.0176), np.float32(3346.7673), np.float32(3243.7512)]
2025-09-14 13:16:29,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:16:29,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 23 minutes, 58 seconds)
2025-09-14 13:19:02,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:19:09,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2696.65527 ± 543.878
2025-09-14 13:19:09,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3011.3564), np.float32(2837.9285), np.float32(2851.826), np.float32(2918.2375), np.float32(2120.4304), np.float32(3108.027), np.float32(2635.5798), np.float32(3055.8103), np.float32(3126.1853), np.float32(1301.1711)]
2025-09-14 13:19:09,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:19:09,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 21 minutes, 18 seconds)
2025-09-14 13:21:42,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:21:48,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2573.66577 ± 438.763
2025-09-14 13:21:48,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1927.0841), np.float32(2492.091), np.float32(3097.2966), np.float32(1644.4274), np.float32(3033.905), np.float32(2857.4128), np.float32(2541.8315), np.float32(2788.1792), np.float32(2646.2043), np.float32(2708.2268)]
2025-09-14 13:21:48,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:21:48,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 38 seconds)
2025-09-14 13:24:21,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:24:28,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2383.46826 ± 798.962
2025-09-14 13:24:28,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3119.5642), np.float32(1317.2058), np.float32(3071.7996), np.float32(1908.392), np.float32(1211.9205), np.float32(1305.6475), np.float32(2883.1655), np.float32(3078.5662), np.float32(3130.2725), np.float32(2808.1504)]
2025-09-14 13:24:28,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:24:28,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 57 seconds)
2025-09-14 13:27:01,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:27:08,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2243.46143 ± 747.608
2025-09-14 13:27:08,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1511.6837), np.float32(2126.3518), np.float32(3149.285), np.float32(1017.21014), np.float32(2998.8894), np.float32(2733.7737), np.float32(1956.7537), np.float32(1284.7804), np.float32(2496.155), np.float32(3159.7322)]
2025-09-14 13:27:08,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:27:08,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 18 seconds)
2025-09-14 13:29:41,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:29:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2014.15161 ± 828.060
2025-09-14 13:29:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2956.9172), np.float32(1836.2446), np.float32(1051.7312), np.float32(2812.059), np.float32(2742.6997), np.float32(972.48944), np.float32(1447.1545), np.float32(3045.9148), np.float32(900.2991), np.float32(2376.005)]
2025-09-14 13:29:48,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:29:48,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 38 seconds)
2025-09-14 13:32:21,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:32:27,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2230.85913 ± 875.992
2025-09-14 13:32:27,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2781.3054), np.float32(1079.2056), np.float32(2893.0457), np.float32(769.0903), np.float32(1761.2577), np.float32(2778.8193), np.float32(3190.203), np.float32(1204.9438), np.float32(2851.992), np.float32(2998.7285)]
2025-09-14 13:32:27,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:32:27,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 59 seconds)
2025-09-14 13:34:57,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:35:03,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2498.25342 ± 668.785
2025-09-14 13:35:03,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2996.3726), np.float32(2751.1667), np.float32(2936.3376), np.float32(2991.7346), np.float32(2753.5867), np.float32(1130.3594), np.float32(1245.0878), np.float32(2533.035), np.float32(2881.829), np.float32(2763.0283)]
2025-09-14 13:35:03,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:35:03,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 17 seconds)
2025-09-14 13:37:14,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:37:19,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2866.55591 ± 281.512
2025-09-14 13:37:19,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2755.4473), np.float32(2837.9153), np.float32(3086.3137), np.float32(2935.1792), np.float32(2071.6624), np.float32(3071.356), np.float32(3026.899), np.float32(2965.0667), np.float32(2943.1082), np.float32(2972.6106)]
2025-09-14 13:37:19,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:37:19,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 34 seconds)
2025-09-14 13:39:22,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 9...
2025-09-14 13:39:28,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 2903.06787 ± 192.163
2025-09-14 13:39:28,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2587.6604), np.float32(3142.396), np.float32(3028.5913), np.float32(2896.0999), np.float32(3086.1897), np.float32(2960.0852), np.float32(2975.746), np.float32(2995.9236), np.float32(2829.9788), np.float32(2528.0059)]
2025-09-14 13:39:28,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:39:28,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (2903.07) for latency 9
2025-09-14 13:39:28,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1251 [DEBUG]: Training session finished
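Each evaluation in the log above reports a summary of the form `Total Reward: mean ± spread` followed by the ten per-episode rewards. A minimal post-hoc sketch of how that summary can be re-derived from an `All rewards` line is below; it assumes the `±` figure is a *population* standard deviation (`ddof=0`), which reproduces the logged values up to float32 rounding (the log accumulates in `np.float32`, so the final digits differ slightly from a float64 recomputation). The sample line is copied verbatim from iteration 100.

```python
import re
import statistics

# "All rewards" payload copied from the final evaluation (iteration 100),
# whose logged summary is "Total Reward: 2903.06787 ± 192.163".
REWARDS_LINE = (
    "All rewards: [np.float32(2587.6604), np.float32(3142.396), "
    "np.float32(3028.5913), np.float32(2896.0999), np.float32(3086.1897), "
    "np.float32(2960.0852), np.float32(2975.746), np.float32(2995.9236), "
    "np.float32(2829.9788), np.float32(2528.0059)]"
)

def parse_rewards(line: str) -> list[float]:
    """Extract the numeric values from an 'All rewards' log line."""
    return [float(x) for x in re.findall(r"np\.float32\(([-\d.]+)\)", line)]

rewards = parse_rewards(REWARDS_LINE)
mean = statistics.fmean(rewards)      # ~2903.068 (float64; log shows 2903.06787)
spread = statistics.pstdev(rewards)   # population std (ddof=0), ~192.163
print(f"Total Reward: {mean:.5f} ± {spread:.3f}")
```

The same parser applied across all `All rewards` lines gives a learning curve without rerunning evaluation; note that `statistics.stdev` (the sample estimator, `ddof=1`) would *not* match the logged `±`, which is the clue that the trainer uses the population form.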
