2025-09-14 16:19:56,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_24
2025-09-14 16:19:56,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_24
2025-09-14 16:19:56,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'24': <latency_env.delayed_mdp.ConstantDelay object at 0x7f41df207ce0>}
2025-09-14 16:19:56,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 16:19:56,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 16:19:56,446 baseline-bpql-noisepromille150-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 16:19:56,446 baseline-bpql-noisepromille150-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 16:19:57,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 16:19:57,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 16:23:08,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:23:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -373.06464 ± 11.520
2025-09-14 16:23:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-376.29797), np.float32(-348.29794), np.float32(-377.95258), np.float32(-374.51007), np.float32(-389.1585), np.float32(-361.24805), np.float32(-387.884), np.float32(-372.41644), np.float32(-366.2765), np.float32(-376.60416)]
2025-09-14 16:23:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:23:20,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-373.06) for latency 24
2025-09-14 16:23:20,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 34 minutes, 1 second)
2025-09-14 16:26:29,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:26:40,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -286.27530 ± 43.344
2025-09-14 16:26:40,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-273.69916), np.float32(-266.80368), np.float32(-261.2808), np.float32(-291.77466), np.float32(-334.40952), np.float32(-392.62994), np.float32(-266.79263), np.float32(-269.89127), np.float32(-275.85095), np.float32(-229.62015)]
2025-09-14 16:26:40,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:26:40,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-286.28) for latency 24
2025-09-14 16:26:40,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 29 minutes, 5 seconds)
2025-09-14 16:29:44,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:29:55,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -171.15869 ± 26.806
2025-09-14 16:29:55,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-149.24962), np.float32(-147.54355), np.float32(-215.16374), np.float32(-131.63109), np.float32(-149.91356), np.float32(-202.41678), np.float32(-166.79343), np.float32(-191.14377), np.float32(-160.3777), np.float32(-197.3537)]
2025-09-14 16:29:55,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:29:55,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-171.16) for latency 24
2025-09-14 16:29:55,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 22 minutes, 11 seconds)
2025-09-14 16:33:00,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:33:12,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -87.62647 ± 95.170
2025-09-14 16:33:12,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-108.65742), np.float32(-202.48784), np.float32(-58.7035), np.float32(-128.44255), np.float32(33.326607), np.float32(-249.75337), np.float32(-17.319145), np.float32(-39.25346), np.float32(-161.50002), np.float32(56.525963)]
2025-09-14 16:33:12,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:33:12,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-87.63) for latency 24
2025-09-14 16:33:12,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 17 minutes, 43 seconds)
2025-09-14 16:36:17,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:36:29,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -78.29679 ± 83.008
2025-09-14 16:36:29,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-169.53297), np.float32(-128.85149), np.float32(-144.35706), np.float32(-117.35801), np.float32(-85.0221), np.float32(-144.22723), np.float32(45.284515), np.float32(98.36765), np.float32(-45.48607), np.float32(-91.78514)]
2025-09-14 16:36:29,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:36:29,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-78.30) for latency 24
2025-09-14 16:36:29,374 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 13 minutes, 59 seconds)
2025-09-14 16:39:42,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:39:53,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 53.18641 ± 62.546
2025-09-14 16:39:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(103.50474), np.float32(6.139441), np.float32(196.69034), np.float32(88.127625), np.float32(-6.179304), np.float32(93.7303), np.float32(35.77168), np.float32(0.6138327), np.float32(4.636595), np.float32(8.8288)]
2025-09-14 16:39:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:39:53,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (53.19) for latency 24
2025-09-14 16:39:53,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 11 minutes, 21 seconds)
2025-09-14 16:43:08,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:43:21,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 185.90167 ± 56.565
2025-09-14 16:43:21,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(185.18172), np.float32(137.91382), np.float32(205.34608), np.float32(202.93832), np.float32(319.41724), np.float32(110.52382), np.float32(216.40659), np.float32(142.5348), np.float32(136.52078), np.float32(202.23349)]
2025-09-14 16:43:21,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:43:21,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (185.90) for latency 24
2025-09-14 16:43:21,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 10 minutes, 8 seconds)
2025-09-14 16:46:36,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:46:48,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 473.64658 ± 147.853
2025-09-14 16:46:48,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(642.3981), np.float32(383.89062), np.float32(134.93208), np.float32(576.1141), np.float32(596.0791), np.float32(503.9949), np.float32(524.79047), np.float32(615.7802), np.float32(367.14786), np.float32(391.33813)]
2025-09-14 16:46:48,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:46:48,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (473.65) for latency 24
2025-09-14 16:46:48,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 10 minutes, 38 seconds)
2025-09-14 16:49:54,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:50:05,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 430.27676 ± 326.247
2025-09-14 16:50:05,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-399.75204), np.float32(401.37265), np.float32(494.76044), np.float32(617.5556), np.float32(525.1762), np.float32(659.12573), np.float32(729.39825), np.float32(583.6335), np.float32(616.5793), np.float32(74.91808)]
2025-09-14 16:50:05,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:50:05,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 7 minutes, 27 seconds)
2025-09-14 16:53:09,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:53:21,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 663.03015 ± 200.671
2025-09-14 16:53:21,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(885.63995), np.float32(790.7885), np.float32(792.4312), np.float32(808.3862), np.float32(846.5659), np.float32(666.7547), np.float32(644.26465), np.float32(587.95636), np.float32(327.15668), np.float32(280.35733)]
2025-09-14 16:53:21,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:53:21,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (663.03) for latency 24
2025-09-14 16:53:21,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 3 minutes, 36 seconds)
2025-09-14 16:56:28,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 16:56:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 783.50720 ± 83.186
2025-09-14 16:56:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(807.3477), np.float32(794.1115), np.float32(622.19965), np.float32(850.4445), np.float32(833.8627), np.float32(859.5173), np.float32(700.82056), np.float32(687.2795), np.float32(900.0848), np.float32(779.40326)]
2025-09-14 16:56:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:56:39,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (783.51) for latency 24
2025-09-14 16:56:39,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 58 minutes, 26 seconds)
2025-09-14 16:59:53,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:00:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 674.52209 ± 158.990
2025-09-14 17:00:05,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(865.9157), np.float32(597.9475), np.float32(464.4532), np.float32(799.2525), np.float32(632.6187), np.float32(533.5773), np.float32(485.66882), np.float32(874.2495), np.float32(892.84924), np.float32(598.6885)]
2025-09-14 17:00:05,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:00:05,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 54 minutes, 31 seconds)
2025-09-14 17:03:16,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:03:28,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 683.08392 ± 351.627
2025-09-14 17:03:28,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(946.8453), np.float32(726.49304), np.float32(813.11334), np.float32(768.40497), np.float32(743.2634), np.float32(897.09546), np.float32(702.28265), np.float32(785.5151), np.float32(797.8533), np.float32(-350.02786)]
2025-09-14 17:03:28,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:03:28,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 49 minutes, 51 seconds)
2025-09-14 17:06:38,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:06:49,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 725.97961 ± 173.231
2025-09-14 17:06:49,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(715.824), np.float32(770.7242), np.float32(408.44983), np.float32(852.6629), np.float32(859.7732), np.float32(813.11597), np.float32(394.8297), np.float32(720.3061), np.float32(795.3795), np.float32(928.7309)]
2025-09-14 17:06:49,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:06:49,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 47 minutes, 51 seconds)
2025-09-14 17:10:00,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:10:12,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 792.44006 ± 148.933
2025-09-14 17:10:12,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(440.1967), np.float32(950.90405), np.float32(894.9318), np.float32(676.5821), np.float32(713.03796), np.float32(859.3425), np.float32(709.1255), np.float32(900.7863), np.float32(877.53876), np.float32(901.9552)]
2025-09-14 17:10:12,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:10:12,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (792.44) for latency 24
2025-09-14 17:10:12,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 46 minutes, 28 seconds)
2025-09-14 17:13:27,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:13:39,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 633.07458 ± 389.542
2025-09-14 17:13:39,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(747.3548), np.float32(809.1073), np.float32(990.9511), np.float32(556.5056), np.float32(869.11304), np.float32(-369.29156), np.float32(865.43835), np.float32(235.31908), np.float32(844.35645), np.float32(781.8913)]
2025-09-14 17:13:39,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:13:39,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 45 minutes, 37 seconds)
2025-09-14 17:17:02,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:17:15,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 730.22058 ± 171.256
2025-09-14 17:17:15,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(868.0015), np.float32(432.09372), np.float32(808.95374), np.float32(901.4852), np.float32(850.2991), np.float32(388.22803), np.float32(732.0941), np.float32(806.61206), np.float32(832.1071), np.float32(682.3313)]
2025-09-14 17:17:15,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:17:15,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 45 minutes, 2 seconds)
2025-09-14 17:20:43,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:20:56,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 796.94849 ± 154.538
2025-09-14 17:20:56,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(595.05457), np.float32(898.9069), np.float32(871.2056), np.float32(576.1994), np.float32(931.2142), np.float32(920.4955), np.float32(541.8022), np.float32(802.775), np.float32(973.74664), np.float32(858.08514)]
2025-09-14 17:20:56,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:20:56,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (796.95) for latency 24
2025-09-14 17:20:56,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 46 minutes, 25 seconds)
2025-09-14 17:24:15,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:24:28,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 817.89368 ± 169.599
2025-09-14 17:24:28,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(819.2138), np.float32(707.63135), np.float32(784.80585), np.float32(756.65784), np.float32(870.01666), np.float32(924.99994), np.float32(409.1269), np.float32(1073.5284), np.float32(868.68036), np.float32(964.2758)]
2025-09-14 17:24:28,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:24:28,472 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (817.89) for latency 24
2025-09-14 17:24:28,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 45 minutes, 49 seconds)
2025-09-14 17:27:47,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:27:59,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 686.46411 ± 136.398
2025-09-14 17:27:59,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(779.2203), np.float32(841.85754), np.float32(706.2627), np.float32(651.523), np.float32(595.2162), np.float32(500.5528), np.float32(736.4045), np.float32(881.3926), np.float32(741.2761), np.float32(430.93494)]
2025-09-14 17:27:59,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:27:59,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 44 minutes, 27 seconds)
2025-09-14 17:31:15,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:31:26,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 921.62518 ± 86.695
2025-09-14 17:31:26,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(898.6754), np.float32(829.4063), np.float32(1021.64746), np.float32(903.2889), np.float32(735.33215), np.float32(950.1571), np.float32(1053.3855), np.float32(936.2252), np.float32(915.00964), np.float32(973.1236)]
2025-09-14 17:31:26,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:31:26,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (921.63) for latency 24
2025-09-14 17:31:26,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 40 minutes, 55 seconds)
2025-09-14 17:34:36,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:34:48,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 786.25031 ± 163.533
2025-09-14 17:34:48,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(902.491), np.float32(674.0074), np.float32(486.45844), np.float32(840.171), np.float32(1019.5927), np.float32(534.967), np.float32(919.13544), np.float32(791.08875), np.float32(901.2013), np.float32(793.39014)]
2025-09-14 17:34:48,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:34:48,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 33 minutes, 46 seconds)
2025-09-14 17:38:14,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:38:27,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 815.10461 ± 126.186
2025-09-14 17:38:27,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(794.8041), np.float32(638.1739), np.float32(958.3651), np.float32(785.8514), np.float32(895.48785), np.float32(967.50385), np.float32(915.6794), np.float32(901.2358), np.float32(612.5121), np.float32(681.4325)]
2025-09-14 17:38:27,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:38:27,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 29 minutes, 54 seconds)
2025-09-14 17:42:01,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:42:13,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 709.14050 ± 399.188
2025-09-14 17:42:13,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(862.8995), np.float32(-445.7716), np.float32(1017.1666), np.float32(922.48206), np.float32(596.43445), np.float32(846.55835), np.float32(841.5201), np.float32(729.3816), np.float32(874.3273), np.float32(846.4068)]
2025-09-14 17:42:13,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:42:13,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 29 minutes, 47 seconds)
2025-09-14 17:45:39,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:45:51,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 891.90234 ± 130.081
2025-09-14 17:45:51,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(952.45703), np.float32(916.35834), np.float32(1068.7411), np.float32(967.82794), np.float32(821.0117), np.float32(938.0104), np.float32(701.65466), np.float32(907.0473), np.float32(1016.38245), np.float32(629.53253)]
2025-09-14 17:45:51,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:45:51,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 28 minutes, 8 seconds)
2025-09-14 17:49:17,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:49:30,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 864.33264 ± 178.445
2025-09-14 17:49:30,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(848.3471), np.float32(972.65106), np.float32(823.3922), np.float32(1134.9888), np.float32(804.92615), np.float32(801.2652), np.float32(824.3337), np.float32(1102.9658), np.float32(457.66174), np.float32(872.79456)]
2025-09-14 17:49:30,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:49:30,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 27 minutes, 22 seconds)
2025-09-14 17:52:58,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:53:11,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 987.82141 ± 73.579
2025-09-14 17:53:11,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(990.1288), np.float32(1075.6238), np.float32(915.26794), np.float32(832.2709), np.float32(1043.0792), np.float32(1005.4128), np.float32(1027.239), np.float32(1066.7583), np.float32(914.3091), np.float32(1008.1239)]
2025-09-14 17:53:11,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:53:11,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (987.82) for latency 24
2025-09-14 17:53:11,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 28 minutes, 20 seconds)
2025-09-14 17:56:37,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 17:56:50,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1016.62415 ± 145.665
2025-09-14 17:56:50,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(847.6968), np.float32(794.1931), np.float32(1073.5684), np.float32(1062.8109), np.float32(972.7276), np.float32(866.8971), np.float32(1162.7588), np.float32(1096.5188), np.float32(996.229), np.float32(1292.8411)]
2025-09-14 17:56:50,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:56:50,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1016.62) for latency 24
2025-09-14 17:56:50,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 24 minutes, 47 seconds)
2025-09-14 18:00:19,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:00:31,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1004.15515 ± 102.694
2025-09-14 18:00:31,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(871.0185), np.float32(1074.3894), np.float32(986.813), np.float32(952.18365), np.float32(1063.731), np.float32(798.2856), np.float32(1089.7701), np.float32(1097.7703), np.float32(1133.9438), np.float32(973.6455)]
2025-09-14 18:00:31,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:00:31,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 19 minutes, 45 seconds)
2025-09-14 18:04:05,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:04:19,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1100.38159 ± 162.009
2025-09-14 18:04:19,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1088.8029), np.float32(1453.9078), np.float32(948.28186), np.float32(972.2744), np.float32(1251.8042), np.float32(1115.6233), np.float32(895.75366), np.float32(1054.5146), np.float32(988.0848), np.float32(1234.7686)]
2025-09-14 18:04:19,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:04:19,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1100.38) for latency 24
2025-09-14 18:04:19,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 18 minutes, 23 seconds)
2025-09-14 18:07:48,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:08:00,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 905.29968 ± 92.932
2025-09-14 18:08:00,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(809.78625), np.float32(949.50037), np.float32(861.60034), np.float32(908.58356), np.float32(725.8155), np.float32(1033.4133), np.float32(979.1681), np.float32(914.0449), np.float32(1028.7522), np.float32(842.3326)]
2025-09-14 18:08:00,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:08:00,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 15 minutes, 12 seconds)
2025-09-14 18:11:30,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:11:42,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1053.18811 ± 146.695
2025-09-14 18:11:42,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(879.29694), np.float32(1121.0035), np.float32(1143.7479), np.float32(985.34875), np.float32(940.3805), np.float32(1415.273), np.float32(1013.07306), np.float32(1031.2428), np.float32(911.14886), np.float32(1091.3657)]
2025-09-14 18:11:42,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:11:42,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 11 minutes, 46 seconds)
2025-09-14 18:15:09,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:15:22,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1077.56042 ± 101.440
2025-09-14 18:15:22,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1066.22), np.float32(1148.9236), np.float32(1333.4749), np.float32(941.89154), np.float32(1029.1576), np.float32(1127.4436), np.float32(1050.3832), np.float32(1037.2678), np.float32(1025.8225), np.float32(1015.0207)]
2025-09-14 18:15:22,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:15:22,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 8 minutes, 13 seconds)
2025-09-14 18:18:49,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:19:02,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1001.51758 ± 144.435
2025-09-14 18:19:02,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1247.6235), np.float32(1027.5103), np.float32(1088.3386), np.float32(845.0831), np.float32(857.95404), np.float32(793.0931), np.float32(1047.7578), np.float32(1204.644), np.float32(989.8315), np.float32(913.3396)]
2025-09-14 18:19:02,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:19:02,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 4 minutes, 27 seconds)
2025-09-14 18:22:28,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:22:41,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1108.52600 ± 194.391
2025-09-14 18:22:41,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1311.157), np.float32(1233.2551), np.float32(1562.5203), np.float32(1006.3601), np.float32(890.40814), np.float32(1040.1227), np.float32(990.1987), np.float32(1054.8485), np.float32(1073.1256), np.float32(923.2625)]
2025-09-14 18:22:41,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:22:41,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1108.53) for latency 24
2025-09-14 18:22:41,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 58 minutes, 53 seconds)
2025-09-14 18:26:09,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:26:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1005.01190 ± 105.281
2025-09-14 18:26:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1054.6539), np.float32(1138.3647), np.float32(1122.4996), np.float32(814.8847), np.float32(1011.14435), np.float32(1021.00165), np.float32(1126.3127), np.float32(934.4023), np.float32(957.43976), np.float32(869.4165)]
2025-09-14 18:26:22,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:26:22,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 55 minutes, 9 seconds)
2025-09-14 18:29:58,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:30:12,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1139.01416 ± 239.499
2025-09-14 18:30:12,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(899.77026), np.float32(970.1587), np.float32(1398.8314), np.float32(1423.8159), np.float32(918.9381), np.float32(1584.8164), np.float32(937.61914), np.float32(1095.9485), np.float32(933.99634), np.float32(1226.2468)]
2025-09-14 18:30:12,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:30:12,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1139.01) for latency 24
2025-09-14 18:30:12,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 53 minutes, 8 seconds)
2025-09-14 18:33:38,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:33:51,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1248.22021 ± 260.358
2025-09-14 18:33:51,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1134.6477), np.float32(1215.9684), np.float32(1172.8643), np.float32(1393.4025), np.float32(1505.4218), np.float32(798.587), np.float32(976.57), np.float32(1732.1794), np.float32(1104.1499), np.float32(1448.4113)]
2025-09-14 18:33:51,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:33:51,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1248.22) for latency 24
2025-09-14 18:33:51,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 49 minutes, 18 seconds)
2025-09-14 18:37:20,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:37:33,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1292.41809 ± 260.508
2025-09-14 18:37:33,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1011.7182), np.float32(1390.9532), np.float32(1127.865), np.float32(1779.3286), np.float32(1449.4884), np.float32(995.9019), np.float32(1264.5413), np.float32(1305.7101), np.float32(973.497), np.float32(1625.1764)]
2025-09-14 18:37:33,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:37:33,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1292.42) for latency 24
2025-09-14 18:37:33,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 45 minutes, 55 seconds)
2025-09-14 18:41:01,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:41:13,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1005.88849 ± 392.052
2025-09-14 18:41:13,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1128.8795), np.float32(31.64023), np.float32(968.63666), np.float32(1714.9756), np.float32(1029.6292), np.float32(1080.0182), np.float32(923.34033), np.float32(918.3591), np.float32(1041.2186), np.float32(1222.1871)]
2025-09-14 18:41:13,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:41:13,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 42 minutes, 19 seconds)
2025-09-14 18:44:41,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:44:52,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1105.42163 ± 142.261
2025-09-14 18:44:52,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(945.2258), np.float32(1284.623), np.float32(1018.0537), np.float32(1161.8241), np.float32(956.32996), np.float32(1003.469), np.float32(1397.8789), np.float32(1078.1709), np.float32(1018.02924), np.float32(1190.6119)]
2025-09-14 18:44:52,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:44:52,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 38 minutes, 21 seconds)
2025-09-14 18:48:21,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:48:33,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1167.74121 ± 281.024
2025-09-14 18:48:33,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1406.2081), np.float32(978.37384), np.float32(1060.6083), np.float32(1003.9596), np.float32(964.72144), np.float32(1833.9534), np.float32(980.108), np.float32(1417.6418), np.float32(1138.1979), np.float32(893.63916)]
2025-09-14 18:48:33,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:48:33,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 32 minutes, 48 seconds)
2025-09-14 18:52:00,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:52:13,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1233.51196 ± 263.583
2025-09-14 18:52:13,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1280.1196), np.float32(981.07654), np.float32(1007.64264), np.float32(1765.448), np.float32(1656.9834), np.float32(1201.1803), np.float32(1109.4459), np.float32(1049.5457), np.float32(986.1605), np.float32(1297.5175)]
2025-09-14 18:52:13,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:52:13,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 29 minutes, 18 seconds)
2025-09-14 18:55:43,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:55:56,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1110.31116 ± 199.792
2025-09-14 18:55:56,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1082.1996), np.float32(1108.5874), np.float32(1022.3789), np.float32(1023.5753), np.float32(1401.1349), np.float32(1038.4395), np.float32(861.2759), np.float32(1517.9753), np.float32(865.361), np.float32(1182.1837)]
2025-09-14 18:55:56,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:55:56,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 25 minutes, 56 seconds)
2025-09-14 18:59:16,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 18:59:28,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1144.43604 ± 219.434
2025-09-14 18:59:28,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1068.452), np.float32(993.58685), np.float32(1478.998), np.float32(1075.6923), np.float32(966.45776), np.float32(1048.1486), np.float32(1055.2793), np.float32(970.3922), np.float32(1135.9001), np.float32(1651.4521)]
2025-09-14 18:59:28,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:59:28,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 20 minutes, 43 seconds)
2025-09-14 19:02:33,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:02:45,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1212.91272 ± 264.793
2025-09-14 19:02:45,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1781.5078), np.float32(1489.274), np.float32(957.1234), np.float32(1031.598), np.float32(1045.8923), np.float32(1192.4556), np.float32(1289.0956), np.float32(1416.6335), np.float32(997.52966), np.float32(928.0175)]
2025-09-14 19:02:45,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:02:45,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 12 minutes, 59 seconds)
2025-09-14 19:05:49,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:06:01,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1204.34570 ± 121.861
2025-09-14 19:06:01,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1132.1797), np.float32(1162.4883), np.float32(975.0632), np.float32(1267.1943), np.float32(1300.8646), np.float32(1234.2765), np.float32(1326.0393), np.float32(1056.6041), np.float32(1189.5833), np.float32(1399.1636)]
2025-09-14 19:06:01,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:06:01,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 5 minutes, 11 seconds)
2025-09-14 19:09:05,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:09:17,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1126.71838 ± 138.453
2025-09-14 19:09:17,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(938.9669), np.float32(1116.0867), np.float32(1142.1042), np.float32(949.6008), np.float32(1311.7294), np.float32(1151.3569), np.float32(977.0209), np.float32(1368.4583), np.float32(1095.3906), np.float32(1216.4686)]
2025-09-14 19:09:17,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:09:17,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 57 minutes, 26 seconds)
2025-09-14 19:12:22,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:12:34,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1203.88159 ± 298.879
2025-09-14 19:12:34,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1144.6449), np.float32(1069.8859), np.float32(1117.5784), np.float32(2001.1162), np.float32(1353.2522), np.float32(890.5862), np.float32(1191.5984), np.float32(964.6099), np.float32(1001.6142), np.float32(1303.9292)]
2025-09-14 19:12:34,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:12:34,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 49 minutes, 40 seconds)
2025-09-14 19:15:49,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:16:00,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1309.71216 ± 255.760
2025-09-14 19:16:00,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1208.5433), np.float32(1209.241), np.float32(998.8858), np.float32(1363.7068), np.float32(1153.2468), np.float32(1102.6693), np.float32(1063.0023), np.float32(1795.2542), np.float32(1613.0648), np.float32(1589.5074)]
2025-09-14 19:16:00,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:16:00,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1309.71) for latency 24
2025-09-14 19:16:00,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 45 minutes, 24 seconds)
2025-09-14 19:19:13,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:19:25,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1404.26624 ± 318.982
2025-09-14 19:19:25,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1554.2625), np.float32(1126.8484), np.float32(1361.7405), np.float32(1317.5369), np.float32(1689.3218), np.float32(897.3606), np.float32(1285.8253), np.float32(1833.9749), np.float32(1064.9128), np.float32(1910.8782)]
2025-09-14 19:19:25,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:19:25,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1404.27) for latency 24
2025-09-14 19:19:25,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 43 minutes, 20 seconds)
2025-09-14 19:22:39,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:22:50,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1259.95203 ± 281.921
2025-09-14 19:22:50,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1130.3365), np.float32(1041.8422), np.float32(1867.3134), np.float32(1288.9052), np.float32(1318.5511), np.float32(1696.427), np.float32(1077.4009), np.float32(1001.217), np.float32(1026.026), np.float32(1151.5013)]
2025-09-14 19:22:50,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:22:50,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 41 minutes, 24 seconds)
2025-09-14 19:26:05,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:26:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1215.96558 ± 309.366
2025-09-14 19:26:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1018.7668), np.float32(802.3629), np.float32(1720.9845), np.float32(1207.1323), np.float32(960.8574), np.float32(1264.4253), np.float32(1838.7129), np.float32(1119.0779), np.float32(1170.4053), np.float32(1056.9305)]
2025-09-14 19:26:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:26:16,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 39 minutes, 44 seconds)
2025-09-14 19:29:29,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:29:40,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1445.13879 ± 318.736
2025-09-14 19:29:40,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1751.274), np.float32(1580.71), np.float32(1644.919), np.float32(1101.2454), np.float32(1373.2728), np.float32(1082.6401), np.float32(1243.2047), np.float32(1152.0154), np.float32(2137.0725), np.float32(1385.0343)]
2025-09-14 19:29:40,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:29:40,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1445.14) for latency 24
2025-09-14 19:29:40,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 37 minutes, 19 seconds)
2025-09-14 19:32:55,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:33:07,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1280.53613 ± 260.755
2025-09-14 19:33:07,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1311.627), np.float32(1104.5764), np.float32(1211.8364), np.float32(1487.3966), np.float32(1109.4258), np.float32(1807.6271), np.float32(1625.497), np.float32(997.75024), np.float32(1144.744), np.float32(1004.88055)]
2025-09-14 19:33:07,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:33:07,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 34 minutes, 2 seconds)
2025-09-14 19:36:24,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:36:36,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1329.28467 ± 389.594
2025-09-14 19:36:36,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2352.337), np.float32(1082.8469), np.float32(1307.2172), np.float32(1072.7638), np.float32(1334.8241), np.float32(1089.1707), np.float32(953.23663), np.float32(1624.2168), np.float32(1382.1145), np.float32(1094.1193)]
2025-09-14 19:36:36,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:36:36,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 31 minutes, 16 seconds)
2025-09-14 19:39:47,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:39:58,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1422.63928 ± 438.993
2025-09-14 19:39:58,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1207.0039), np.float32(1844.7367), np.float32(1417.6328), np.float32(1174.5298), np.float32(1590.8468), np.float32(1286.7706), np.float32(1067.8536), np.float32(2506.465), np.float32(927.1116), np.float32(1203.4419)]
2025-09-14 19:39:58,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:39:58,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 27 minutes, 24 seconds)
2025-09-14 19:43:03,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:43:15,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1229.66895 ± 250.075
2025-09-14 19:43:15,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1273.9713), np.float32(1030.6799), np.float32(1192.1218), np.float32(1031.1122), np.float32(1052.2878), np.float32(1379.8456), np.float32(1639.8081), np.float32(1079.6376), np.float32(1686.6173), np.float32(930.60767)]
2025-09-14 19:43:15,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:43:15,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 22 minutes, 32 seconds)
2025-09-14 19:46:19,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:46:31,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1210.54712 ± 205.841
2025-09-14 19:46:31,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1511.4248), np.float32(1004.1276), np.float32(1121.4888), np.float32(1148.4999), np.float32(1355.2886), np.float32(1427.0521), np.float32(1144.9839), np.float32(809.9965), np.float32(1420.1042), np.float32(1162.5049)]
2025-09-14 19:46:31,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:46:31,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 18 minutes, 8 seconds)
2025-09-14 19:49:35,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:49:47,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1295.94360 ± 232.539
2025-09-14 19:49:47,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1198.4315), np.float32(1032.5994), np.float32(1307.6777), np.float32(1396.7822), np.float32(1103.593), np.float32(987.4771), np.float32(1149.8839), np.float32(1715.6007), np.float32(1600.5206), np.float32(1466.8708)]
2025-09-14 19:49:47,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:49:47,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 13 minutes, 15 seconds)
2025-09-14 19:52:58,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:53:10,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1454.28711 ± 269.043
2025-09-14 19:53:10,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2060.1274), np.float32(1520.3833), np.float32(1490.615), np.float32(1446.6632), np.float32(1397.3397), np.float32(1396.7402), np.float32(1676.5726), np.float32(1338.1805), np.float32(963.301), np.float32(1252.9486)]
2025-09-14 19:53:10,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:53:10,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1454.29) for latency 24
2025-09-14 19:53:10,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2025-09-14 19:56:24,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:56:36,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1460.06433 ± 414.016
2025-09-14 19:56:36,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1799.3701), np.float32(1074.024), np.float32(1320.8665), np.float32(1149.6328), np.float32(1443.4985), np.float32(1153.3536), np.float32(1199.729), np.float32(2000.3054), np.float32(2335.9673), np.float32(1123.8955)]
2025-09-14 19:56:36,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:56:36,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1460.06) for latency 24
2025-09-14 19:56:36,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 6 minutes, 22 seconds)
2025-09-14 19:59:46,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 19:59:57,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1195.94360 ± 200.512
2025-09-14 19:59:57,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(966.13074), np.float32(1245.7609), np.float32(1051.0581), np.float32(1242.9227), np.float32(1307.5602), np.float32(1154.4698), np.float32(1110.335), np.float32(999.66656), np.float32(1708.7672), np.float32(1172.7648)]
2025-09-14 19:59:57,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:59:57,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 3 minutes, 38 seconds)
2025-09-14 20:03:01,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:03:13,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1750.73267 ± 658.472
2025-09-14 20:03:13,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2455.3254), np.float32(2118.6165), np.float32(1226.7657), np.float32(2876.7634), np.float32(2197.062), np.float32(1177.1012), np.float32(876.10144), np.float32(1061.2927), np.float32(2202.3535), np.float32(1315.9447)]
2025-09-14 20:03:13,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:03:13,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1750.73) for latency 24
2025-09-14 20:03:13,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 14 seconds)
2025-09-14 20:06:18,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:06:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1478.31848 ± 282.437
2025-09-14 20:06:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1377.9265), np.float32(1413.5327), np.float32(1766.8444), np.float32(1284.4786), np.float32(2082.797), np.float32(1306.0048), np.float32(1027.1539), np.float32(1443.697), np.float32(1374.4352), np.float32(1706.3135)]
2025-09-14 20:06:29,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:06:29,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 56 minutes, 57 seconds)
2025-09-14 20:09:35,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:09:47,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1324.73425 ± 262.316
2025-09-14 20:09:47,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1706.2218), np.float32(1195.7076), np.float32(1528.9313), np.float32(1420.2365), np.float32(959.2607), np.float32(956.2236), np.float32(1077.1165), np.float32(1241.5591), np.float32(1525.773), np.float32(1636.3131)]
2025-09-14 20:09:47,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:09:47,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 53 minutes)
2025-09-14 20:13:03,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:13:16,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1395.21570 ± 442.561
2025-09-14 20:13:16,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(947.2925), np.float32(991.472), np.float32(1550.1965), np.float32(1186.5631), np.float32(1317.2208), np.float32(1193.798), np.float32(2361.2585), np.float32(1322.5695), np.float32(1040.9963), np.float32(2040.7899)]
2025-09-14 20:13:16,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:13:16,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 49 minutes, 59 seconds)
2025-09-14 20:16:38,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:16:50,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1190.67188 ± 304.942
2025-09-14 20:16:50,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1012.518), np.float32(984.9073), np.float32(1041.412), np.float32(1046.1464), np.float32(1082.3582), np.float32(1250.5598), np.float32(895.6664), np.float32(1943.7262), np.float32(1556.3142), np.float32(1093.1105)]
2025-09-14 20:16:50,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:16:50,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 48 minutes, 5 seconds)
2025-09-14 20:20:09,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:20:21,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1402.16821 ± 397.688
2025-09-14 20:20:21,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(951.87366), np.float32(983.59674), np.float32(1297.8984), np.float32(1428.1168), np.float32(1270.2242), np.float32(1690.1688), np.float32(1365.7222), np.float32(2426.0078), np.float32(1410.9475), np.float32(1197.127)]
2025-09-14 20:20:21,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:20:21,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 46 minutes, 15 seconds)
2025-09-14 20:23:31,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:23:42,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1569.18420 ± 522.207
2025-09-14 20:23:42,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2135.4695), np.float32(918.3661), np.float32(1066.5072), np.float32(1027.2513), np.float32(2124.0552), np.float32(1173.3324), np.float32(1312.5228), np.float32(2041.0625), np.float32(1500.3639), np.float32(2392.9106)]
2025-09-14 20:23:42,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:23:42,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 43 minutes, 17 seconds)
2025-09-14 20:26:52,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:27:02,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1576.43713 ± 430.162
2025-09-14 20:27:02,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1624.8119), np.float32(1999.2971), np.float32(999.02606), np.float32(1022.5232), np.float32(2015.1317), np.float32(1249.7839), np.float32(1287.2089), np.float32(1337.2928), np.float32(2184.2217), np.float32(2045.074)]
2025-09-14 20:27:02,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:27:02,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 40 minutes, 3 seconds)
2025-09-14 20:30:11,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:30:21,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1513.96167 ± 394.072
2025-09-14 20:30:21,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1211.6846), np.float32(1506.9822), np.float32(1095.8055), np.float32(1401.58), np.float32(1040.5555), np.float32(2013.9418), np.float32(1604.5967), np.float32(1155.7134), np.float32(1855.2881), np.float32(2253.4692)]
2025-09-14 20:30:21,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:30:21,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 35 minutes, 42 seconds)
2025-09-14 20:33:27,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:33:38,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1461.61865 ± 502.539
2025-09-14 20:33:38,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1007.48505), np.float32(2687.4775), np.float32(1460.3895), np.float32(1017.63116), np.float32(1018.6556), np.float32(1361.8716), np.float32(1505.5905), np.float32(1006.2152), np.float32(1716.682), np.float32(1834.1888)]
2025-09-14 20:33:38,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:33:38,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 30 minutes, 38 seconds)
2025-09-14 20:36:52,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:37:03,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1296.24048 ± 256.027
2025-09-14 20:37:03,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1045.8029), np.float32(1429.4142), np.float32(1117.2825), np.float32(1118.4156), np.float32(1875.5208), np.float32(1466.2878), np.float32(1264.9073), np.float32(1200.5762), np.float32(971.31287), np.float32(1472.8844)]
2025-09-14 20:37:03,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:37:03,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 26 minutes, 50 seconds)
2025-09-14 20:40:20,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:40:32,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1586.71191 ± 495.082
2025-09-14 20:40:32,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1021.4506), np.float32(1329.654), np.float32(1759.214), np.float32(2509.743), np.float32(2168.4258), np.float32(1941.5737), np.float32(1073.7484), np.float32(1056.5011), np.float32(1787.3673), np.float32(1219.441)]
2025-09-14 20:40:32,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:40:32,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 24 minutes, 9 seconds)
2025-09-14 20:43:46,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:43:57,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1252.89209 ± 341.411
2025-09-14 20:43:57,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(966.4475), np.float32(1064.4675), np.float32(2201.4475), np.float32(1347.5387), np.float32(1015.49896), np.float32(1319.5892), np.float32(1002.3776), np.float32(1133.0251), np.float32(1262.0814), np.float32(1216.4469)]
2025-09-14 20:43:57,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:43:57,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 21 minutes, 11 seconds)
2025-09-14 20:47:01,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:47:13,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1367.35876 ± 293.278
2025-09-14 20:47:13,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1035.3899), np.float32(1240.6788), np.float32(1099.477), np.float32(1069.7725), np.float32(1394.4524), np.float32(1474.3801), np.float32(2038.6127), np.float32(1456.954), np.float32(1214.6641), np.float32(1649.2062)]
2025-09-14 20:47:13,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:47:13,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 17 minutes, 32 seconds)
2025-09-14 20:50:17,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:50:28,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1210.16309 ± 283.406
2025-09-14 20:50:28,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1270.0209), np.float32(1080.3356), np.float32(1125.9329), np.float32(1431.7487), np.float32(1042.7258), np.float32(975.6748), np.float32(965.88116), np.float32(1102.2439), np.float32(1963.6603), np.float32(1143.4076)]
2025-09-14 20:50:28,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:50:28,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 14 minutes, 6 seconds)
2025-09-14 20:53:34,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:53:45,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1219.17188 ± 382.859
2025-09-14 20:53:45,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1467.1289), np.float32(936.312), np.float32(1016.5768), np.float32(1086.1366), np.float32(762.2686), np.float32(2125.514), np.float32(1053.5903), np.float32(1061.9423), np.float32(1618.1396), np.float32(1064.1089)]
2025-09-14 20:53:45,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:53:45,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 10 minutes, 7 seconds)
2025-09-14 20:56:56,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 20:57:08,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1393.07886 ± 480.497
2025-09-14 20:57:08,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2620.8604), np.float32(1524.397), np.float32(1106.4047), np.float32(1705.9017), np.float32(1023.3525), np.float32(1029.897), np.float32(1128.2767), np.float32(1605.8787), np.float32(1234.0334), np.float32(951.786)]
2025-09-14 20:57:08,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:57:08,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 6 minutes, 22 seconds)
2025-09-14 21:00:22,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:00:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1470.42566 ± 484.334
2025-09-14 21:00:33,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(982.574), np.float32(2314.2976), np.float32(1270.1611), np.float32(1444.3473), np.float32(1156.4188), np.float32(1256.9028), np.float32(2429.703), np.float32(1098.2542), np.float32(1116.8909), np.float32(1634.7079)]
2025-09-14 21:00:33,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:00:33,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 3 minutes, 4 seconds)
2025-09-14 21:03:43,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:03:53,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1817.08142 ± 624.345
2025-09-14 21:03:53,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1325.2888), np.float32(1072.4059), np.float32(2833.3289), np.float32(1219.4629), np.float32(2384.1633), np.float32(1461.4967), np.float32(1045.8596), np.float32(2174.0938), np.float32(2429.174), np.float32(2225.5415)]
2025-09-14 21:03:53,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:03:53,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1817.08) for latency 24
2025-09-14 21:03:53,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 1 second)
2025-09-14 21:06:58,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:07:09,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1241.47974 ± 314.870
2025-09-14 21:07:09,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1325.908), np.float32(1266.069), np.float32(1133.4923), np.float32(1143.413), np.float32(1106.4541), np.float32(1027.6481), np.float32(2142.1052), np.float32(1186.6736), np.float32(994.8181), np.float32(1088.2148)]
2025-09-14 21:07:09,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:07:09,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 56 minutes, 43 seconds)
2025-09-14 21:10:18,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:10:29,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1375.17847 ± 185.394
2025-09-14 21:10:29,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1710.6643), np.float32(1512.6101), np.float32(1414.0203), np.float32(1248.3448), np.float32(1203.9545), np.float32(1577.7279), np.float32(1124.1483), np.float32(1340.2103), np.float32(1149.4137), np.float32(1470.6897)]
2025-09-14 21:10:29,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:10:29,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 53 minutes, 33 seconds)
2025-09-14 21:13:35,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:13:46,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1381.32153 ± 480.707
2025-09-14 21:13:46,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(926.7113), np.float32(1376.9574), np.float32(1873.6859), np.float32(1519.694), np.float32(951.9259), np.float32(1875.5293), np.float32(907.6293), np.float32(913.5615), np.float32(2334.7188), np.float32(1132.8004)]
2025-09-14 21:13:46,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:13:46,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 49 minutes, 55 seconds)
2025-09-14 21:16:54,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:17:06,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1830.23828 ± 517.486
2025-09-14 21:17:06,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1780.2375), np.float32(1556.555), np.float32(2591.4207), np.float32(1983.5121), np.float32(1351.5881), np.float32(2456.9094), np.float32(2293.243), np.float32(1092.9153), np.float32(1093.0691), np.float32(2102.9321)]
2025-09-14 21:17:06,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:17:06,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1830.24) for latency 24
2025-09-14 21:17:06,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 46 minutes, 19 seconds)
2025-09-14 21:20:21,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:20:32,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1334.90674 ± 319.019
2025-09-14 21:20:32,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1356.638), np.float32(1052.5981), np.float32(1308.4769), np.float32(1977.0276), np.float32(1317.1177), np.float32(1054.9147), np.float32(1063.6793), np.float32(975.9735), np.float32(1820.5269), np.float32(1422.114)]
2025-09-14 21:20:32,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:20:32,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 43 minutes, 18 seconds)
2025-09-14 21:23:44,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:23:56,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1298.46790 ± 255.539
2025-09-14 21:23:56,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(904.943), np.float32(1135.4207), np.float32(1130.4175), np.float32(1612.5104), np.float32(1144.5548), np.float32(1593.5284), np.float32(1531.6849), np.float32(1554.5201), np.float32(1400.4762), np.float32(976.622)]
2025-09-14 21:23:56,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:23:56,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 40 minutes, 16 seconds)
2025-09-14 21:27:02,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:27:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1410.90649 ± 286.595
2025-09-14 21:27:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1900.967), np.float32(1781.0892), np.float32(1058.9966), np.float32(1645.4254), np.float32(1242.6952), np.float32(1299.7511), np.float32(1023.75183), np.float32(1166.86), np.float32(1470.5713), np.float32(1518.9572)]
2025-09-14 21:27:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:27:13,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 36 minutes, 48 seconds)
2025-09-14 21:30:21,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:30:33,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1426.82886 ± 415.640
2025-09-14 21:30:33,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1078.8462), np.float32(1541.4226), np.float32(1290.131), np.float32(2393.1636), np.float32(1010.06866), np.float32(1569.0632), np.float32(1422.9822), np.float32(823.7334), np.float32(1423.637), np.float32(1715.2402)]
2025-09-14 21:30:33,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:30:33,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 33 minutes, 33 seconds)
2025-09-14 21:33:40,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:33:52,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1199.35327 ± 508.441
2025-09-14 21:33:52,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(948.98505), np.float32(-162.72185), np.float32(1711.0787), np.float32(1064.2007), np.float32(1556.1661), np.float32(1390.071), np.float32(1365.976), np.float32(1569.838), np.float32(1428.124), np.float32(1121.8159)]
2025-09-14 21:33:52,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:33:52,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 30 minutes, 11 seconds)
2025-09-14 21:36:56,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:37:09,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1710.39844 ± 520.894
2025-09-14 21:37:09,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1143.5807), np.float32(1405.4089), np.float32(1717.3784), np.float32(1701.8899), np.float32(2882.97), np.float32(2350.4226), np.float32(1152.8813), np.float32(1262.1002), np.float32(1730.977), np.float32(1756.3752)]
2025-09-14 21:37:09,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:37:09,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 26 minutes, 34 seconds)
2025-09-14 21:40:22,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:40:34,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1289.33484 ± 423.391
2025-09-14 21:40:34,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1241.0232), np.float32(1053.0438), np.float32(1191.7714), np.float32(1803.9344), np.float32(1811.4243), np.float32(371.62848), np.float32(1275.5988), np.float32(1418.472), np.float32(964.7602), np.float32(1761.6926)]
2025-09-14 21:40:34,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:40:34,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 23 minutes, 16 seconds)
2025-09-14 21:43:45,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:43:55,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1537.01184 ± 282.582
2025-09-14 21:43:55,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1265.7198), np.float32(1736.112), np.float32(1186.405), np.float32(1488.1445), np.float32(1715.9741), np.float32(1986.5211), np.float32(1610.1545), np.float32(1785.3197), np.float32(1029.0737), np.float32(1566.6923)]
2025-09-14 21:43:55,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:43:55,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 20 minutes, 1 second)
2025-09-14 21:46:53,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:47:03,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1316.70020 ± 268.931
2025-09-14 21:47:03,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1317.026), np.float32(1788.816), np.float32(1603.4507), np.float32(1025.7517), np.float32(1189.4934), np.float32(1065.0319), np.float32(1143.5299), np.float32(1566.2296), np.float32(961.84076), np.float32(1505.8313)]
2025-09-14 21:47:03,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:47:03,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 16 minutes, 30 seconds)
2025-09-14 21:49:56,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:50:06,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1358.73376 ± 333.899
2025-09-14 21:50:06,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2222.22), np.float32(1591.2106), np.float32(1462.2079), np.float32(1119.4105), np.float32(1064.822), np.float32(1376.1493), np.float32(1305.851), np.float32(1204.7689), np.float32(1221.3593), np.float32(1019.3386)]
2025-09-14 21:50:06,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:50:06,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 58 seconds)
2025-09-14 21:52:58,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:53:08,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1492.81506 ± 298.243
2025-09-14 21:53:08,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1588.0759), np.float32(1481.9253), np.float32(1670.4027), np.float32(2038.8907), np.float32(1494.5287), np.float32(1312.2983), np.float32(1253.4946), np.float32(1879.9854), np.float32(1170.861), np.float32(1037.6871)]
2025-09-14 21:53:08,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:53:08,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 35 seconds)
2025-09-14 21:56:01,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:56:11,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1606.02478 ± 426.832
2025-09-14 21:56:11,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1029.6732), np.float32(1334.6603), np.float32(1008.2688), np.float32(2016.2921), np.float32(1773.9669), np.float32(2089.988), np.float32(1983.9973), np.float32(1205.8557), np.float32(2157.172), np.float32(1460.3728)]
2025-09-14 21:56:11,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:56:11,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 14 seconds)
2025-09-14 21:58:54,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 21:59:04,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1228.25403 ± 398.982
2025-09-14 21:59:04,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2305.7012), np.float32(1073.1669), np.float32(1011.6628), np.float32(886.2171), np.float32(1468.7075), np.float32(859.89075), np.float32(1303.4567), np.float32(1131.1632), np.float32(1045.2965), np.float32(1197.2776)]
2025-09-14 21:59:04,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:59:04,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 1 second)
2025-09-14 22:01:48,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 24...
2025-09-14 22:01:58,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1674.62073 ± 522.183
2025-09-14 22:01:58,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2233.9111), np.float32(915.78094), np.float32(1513.313), np.float32(1830.5933), np.float32(1380.5297), np.float32(1376.9583), np.float32(2431.8218), np.float32(2519.1726), np.float32(1303.1814), np.float32(1240.945)]
2025-09-14 22:01:58,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:01:58,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1251 [DEBUG]: Training session finished
