2025-09-14 12:51:43,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_15
2025-09-14 12:51:43,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_15
2025-09-14 12:51:43,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x7f9481003c80>}
2025-09-14 12:51:43,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 12:51:43,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 12:51:43,372 baseline-bpql-noisepromille200-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 12:51:43,373 baseline-bpql-noisepromille200-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 12:51:45,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 12:51:45,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 12:54:18,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:54:25,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -386.65381 ± 62.427
2025-09-14 12:54:25,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-416.70316), np.float32(-391.43793), np.float32(-445.57077), np.float32(-358.27173), np.float32(-274.23505), np.float32(-451.26068), np.float32(-483.74716), np.float32(-332.8489), np.float32(-393.53882), np.float32(-318.9239)]
2025-09-14 12:54:25,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:54:25,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-386.65) for latency 15
2025-09-14 12:54:25,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 24 minutes, 20 seconds)
2025-09-14 12:57:00,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:57:07,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -291.05927 ± 49.685
2025-09-14 12:57:07,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-319.6584), np.float32(-186.44202), np.float32(-264.33774), np.float32(-297.48526), np.float32(-377.51398), np.float32(-265.81357), np.float32(-320.4792), np.float32(-338.95206), np.float32(-268.14423), np.float32(-271.76642)]
2025-09-14 12:57:07,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:57:07,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-291.06) for latency 15
2025-09-14 12:57:07,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 23 minutes, 28 seconds)
2025-09-14 12:59:43,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 12:59:50,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -153.55583 ± 105.395
2025-09-14 12:59:50,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-39.51456), np.float32(-234.83235), np.float32(-201.82518), np.float32(-353.68594), np.float32(-44.716187), np.float32(-118.42602), np.float32(-22.111664), np.float32(-70.36488), np.float32(-196.86014), np.float32(-253.22139)]
2025-09-14 12:59:50,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:59:50,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-153.56) for latency 15
2025-09-14 12:59:50,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 21 minutes, 41 seconds)
2025-09-14 13:02:26,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:02:33,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -209.07866 ± 71.446
2025-09-14 13:02:33,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-58.175236), np.float32(-244.72826), np.float32(-234.12813), np.float32(-309.97763), np.float32(-210.1763), np.float32(-102.101425), np.float32(-194.81438), np.float32(-234.76134), np.float32(-237.20895), np.float32(-264.7149)]
2025-09-14 13:02:33,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:02:33,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 19 minutes, 9 seconds)
2025-09-14 13:05:08,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:05:15,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -102.49089 ± 136.557
2025-09-14 13:05:15,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(19.779644), np.float32(-185.31702), np.float32(-167.12715), np.float32(-103.5054), np.float32(-7.0460763), np.float32(-148.40466), np.float32(162.67061), np.float32(-361.41098), np.float32(-193.04738), np.float32(-41.50057)]
2025-09-14 13:05:15,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:05:15,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-102.49) for latency 15
2025-09-14 13:05:15,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 16 minutes, 29 seconds)
2025-09-14 13:07:50,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:07:57,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -5.08287 ± 101.552
2025-09-14 13:07:57,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-187.17326), np.float32(175.93808), np.float32(14.667413), np.float32(-67.33426), np.float32(41.548183), np.float32(14.672974), np.float32(-63.369045), np.float32(83.70391), np.float32(-125.737656), np.float32(62.254982)]
2025-09-14 13:07:57,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:07:57,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-5.08) for latency 15
2025-09-14 13:07:57,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 14 minutes, 22 seconds)
2025-09-14 13:10:32,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:10:39,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 28.62351 ± 181.806
2025-09-14 13:10:39,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-28.387693), np.float32(67.37924), np.float32(-423.97952), np.float32(219.56952), np.float32(-26.95802), np.float32(-8.030374), np.float32(-8.406948), np.float32(265.03897), np.float32(52.34055), np.float32(177.6694)]
2025-09-14 13:10:39,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:10:39,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (28.62) for latency 15
2025-09-14 13:10:39,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 11 minutes, 31 seconds)
2025-09-14 13:13:18,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:13:25,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 235.23909 ± 122.006
2025-09-14 13:13:25,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(151.35698), np.float32(329.9449), np.float32(453.4835), np.float32(193.42813), np.float32(186.1282), np.float32(231.92906), np.float32(210.66031), np.float32(102.72265), np.float32(423.3377), np.float32(69.39937)]
2025-09-14 13:13:25,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:13:25,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (235.24) for latency 15
2025-09-14 13:13:25,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 9 minutes, 42 seconds)
2025-09-14 13:16:00,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:16:07,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 264.33459 ± 48.244
2025-09-14 13:16:07,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(323.51483), np.float32(226.5289), np.float32(360.593), np.float32(304.277), np.float32(231.423), np.float32(284.72302), np.float32(224.48251), np.float32(236.3507), np.float32(243.48787), np.float32(207.96532)]
2025-09-14 13:16:07,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:16:07,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (264.33) for latency 15
2025-09-14 13:16:07,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 6 minutes, 58 seconds)
2025-09-14 13:18:43,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:18:50,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 332.70441 ± 256.555
2025-09-14 13:18:50,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(480.4817), np.float32(470.1939), np.float32(261.80518), np.float32(239.73517), np.float32(464.33453), np.float32(364.89972), np.float32(447.0198), np.float32(-388.76947), np.float32(496.5296), np.float32(490.81396)]
2025-09-14 13:18:50,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:18:50,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (332.70) for latency 15
2025-09-14 13:18:50,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 4 minutes, 36 seconds)
2025-09-14 13:21:25,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:21:32,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 587.30957 ± 72.971
2025-09-14 13:21:32,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(629.7864), np.float32(631.3607), np.float32(606.45526), np.float32(611.4955), np.float32(537.0879), np.float32(603.34357), np.float32(580.98303), np.float32(674.2373), np.float32(392.77774), np.float32(605.56854)]
2025-09-14 13:21:32,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:21:32,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (587.31) for latency 15
2025-09-14 13:21:32,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 1 minute, 52 seconds)
2025-09-14 13:24:07,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:24:14,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 557.93005 ± 270.892
2025-09-14 13:24:14,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(761.0503), np.float32(787.2081), np.float32(563.6275), np.float32(681.6605), np.float32(609.9584), np.float32(-206.85773), np.float32(543.92126), np.float32(677.68097), np.float32(684.1882), np.float32(476.86398)]
2025-09-14 13:24:14,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:24:14,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 59 minutes, 7 seconds)
2025-09-14 13:26:49,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:26:56,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 708.37439 ± 67.990
2025-09-14 13:26:56,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(756.9354), np.float32(603.4078), np.float32(760.9869), np.float32(757.3349), np.float32(786.8465), np.float32(663.74), np.float32(735.27014), np.float32(655.76605), np.float32(596.2197), np.float32(767.23676)]
2025-09-14 13:26:56,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:26:56,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (708.37) for latency 15
2025-09-14 13:26:56,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 55 minutes, 23 seconds)
2025-09-14 13:29:32,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:29:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 714.88306 ± 44.181
2025-09-14 13:29:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(748.0587), np.float32(668.7696), np.float32(793.8809), np.float32(686.8274), np.float32(764.74646), np.float32(646.6561), np.float32(687.31665), np.float32(693.95874), np.float32(745.27686), np.float32(713.33905)]
2025-09-14 13:29:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:29:39,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (714.88) for latency 15
2025-09-14 13:29:39,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 52 minutes, 41 seconds)
2025-09-14 13:32:14,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:32:21,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 821.53333 ± 120.889
2025-09-14 13:32:21,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(889.7898), np.float32(715.10144), np.float32(788.8449), np.float32(643.69104), np.float32(1086.7258), np.float32(723.409), np.float32(808.78796), np.float32(756.2285), np.float32(888.60706), np.float32(914.14734)]
2025-09-14 13:32:21,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:32:21,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (821.53) for latency 15
2025-09-14 13:32:21,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 49 minutes, 52 seconds)
2025-09-14 13:34:56,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:35:03,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 719.33832 ± 124.318
2025-09-14 13:35:03,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(737.205), np.float32(818.5009), np.float32(902.0436), np.float32(893.85834), np.float32(607.89294), np.float32(811.03577), np.float32(642.1724), np.float32(603.2806), np.float32(644.24695), np.float32(533.1463)]
2025-09-14 13:35:03,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:35:03,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 47 minutes, 7 seconds)
2025-09-14 13:37:39,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:37:45,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 785.02655 ± 83.646
2025-09-14 13:37:45,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(737.66895), np.float32(748.90753), np.float32(789.45465), np.float32(726.86725), np.float32(979.7652), np.float32(664.9022), np.float32(859.20557), np.float32(817.15326), np.float32(723.33624), np.float32(803.00464)]
2025-09-14 13:37:45,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:37:45,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 44 minutes, 32 seconds)
2025-09-14 13:40:21,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:40:28,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 858.50391 ± 132.460
2025-09-14 13:40:28,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1004.20953), np.float32(778.61456), np.float32(824.8341), np.float32(958.5821), np.float32(801.9159), np.float32(949.9406), np.float32(836.49365), np.float32(1077.3572), np.float32(603.97845), np.float32(749.11316)]
2025-09-14 13:40:28,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:40:28,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (858.50) for latency 15
2025-09-14 13:40:28,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 41 minutes, 47 seconds)
2025-09-14 13:43:03,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:43:10,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 900.34753 ± 117.464
2025-09-14 13:43:10,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(889.0248), np.float32(783.9583), np.float32(1126.7628), np.float32(976.1911), np.float32(910.06726), np.float32(1056.3665), np.float32(907.1967), np.float32(744.79645), np.float32(815.83673), np.float32(793.2754)]
2025-09-14 13:43:10,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:43:10,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (900.35) for latency 15
2025-09-14 13:43:10,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 38 minutes, 59 seconds)
2025-09-14 13:45:45,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:45:52,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 664.20923 ± 423.542
2025-09-14 13:45:52,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(913.8683), np.float32(861.0306), np.float32(768.10767), np.float32(16.580254), np.float32(1191.657), np.float32(705.80853), np.float32(747.28766), np.float32(931.7031), np.float32(-278.98093), np.float32(785.03046)]
2025-09-14 13:45:52,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:45:52,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 36 minutes, 17 seconds)
2025-09-14 13:48:28,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:48:35,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 954.27283 ± 168.481
2025-09-14 13:48:35,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(969.3111), np.float32(1071.2181), np.float32(964.7564), np.float32(1311.1707), np.float32(809.57196), np.float32(700.877), np.float32(1093.0443), np.float32(778.3447), np.float32(972.5191), np.float32(871.9146)]
2025-09-14 13:48:35,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:48:35,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (954.27) for latency 15
2025-09-14 13:48:35,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 33 minutes, 42 seconds)
2025-09-14 13:51:11,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:51:18,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 869.54724 ± 57.911
2025-09-14 13:51:18,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(944.12946), np.float32(815.0598), np.float32(850.32697), np.float32(800.9882), np.float32(965.9822), np.float32(840.72144), np.float32(887.9751), np.float32(942.35815), np.float32(821.83954), np.float32(826.0919)]
2025-09-14 13:51:18,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:51:18,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 31 minutes, 10 seconds)
2025-09-14 13:53:49,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:53:56,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 940.54181 ± 146.999
2025-09-14 13:53:56,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(856.28156), np.float32(818.0258), np.float32(901.2297), np.float32(852.15454), np.float32(1173.5399), np.float32(900.1267), np.float32(1040.7646), np.float32(781.7187), np.float32(849.3058), np.float32(1232.271)]
2025-09-14 13:53:56,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:53:56,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 27 minutes, 25 seconds)
2025-09-14 13:56:23,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:56:30,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 993.32520 ± 153.002
2025-09-14 13:56:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1107.034), np.float32(1030.089), np.float32(938.9742), np.float32(923.2722), np.float32(958.85205), np.float32(1177.2241), np.float32(903.1534), np.float32(749.82806), np.float32(852.1676), np.float32(1292.6575)]
2025-09-14 13:56:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:56:30,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (993.33) for latency 15
2025-09-14 13:56:30,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 22 minutes, 44 seconds)
2025-09-14 13:58:58,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 13:59:05,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 958.11523 ± 325.635
2025-09-14 13:59:05,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(777.9651), np.float32(827.35504), np.float32(640.0599), np.float32(1161.0428), np.float32(1066.3401), np.float32(923.3319), np.float32(1509.5795), np.float32(892.65576), np.float32(1410.1936), np.float32(372.62885)]
2025-09-14 13:59:05,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:59:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 18 minutes, 5 seconds)
2025-09-14 14:01:32,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:01:39,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 796.49255 ± 353.594
2025-09-14 14:01:39,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(792.884), np.float32(1114.1023), np.float32(1235.2471), np.float32(-169.32867), np.float32(856.9943), np.float32(859.008), np.float32(771.8766), np.float32(842.92896), np.float32(755.0286), np.float32(906.1836)]
2025-09-14 14:01:39,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:01:39,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 13 minutes, 27 seconds)
2025-09-14 14:04:07,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:04:14,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 929.04187 ± 107.846
2025-09-14 14:04:14,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(880.4094), np.float32(855.12366), np.float32(924.871), np.float32(879.76074), np.float32(922.8129), np.float32(914.5423), np.float32(840.90546), np.float32(866.6423), np.float32(1232.7207), np.float32(972.63043)]
2025-09-14 14:04:14,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:04:14,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 8 minutes, 56 seconds)
2025-09-14 14:06:41,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:06:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1022.39270 ± 144.471
2025-09-14 14:06:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(983.1027), np.float32(1235.7383), np.float32(1188.3438), np.float32(1112.1373), np.float32(1010.87317), np.float32(1139.4768), np.float32(924.41907), np.float32(736.2183), np.float32(1015.00555), np.float32(878.611)]
2025-09-14 14:06:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:06:48,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1022.39) for latency 15
2025-09-14 14:06:48,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 5 minutes, 17 seconds)
2025-09-14 14:09:16,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:09:23,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 859.17352 ± 176.505
2025-09-14 14:09:23,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(584.66486), np.float32(964.34357), np.float32(948.2831), np.float32(1110.0105), np.float32(849.8289), np.float32(997.2258), np.float32(900.9758), np.float32(859.412), np.float32(494.70575), np.float32(882.286)]
2025-09-14 14:09:23,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:09:23,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 2 minutes, 53 seconds)
2025-09-14 14:11:50,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:11:57,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 958.58514 ± 179.368
2025-09-14 14:11:57,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(858.8951), np.float32(719.65546), np.float32(1312.2992), np.float32(836.6772), np.float32(1225.0038), np.float32(784.8134), np.float32(1021.8966), np.float32(892.22723), np.float32(917.22363), np.float32(1017.16034)]
2025-09-14 14:11:57,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:11:57,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 9 seconds)
2025-09-14 14:14:27,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:14:34,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 894.89502 ± 139.767
2025-09-14 14:14:34,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(810.64044), np.float32(1019.3818), np.float32(873.47656), np.float32(1163.2133), np.float32(1084.6097), np.float32(759.1846), np.float32(867.0202), np.float32(853.4972), np.float32(815.7263), np.float32(702.1998)]
2025-09-14 14:14:34,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:14:34,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 58 minutes, 9 seconds)
2025-09-14 14:17:04,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:17:11,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1061.91345 ± 162.144
2025-09-14 14:17:11,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(955.03546), np.float32(1367.4401), np.float32(925.30804), np.float32(1056.1909), np.float32(1000.6598), np.float32(1373.1527), np.float32(967.61334), np.float32(1086.5066), np.float32(917.7786), np.float32(969.4488)]
2025-09-14 14:17:11,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:17:11,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1061.91) for latency 15
2025-09-14 14:17:11,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 56 minutes, 5 seconds)
2025-09-14 14:19:42,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:19:49,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1036.09302 ± 115.001
2025-09-14 14:19:49,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1039.6951), np.float32(1176.6738), np.float32(997.9757), np.float32(878.5313), np.float32(1238.8287), np.float32(854.5746), np.float32(1028.9027), np.float32(1000.6766), np.float32(1008.56134), np.float32(1136.5107)]
2025-09-14 14:19:49,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:19:49,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 54 minutes, 27 seconds)
2025-09-14 14:22:20,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:22:26,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 967.72510 ± 204.900
2025-09-14 14:22:26,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1126.01), np.float32(968.18774), np.float32(818.2606), np.float32(1160.6007), np.float32(833.8075), np.float32(631.9891), np.float32(739.4167), np.float32(1208.0746), np.float32(1271.6239), np.float32(919.2802)]
2025-09-14 14:22:26,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:22:27,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 52 minutes, 26 seconds)
2025-09-14 14:24:59,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:25:05,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1070.79956 ± 264.880
2025-09-14 14:25:05,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(804.99884), np.float32(1080.3872), np.float32(904.7806), np.float32(908.82434), np.float32(1683.0077), np.float32(1374.1931), np.float32(1179.5613), np.float32(1070.2784), np.float32(821.3222), np.float32(880.64154)]
2025-09-14 14:25:05,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:25:05,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1070.80) for latency 15
2025-09-14 14:25:05,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 50 minutes, 50 seconds)
2025-09-14 14:27:36,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:27:43,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 837.55432 ± 210.528
2025-09-14 14:27:43,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(731.638), np.float32(740.9636), np.float32(915.878), np.float32(857.92993), np.float32(1005.1365), np.float32(389.97482), np.float32(835.3683), np.float32(817.78894), np.float32(1269.5598), np.float32(811.3055)]
2025-09-14 14:27:43,439 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:27:43,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 48 minutes, 23 seconds)
2025-09-14 14:30:14,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:30:21,356 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1216.78979 ± 195.139
2025-09-14 14:30:21,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(901.9114), np.float32(1322.7654), np.float32(1054.7357), np.float32(1135.8116), np.float32(1545.131), np.float32(1338.9005), np.float32(1135.0935), np.float32(1365.9304), np.float32(979.13403), np.float32(1388.4851)]
2025-09-14 14:30:21,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:30:21,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1216.79) for latency 15
2025-09-14 14:30:21,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 45 minutes, 51 seconds)
2025-09-14 14:32:50,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:32:57,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1214.83472 ± 270.435
2025-09-14 14:32:57,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(854.9157), np.float32(1685.9039), np.float32(951.3661), np.float32(1312.3761), np.float32(1446.6932), np.float32(1530.5913), np.float32(938.1809), np.float32(955.3284), np.float32(1199.7971), np.float32(1273.1945)]
2025-09-14 14:32:57,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:32:57,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 42 minutes, 50 seconds)
2025-09-14 14:35:25,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:35:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1129.03198 ± 292.689
2025-09-14 14:35:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1391.0083), np.float32(1462.2917), np.float32(869.17676), np.float32(808.15314), np.float32(842.4805), np.float32(736.61035), np.float32(1504.3998), np.float32(1332.6873), np.float32(966.3847), np.float32(1377.1268)]
2025-09-14 14:35:32,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:35:32,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 39 minutes, 38 seconds)
2025-09-14 14:38:02,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:38:08,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1082.58911 ± 218.611
2025-09-14 14:38:08,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(812.81506), np.float32(966.02765), np.float32(1050.9363), np.float32(1282.9036), np.float32(877.27527), np.float32(1338.2358), np.float32(901.8048), np.float32(1363.836), np.float32(862.16876), np.float32(1369.8865)]
2025-09-14 14:38:08,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:38:08,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 36 minutes, 36 seconds)
2025-09-14 14:40:38,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:40:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1088.85168 ± 261.619
2025-09-14 14:40:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(824.61414), np.float32(1269.8787), np.float32(755.2387), np.float32(898.8094), np.float32(1065.08), np.float32(905.06335), np.float32(1631.2166), np.float32(1411.8322), np.float32(1050.7252), np.float32(1076.0575)]
2025-09-14 14:40:45,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:40:45,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 33 minutes, 46 seconds)
2025-09-14 14:43:15,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:43:22,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1114.09973 ± 254.120
2025-09-14 14:43:22,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1050.6842), np.float32(1662.3075), np.float32(1007.6944), np.float32(867.2532), np.float32(1064.872), np.float32(1309.8903), np.float32(1196.9633), np.float32(783.9198), np.float32(862.22504), np.float32(1335.1871)]
2025-09-14 14:43:22,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:43:22,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 31 minutes, 3 seconds)
2025-09-14 14:45:52,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:45:59,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1142.13745 ± 289.337
2025-09-14 14:45:59,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1526.6945), np.float32(993.6585), np.float32(991.69934), np.float32(911.3844), np.float32(941.16003), np.float32(799.28217), np.float32(1200.0549), np.float32(1628.9896), np.float32(912.0358), np.float32(1516.4161)]
2025-09-14 14:45:59,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:45:59,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 28 minutes, 37 seconds)
2025-09-14 14:48:29,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:48:36,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1141.27185 ± 310.926
2025-09-14 14:48:36,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1087.8984), np.float32(1120.2706), np.float32(850.2486), np.float32(1595.9832), np.float32(1658.7526), np.float32(922.9182), np.float32(868.4756), np.float32(831.778), np.float32(948.7619), np.float32(1527.632)]
2025-09-14 14:48:36,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:48:36,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 26 minutes, 26 seconds)
2025-09-14 14:51:03,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:51:10,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1135.26843 ± 242.915
2025-09-14 14:51:10,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(953.0571), np.float32(1303.811), np.float32(1319.2737), np.float32(1053.8217), np.float32(956.52637), np.float32(1238.7115), np.float32(1025.5775), np.float32(1162.4299), np.float32(705.95746), np.float32(1633.5186)]
2025-09-14 14:51:10,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:51:10,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 23 minutes, 19 seconds)
2025-09-14 14:53:37,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:53:44,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1113.06799 ± 342.805
2025-09-14 14:53:44,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(957.29034), np.float32(932.86536), np.float32(1026.9948), np.float32(1513.131), np.float32(980.7775), np.float32(776.11914), np.float32(753.4219), np.float32(1829.6903), np.float32(1470.0366), np.float32(890.35315)]
2025-09-14 14:53:44,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:53:44,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 20 minutes, 16 seconds)
2025-09-14 14:56:12,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:56:18,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1090.00623 ± 191.012
2025-09-14 14:56:18,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1181.8583), np.float32(779.19037), np.float32(997.0476), np.float32(1061.0345), np.float32(1266.55), np.float32(849.48553), np.float32(898.1576), np.float32(1279.8007), np.float32(1346.8186), np.float32(1240.1189)]
2025-09-14 14:56:18,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:56:18,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 17 minutes, 8 seconds)
2025-09-14 14:58:46,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 14:58:53,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1269.65198 ± 300.226
2025-09-14 14:58:53,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1372.1304), np.float32(1379.3184), np.float32(1555.3123), np.float32(907.41534), np.float32(1010.979), np.float32(1099.0448), np.float32(1189.6152), np.float32(1967.3834), np.float32(997.49316), np.float32(1217.8274)]
2025-09-14 14:58:53,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:58:53,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1269.65) for latency 15
2025-09-14 14:58:53,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 14 minutes, 10 seconds)
2025-09-14 15:01:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:01:28,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1151.68420 ± 220.001
2025-09-14 15:01:28,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1181.6418), np.float32(935.1472), np.float32(1641.5023), np.float32(1101.5493), np.float32(804.1998), np.float32(1347.9712), np.float32(1232.7639), np.float32(1129.6014), np.float32(986.0803), np.float32(1156.3848)]
2025-09-14 15:01:28,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:01:28,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 11 minutes, 14 seconds)
2025-09-14 15:03:56,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:04:03,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1254.09680 ± 258.500
2025-09-14 15:04:03,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1516.8204), np.float32(1144.1351), np.float32(1557.6882), np.float32(857.7333), np.float32(1137.0481), np.float32(873.96466), np.float32(1259.5786), np.float32(1649.6648), np.float32(1166.1543), np.float32(1378.1808)]
2025-09-14 15:04:03,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:04:03,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 8 minutes, 50 seconds)
2025-09-14 15:06:12,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:06:19,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1020.30841 ± 149.959
2025-09-14 15:06:19,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(986.5072), np.float32(1005.8536), np.float32(960.9193), np.float32(737.1181), np.float32(957.7939), np.float32(1016.80237), np.float32(1314.7208), np.float32(1212.2887), np.float32(928.3712), np.float32(1082.7092)]
2025-09-14 15:06:19,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:06:19,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 3 minutes, 17 seconds)
2025-09-14 15:08:26,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:08:33,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1056.09033 ± 233.892
2025-09-14 15:08:33,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(882.83966), np.float32(870.0483), np.float32(899.48126), np.float32(1504.7402), np.float32(868.3716), np.float32(1245.5912), np.float32(970.4735), np.float32(993.7017), np.float32(887.639), np.float32(1438.0168)]
2025-09-14 15:08:33,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:08:33,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 57 minutes, 28 seconds)
2025-09-14 15:10:41,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:10:48,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1155.84509 ± 462.490
2025-09-14 15:10:48,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(894.03986), np.float32(988.00867), np.float32(1052.4823), np.float32(932.27057), np.float32(559.62085), np.float32(1236.3478), np.float32(2028.4042), np.float32(990.96497), np.float32(858.6379), np.float32(2017.6737)]
2025-09-14 15:10:48,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:10:48,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 51 minutes, 57 seconds)
2025-09-14 15:12:53,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:12:59,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1074.07715 ± 209.211
2025-09-14 15:12:59,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1299.501), np.float32(884.7483), np.float32(1443.5305), np.float32(944.5522), np.float32(1061.3196), np.float32(946.48065), np.float32(890.90076), np.float32(1107.458), np.float32(810.44727), np.float32(1351.8333)]
2025-09-14 15:12:59,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:12:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 45 minutes, 59 seconds)
2025-09-14 15:15:04,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:15:11,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1384.69263 ± 637.023
2025-09-14 15:15:11,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2205.3325), np.float32(2354.9922), np.float32(222.01219), np.float32(1366.644), np.float32(858.2287), np.float32(1196.9567), np.float32(1763.4707), np.float32(750.70184), np.float32(1273.9792), np.float32(1854.6072)]
2025-09-14 15:15:11,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:15:11,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1384.69) for latency 15
2025-09-14 15:15:11,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 40 minutes, 6 seconds)
2025-09-14 15:17:15,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:17:22,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1271.06140 ± 318.346
2025-09-14 15:17:22,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(886.50195), np.float32(1153.6494), np.float32(1001.5001), np.float32(1320.4131), np.float32(950.8906), np.float32(1048.517), np.float32(1585.9841), np.float32(1280.183), np.float32(1944.926), np.float32(1538.0498)]
2025-09-14 15:17:22,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:17:22,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 37 minutes, 15 seconds)
2025-09-14 15:19:26,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:19:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1146.93970 ± 285.345
2025-09-14 15:19:33,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1425.6765), np.float32(761.34454), np.float32(983.315), np.float32(919.8065), np.float32(1430.6696), np.float32(1223.8588), np.float32(1652.1384), np.float32(1275.0223), np.float32(1017.6606), np.float32(779.90454)]
2025-09-14 15:19:33,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:19:33,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 34 minutes, 40 seconds)
2025-09-14 15:21:39,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:21:46,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1385.50098 ± 424.498
2025-09-14 15:21:46,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1090.8739), np.float32(982.4973), np.float32(1758.7507), np.float32(931.8559), np.float32(1949.5481), np.float32(934.42053), np.float32(922.31506), np.float32(1610.5928), np.float32(1917.6515), np.float32(1756.5048)]
2025-09-14 15:21:46,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:21:46,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1385.50) for latency 15
2025-09-14 15:21:46,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 32 minutes, 6 seconds)
2025-09-14 15:24:07,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:24:13,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1068.83472 ± 340.306
2025-09-14 15:24:13,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1129.9783), np.float32(862.6903), np.float32(1556.4144), np.float32(899.3352), np.float32(932.30237), np.float32(1016.5626), np.float32(1523.1973), np.float32(971.2004), np.float32(1422.6846), np.float32(373.98166)]
2025-09-14 15:24:13,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:24:13,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 32 minutes, 7 seconds)
2025-09-14 15:26:36,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:26:42,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1284.52979 ± 282.357
2025-09-14 15:26:42,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1025.5806), np.float32(969.2103), np.float32(1466.3431), np.float32(1188.0743), np.float32(1881.5414), np.float32(1229.2246), np.float32(885.5168), np.float32(1270.2263), np.float32(1417.1982), np.float32(1512.3821)]
2025-09-14 15:26:42,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:26:42,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 32 minutes, 13 seconds)
2025-09-14 15:29:11,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:29:18,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1247.23401 ± 333.288
2025-09-14 15:29:18,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1048.5134), np.float32(1371.9055), np.float32(845.6417), np.float32(1047.45), np.float32(961.9853), np.float32(1596.213), np.float32(1518.4293), np.float32(1960.2491), np.float32(1053.4524), np.float32(1068.4996)]
2025-09-14 15:29:18,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:29:18,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 33 minutes, 4 seconds)
2025-09-14 15:31:48,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:31:55,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1242.87573 ± 320.871
2025-09-14 15:31:55,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1221.5271), np.float32(1382.2783), np.float32(840.535), np.float32(1506.8744), np.float32(913.9941), np.float32(1382.466), np.float32(949.5054), np.float32(1135.6313), np.float32(1976.1892), np.float32(1119.757)]
2025-09-14 15:31:55,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:31:55,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 33 minutes, 57 seconds)
2025-09-14 15:34:02,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:34:09,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1430.56897 ± 427.064
2025-09-14 15:34:09,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1093.5562), np.float32(2008.5847), np.float32(1293.4059), np.float32(1285.3491), np.float32(1630.4536), np.float32(865.2687), np.float32(868.31415), np.float32(1284.3278), np.float32(1895.2535), np.float32(2081.176)]
2025-09-14 15:34:09,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:34:09,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1430.57) for latency 15
2025-09-14 15:34:09,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 31 minutes, 39 seconds)
2025-09-14 15:36:16,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:36:23,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1361.11646 ± 350.930
2025-09-14 15:36:23,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1717.8782), np.float32(954.30536), np.float32(932.6981), np.float32(1201.6722), np.float32(1670.8541), np.float32(958.6039), np.float32(1080.9034), np.float32(1883.9568), np.float32(1625.6941), np.float32(1584.5988)]
2025-09-14 15:36:23,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:36:23,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 27 minutes, 34 seconds)
2025-09-14 15:38:31,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:38:37,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1177.82349 ± 306.643
2025-09-14 15:38:37,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1005.49603), np.float32(1214.3335), np.float32(832.3837), np.float32(1816.5994), np.float32(1646.1844), np.float32(1247.3578), np.float32(862.1666), np.float32(1069.7009), np.float32(1110.868), np.float32(973.14417)]
2025-09-14 15:38:37,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:38:37,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 23 minutes, 25 seconds)
2025-09-14 15:40:45,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:40:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1520.04028 ± 260.728
2025-09-14 15:40:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1986.354), np.float32(1132.5222), np.float32(1446.0793), np.float32(1415.337), np.float32(1523.8257), np.float32(1408.7521), np.float32(1456.7506), np.float32(2022.0068), np.float32(1386.8698), np.float32(1421.9055)]
2025-09-14 15:40:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:40:51,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1520.04) for latency 15
2025-09-14 15:40:51,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 18 minutes, 34 seconds)
2025-09-14 15:42:58,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:43:05,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1331.68164 ± 274.490
2025-09-14 15:43:05,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1827.3405), np.float32(1206.5242), np.float32(1193.4596), np.float32(1099.0988), np.float32(1616.7552), np.float32(1320.0802), np.float32(858.62714), np.float32(1456.8398), np.float32(1150.7921), np.float32(1587.2993)]
2025-09-14 15:43:05,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:43:05,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 13 minutes, 44 seconds)
2025-09-14 15:45:12,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:45:19,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1233.98499 ± 172.781
2025-09-14 15:45:19,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1001.8013), np.float32(1226.7231), np.float32(1307.0327), np.float32(1550.186), np.float32(1320.3232), np.float32(1258.6055), np.float32(1267.4001), np.float32(894.7395), np.float32(1172.9358), np.float32(1340.1028)]
2025-09-14 15:45:19,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:45:19,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 11 minutes, 28 seconds)
2025-09-14 15:47:26,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:47:33,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1359.73987 ± 360.408
2025-09-14 15:47:33,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1384.2302), np.float32(1628.0895), np.float32(825.5464), np.float32(1498.1246), np.float32(1049.3884), np.float32(971.79755), np.float32(988.4975), np.float32(1577.5867), np.float32(1967.514), np.float32(1706.6233)]
2025-09-14 15:47:33,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:47:33,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 9 minutes, 10 seconds)
2025-09-14 15:49:40,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:49:47,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1463.72681 ± 271.881
2025-09-14 15:49:47,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1218.7678), np.float32(2111.2773), np.float32(1261.1993), np.float32(1177.8171), np.float32(1481.1917), np.float32(1652.0597), np.float32(1444.3168), np.float32(1596.459), np.float32(1179.5651), np.float32(1514.613)]
2025-09-14 15:49:47,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:49:47,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 6 minutes, 55 seconds)
2025-09-14 15:51:58,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:52:05,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1439.29224 ± 599.232
2025-09-14 15:52:05,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(996.7585), np.float32(2456.4524), np.float32(713.8329), np.float32(1902.8577), np.float32(1636.6937), np.float32(909.6897), np.float32(836.8282), np.float32(1545.6611), np.float32(2326.5698), np.float32(1067.579)]
2025-09-14 15:52:05,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:52:05,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 5 minutes, 4 seconds)
2025-09-14 15:54:27,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:54:33,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1269.46509 ± 356.337
2025-09-14 15:54:33,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1379.3925), np.float32(1219.5969), np.float32(1275.1617), np.float32(829.44006), np.float32(1062.5675), np.float32(1774.9258), np.float32(1019.0857), np.float32(1075.6771), np.float32(1015.65875), np.float32(2043.1442)]
2025-09-14 15:54:33,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:54:33,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 4 minutes, 13 seconds)
2025-09-14 15:56:57,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:57:03,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1294.12415 ± 367.417
2025-09-14 15:57:03,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1824.7421), np.float32(988.2576), np.float32(874.3853), np.float32(1601.768), np.float32(1676.8702), np.float32(870.41815), np.float32(1534.7188), np.float32(1601.9677), np.float32(1112.6288), np.float32(855.48456)]
2025-09-14 15:57:03,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:57:03,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 3 minutes, 22 seconds)
2025-09-14 15:59:35,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 15:59:42,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1325.45581 ± 312.784
2025-09-14 15:59:42,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1142.9384), np.float32(1101.8177), np.float32(1341.6554), np.float32(1817.975), np.float32(1681.1163), np.float32(1447.9326), np.float32(1701.4664), np.float32(805.36365), np.float32(1065.4332), np.float32(1148.8585)]
2025-09-14 15:59:42,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:59:42,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 3 minutes, 12 seconds)
2025-09-14 16:02:14,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:02:20,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1208.86401 ± 326.243
2025-09-14 16:02:20,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(943.22723), np.float32(1761.4419), np.float32(953.0587), np.float32(1252.3007), np.float32(808.08453), np.float32(1329.443), np.float32(1829.2351), np.float32(1023.788), np.float32(1109.306), np.float32(1078.7563)]
2025-09-14 16:02:20,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:02:20,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 2 minutes, 49 seconds)
2025-09-14 16:04:52,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:04:59,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1537.73474 ± 538.826
2025-09-14 16:04:59,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(855.3906), np.float32(2250.0818), np.float32(1102.1173), np.float32(2034.0128), np.float32(1752.8477), np.float32(1050.7714), np.float32(942.8903), np.float32(1714.3295), np.float32(2405.8523), np.float32(1269.0538)]
2025-09-14 16:04:59,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:04:59,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1537.73) for latency 15
2025-09-14 16:04:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 1 minute, 57 seconds)
2025-09-14 16:07:31,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:07:38,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1288.69397 ± 304.458
2025-09-14 16:07:38,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1125.376), np.float32(1305.6212), np.float32(849.42957), np.float32(1771.4489), np.float32(1270.6362), np.float32(1561.29), np.float32(879.4626), np.float32(1247.0006), np.float32(1133.9094), np.float32(1742.7644)]
2025-09-14 16:07:38,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:07:38,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 10 seconds)
2025-09-14 16:09:49,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:09:56,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1482.66809 ± 511.375
2025-09-14 16:09:56,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(869.47363), np.float32(2120.165), np.float32(2521.2844), np.float32(1337.4926), np.float32(1565.147), np.float32(1674.473), np.float32(1640.8794), np.float32(1011.792), np.float32(875.8846), np.float32(1210.089)]
2025-09-14 16:09:56,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:09:56,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 56 minutes, 38 seconds)
2025-09-14 16:12:04,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:12:11,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1192.34448 ± 231.223
2025-09-14 16:12:11,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1090.8932), np.float32(1050.492), np.float32(1491.9205), np.float32(1000.8599), np.float32(914.9335), np.float32(1666.4618), np.float32(1346.3998), np.float32(1233.6285), np.float32(969.98816), np.float32(1157.8667)]
2025-09-14 16:12:11,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:12:11,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 52 minutes, 25 seconds)
2025-09-14 16:14:19,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:14:25,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1654.03320 ± 602.599
2025-09-14 16:14:25,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1824.5654), np.float32(1000.76263), np.float32(2534.3125), np.float32(1704.8363), np.float32(2336.0469), np.float32(779.87445), np.float32(1290.7941), np.float32(2456.8965), np.float32(1580.0057), np.float32(1032.2369)]
2025-09-14 16:14:25,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:14:25,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1654.03) for latency 15
2025-09-14 16:14:25,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 48 minutes, 20 seconds)
2025-09-14 16:16:33,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:16:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1402.05481 ± 367.950
2025-09-14 16:16:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1250.6782), np.float32(2047.2804), np.float32(1894.9878), np.float32(1289.2084), np.float32(1464.7019), np.float32(1146.0199), np.float32(1334.2324), np.float32(902.25037), np.float32(1750.9747), np.float32(940.2135)]
2025-09-14 16:16:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:16:40,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 44 minutes, 24 seconds)
2025-09-14 16:18:48,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:18:55,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1272.20032 ± 367.693
2025-09-14 16:18:55,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1121.6455), np.float32(924.2387), np.float32(1051.9148), np.float32(1621.3708), np.float32(875.2933), np.float32(1964.4166), np.float32(974.6178), np.float32(1699.2598), np.float32(991.0105), np.float32(1498.2341)]
2025-09-14 16:18:55,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:18:55,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 40 minutes, 35 seconds)
2025-09-14 16:21:03,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:21:09,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1561.49341 ± 509.101
2025-09-14 16:21:09,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1747.7919), np.float32(2412.5344), np.float32(1272.9052), np.float32(1254.5962), np.float32(1956.1074), np.float32(1098.5334), np.float32(1995.0449), np.float32(955.8736), np.float32(2056.7217), np.float32(864.8252)]
2025-09-14 16:21:09,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:21:09,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 38 minutes, 10 seconds)
2025-09-14 16:23:17,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:23:24,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1181.92114 ± 244.030
2025-09-14 16:23:24,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1429.0797), np.float32(1584.4321), np.float32(1020.8058), np.float32(1533.491), np.float32(1068.0938), np.float32(878.68164), np.float32(970.1993), np.float32(1307.3612), np.float32(1031.7489), np.float32(995.317)]
2025-09-14 16:23:24,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:23:24,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 35 minutes, 52 seconds)
2025-09-14 16:25:31,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:25:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1523.28064 ± 377.568
2025-09-14 16:25:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1861.0422), np.float32(1293.3073), np.float32(1225.4825), np.float32(996.15356), np.float32(2224.5364), np.float32(1788.501), np.float32(1136.2428), np.float32(1879.0741), np.float32(1536.2715), np.float32(1292.195)]
2025-09-14 16:25:38,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:25:38,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 33 minutes, 38 seconds)
2025-09-14 16:27:46,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:27:52,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1266.45996 ± 452.486
2025-09-14 16:27:52,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(858.38806), np.float32(2161.9321), np.float32(1078.9065), np.float32(872.9387), np.float32(840.3946), np.float32(842.77405), np.float32(1917.0026), np.float32(1235.7626), np.float32(1588.6595), np.float32(1267.8408)]
2025-09-14 16:27:52,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:27:52,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 31 minutes, 22 seconds)
2025-09-14 16:30:00,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:30:07,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1179.46448 ± 488.345
2025-09-14 16:30:07,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1600.1963), np.float32(988.34973), np.float32(997.1694), np.float32(923.8545), np.float32(312.03268), np.float32(1568.9457), np.float32(934.8459), np.float32(787.651), np.float32(2041.4071), np.float32(1640.1923)]
2025-09-14 16:30:07,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:30:07,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 29 minutes, 7 seconds)
2025-09-14 16:32:14,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:32:21,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1419.81482 ± 458.874
2025-09-14 16:32:21,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1211.0387), np.float32(1202.2808), np.float32(827.09454), np.float32(2081.9915), np.float32(1842.9111), np.float32(1733.6699), np.float32(997.40216), np.float32(805.1703), np.float32(1452.1401), np.float32(2044.4496)]
2025-09-14 16:32:21,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:32:21,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 26 minutes, 52 seconds)
2025-09-14 16:34:43,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:34:50,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1292.92883 ± 374.302
2025-09-14 16:34:50,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1123.7388), np.float32(1355.6613), np.float32(1820.6003), np.float32(1229.1359), np.float32(856.8871), np.float32(1210.0482), np.float32(1171.5459), np.float32(2138.9019), np.float32(1013.8676), np.float32(1008.9011)]
2025-09-14 16:34:50,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:34:50,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 25 minutes, 9 seconds)
2025-09-14 16:37:21,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:37:28,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1297.17847 ± 437.667
2025-09-14 16:37:28,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1931.2872), np.float32(1005.481), np.float32(1636.2036), np.float32(871.33795), np.float32(1088.8214), np.float32(2120.5767), np.float32(890.8323), np.float32(1126.2306), np.float32(856.14417), np.float32(1444.8702)]
2025-09-14 16:37:28,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:37:28,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 38 seconds)
2025-09-14 16:39:58,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:40:05,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1322.14539 ± 323.433
2025-09-14 16:40:05,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1887.8251), np.float32(1221.6283), np.float32(1516.5426), np.float32(1035.7491), np.float32(1305.3479), np.float32(1850.0013), np.float32(883.639), np.float32(978.62775), np.float32(1300.6409), np.float32(1241.4518)]
2025-09-14 16:40:05,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:40:05,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 58 seconds)
2025-09-14 16:42:21,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:42:28,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1505.54272 ± 702.043
2025-09-14 16:42:28,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1946.6473), np.float32(1871.1696), np.float32(858.77234), np.float32(2223.5461), np.float32(1010.77136), np.float32(2278.2866), np.float32(2466.3557), np.float32(1195.108), np.float32(897.08905), np.float32(307.68048)]
2025-09-14 16:42:28,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:42:28,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 19 minutes, 45 seconds)
2025-09-14 16:44:50,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:44:57,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1570.42944 ± 476.395
2025-09-14 16:44:57,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1885.5333), np.float32(1452.5068), np.float32(1279.8296), np.float32(1332.0216), np.float32(820.9552), np.float32(1552.247), np.float32(1175.8906), np.float32(2553.583), np.float32(1509.4375), np.float32(2142.2903)]
2025-09-14 16:44:57,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:44:57,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 17 minutes, 37 seconds)
2025-09-14 16:47:19,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:47:26,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1247.43958 ± 444.076
2025-09-14 16:47:26,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(863.0542), np.float32(1664.4475), np.float32(1722.9874), np.float32(1192.355), np.float32(1444.5049), np.float32(1293.455), np.float32(954.2135), np.float32(1202.3925), np.float32(1852.1376), np.float32(284.84784)]
2025-09-14 16:47:26,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:47:26,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 7 seconds)
2025-09-14 16:49:33,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:49:40,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1539.03064 ± 556.458
2025-09-14 16:49:40,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2125.2126), np.float32(2109.9568), np.float32(909.06854), np.float32(1179.4967), np.float32(2002.8164), np.float32(1123.1471), np.float32(2530.383), np.float32(1308.6053), np.float32(1042.9915), np.float32(1058.63)]
2025-09-14 16:49:40,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:49:40,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 12 minutes, 12 seconds)
2025-09-14 16:51:48,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:51:54,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1433.04883 ± 463.371
2025-09-14 16:51:54,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1461.8567), np.float32(1392.705), np.float32(1066.0267), np.float32(930.6248), np.float32(2175.308), np.float32(2107.7793), np.float32(1003.0767), np.float32(946.46954), np.float32(1981.9866), np.float32(1264.6549)]
2025-09-14 16:51:54,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:51:54,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 27 seconds)
2025-09-14 16:54:02,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:54:08,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1599.86243 ± 569.789
2025-09-14 16:54:08,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1386.637), np.float32(2274.8591), np.float32(2109.8823), np.float32(1050.4852), np.float32(2213.051), np.float32(921.21075), np.float32(1882.2035), np.float32(977.78143), np.float32(927.8732), np.float32(2254.6406)]
2025-09-14 16:54:08,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:54:08,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes)
2025-09-14 16:56:16,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:56:23,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1476.50757 ± 487.844
2025-09-14 16:56:23,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1664.4735), np.float32(985.2858), np.float32(2032.2197), np.float32(1115.196), np.float32(1545.757), np.float32(1438.3173), np.float32(1444.2905), np.float32(1270.4183), np.float32(750.19604), np.float32(2518.922)]
2025-09-14 16:56:23,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:56:23,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 34 seconds)
2025-09-14 16:58:29,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 16:58:36,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1320.65259 ± 359.882
2025-09-14 16:58:36,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1919.065), np.float32(1561.1252), np.float32(869.45764), np.float32(1382.3229), np.float32(913.7195), np.float32(1346.4205), np.float32(1700.4073), np.float32(1611.0221), np.float32(998.93567), np.float32(904.05)]
2025-09-14 16:58:36,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:58:36,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 13 seconds)
2025-09-14 17:00:43,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-14 17:00:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1907.26343 ± 716.606
2025-09-14 17:00:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2326.7234), np.float32(2380.5632), np.float32(827.00885), np.float32(2356.1233), np.float32(2589.0422), np.float32(2225.9482), np.float32(492.65335), np.float32(2447.1592), np.float32(1248.4431), np.float32(2178.9714)]
2025-09-14 17:00:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:00:50,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1907.26) for latency 15
2025-09-14 17:00:50,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1251 [DEBUG]: Training session finished
