2025-09-14 14:39:53,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_21
2025-09-14 14:39:53,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.200-delay_21
2025-09-14 14:39:53,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x7fd3add43cb0>}
2025-09-14 14:39:53,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 14:39:53,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 14:39:54,051 baseline-bpql-noisepromille200-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=143, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 14:39:54,051 baseline-bpql-noisepromille200-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 14:39:55,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 14:39:55,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 15:17:25,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:17:33,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -304.76657 ± 40.863
2025-09-14 15:17:33,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-237.83504), np.float32(-378.27463), np.float32(-298.23236), np.float32(-266.31296), np.float32(-350.392), np.float32(-332.36685), np.float32(-295.04883), np.float32(-327.89487), np.float32(-298.7885), np.float32(-262.51956)]
2025-09-14 15:17:33,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:17:33,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-304.77) for latency 21
2025-09-14 15:17:33,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 62 hours, 6 minutes, 6 seconds)
2025-09-14 15:54:05,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:54:13,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -295.11627 ± 45.884
2025-09-14 15:54:13,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-200.71962), np.float32(-249.8391), np.float32(-278.6599), np.float32(-327.1094), np.float32(-316.76016), np.float32(-292.15662), np.float32(-295.17648), np.float32(-275.10583), np.float32(-363.169), np.float32(-352.46646)]
2025-09-14 15:54:13,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:54:13,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-295.12) for latency 21
2025-09-14 15:54:13,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 60 hours, 40 minutes, 59 seconds)
2025-09-14 16:39:47,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:39:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -260.10138 ± 84.756
2025-09-14 16:39:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-264.62823), np.float32(-441.254), np.float32(-318.27814), np.float32(-313.63934), np.float32(-270.7535), np.float32(-231.8167), np.float32(-187.11879), np.float32(-100.82381), np.float32(-234.09338), np.float32(-238.60783)]
2025-09-14 16:39:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:39:55,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-260.10) for latency 21
2025-09-14 16:39:55,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 64 hours, 40 minutes, 5 seconds)
2025-09-14 17:07:35,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:07:43,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -225.74405 ± 57.940
2025-09-14 17:07:43,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-315.77444), np.float32(-226.2134), np.float32(-153.68231), np.float32(-130.67407), np.float32(-219.69801), np.float32(-257.8499), np.float32(-259.05182), np.float32(-157.17723), np.float32(-286.99155), np.float32(-250.32776)]
2025-09-14 17:07:43,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:07:43,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-225.74) for latency 21
2025-09-14 17:07:43,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 59 hours, 7 minutes, 4 seconds)
2025-09-14 17:22:20,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:22:29,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -118.58636 ± 90.695
2025-09-14 17:22:29,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-235.81819), np.float32(-145.82146), np.float32(-230.41716), np.float32(27.923426), np.float32(-171.27213), np.float32(-86.955284), np.float32(-55.059834), np.float32(-212.02425), np.float32(-84.768555), np.float32(8.349785)]
2025-09-14 17:22:29,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:22:29,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-118.59) for latency 21
2025-09-14 17:22:29,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 51 hours, 28 minutes, 36 seconds)
2025-09-14 17:33:55,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:34:03,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -100.17917 ± 76.166
2025-09-14 17:34:03,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-53.69208), np.float32(-151.33939), np.float32(-229.14618), np.float32(-87.24909), np.float32(9.271475), np.float32(9.088669), np.float32(-110.07626), np.float32(-202.45003), np.float32(-123.58021), np.float32(-62.6186)]
2025-09-14 17:34:03,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:34:03,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-100.18) for latency 21
2025-09-14 17:34:03,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 42 hours, 46 minutes, 14 seconds)
2025-09-14 17:55:55,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:56:04,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: -68.51091 ± 90.447
2025-09-14 17:56:04,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-29.487495), np.float32(1.7715365), np.float32(-84.16123), np.float32(-301.07098), np.float32(-61.413628), np.float32(-1.2706339), np.float32(-113.74106), np.float32(21.541014), np.float32(-3.440821), np.float32(-113.83581)]
2025-09-14 17:56:04,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:56:04,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (-68.51) for latency 21
2025-09-14 17:56:04,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 37 hours, 46 minutes, 10 seconds)
2025-09-14 18:01:45,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:01:53,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 15.16862 ± 158.227
2025-09-14 18:01:53,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(174.05038), np.float32(-79.32858), np.float32(57.74871), np.float32(211.97012), np.float32(-36.25397), np.float32(-4.6260777), np.float32(-140.27267), np.float32(113.08477), np.float32(176.19199), np.float32(-320.87845)]
2025-09-14 18:01:53,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:01:53,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (15.17) for latency 21
2025-09-14 18:01:53,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 25 hours, 8 minutes, 8 seconds)
2025-09-14 18:06:08,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:06:17,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 427.72772 ± 256.336
2025-09-14 18:06:17,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(663.89264), np.float32(673.4071), np.float32(512.8629), np.float32(535.48175), np.float32(84.23542), np.float32(629.37683), np.float32(107.47489), np.float32(577.0383), np.float32(540.9335), np.float32(-47.426155)]
2025-09-14 18:06:17,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:06:17,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (427.73) for latency 21
2025-09-14 18:06:17,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 17 hours, 45 minutes, 50 seconds)
2025-09-14 18:21:47,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:21:55,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 500.80630 ± 236.548
2025-09-14 18:21:55,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(307.21777), np.float32(663.2786), np.float32(633.0668), np.float32(475.04654), np.float32(664.9229), np.float32(613.08014), np.float32(107.13343), np.float32(85.88071), np.float32(786.1996), np.float32(672.2365)]
2025-09-14 18:21:55,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:21:55,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (500.81) for latency 21
2025-09-14 18:21:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 17 hours, 49 minutes, 55 seconds)
2025-09-14 18:30:22,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:30:30,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 271.80045 ± 373.188
2025-09-14 18:30:30,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(356.28116), np.float32(393.12454), np.float32(247.99844), np.float32(478.88364), np.float32(399.00455), np.float32(540.5385), np.float32(-410.85226), np.float32(-476.36276), np.float32(546.2252), np.float32(643.1632)]
2025-09-14 18:30:30,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:30:30,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 16 hours, 44 minutes, 42 seconds)
2025-09-14 18:34:51,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:34:59,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 307.12863 ± 378.641
2025-09-14 18:34:59,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(254.50868), np.float32(-374.9916), np.float32(241.86514), np.float32(531.84125), np.float32(772.3332), np.float32(732.1827), np.float32(439.518), np.float32(432.60974), np.float32(418.48523), np.float32(-377.0662)]
2025-09-14 18:34:59,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:34:59,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 11 hours, 25 minutes, 7 seconds)
2025-09-14 18:39:18,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:39:26,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 559.52197 ± 132.583
2025-09-14 18:39:26,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(498.01508), np.float32(292.30258), np.float32(690.8054), np.float32(733.78955), np.float32(720.5644), np.float32(450.03256), np.float32(488.76166), np.float32(536.2527), np.float32(653.01825), np.float32(531.6776)]
2025-09-14 18:39:26,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:39:26,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (559.52) for latency 21
2025-09-14 18:39:26,331 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 10 hours, 53 minutes, 17 seconds)
2025-09-14 18:43:58,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:44:06,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 425.62030 ± 292.006
2025-09-14 18:44:06,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(428.60413), np.float32(794.3689), np.float32(457.34494), np.float32(505.07938), np.float32(435.31433), np.float32(813.1173), np.float32(450.73477), np.float32(337.78372), np.float32(-312.24948), np.float32(346.10498)]
2025-09-14 18:44:06,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:44:06,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 10 hours, 50 minutes, 39 seconds)
2025-09-14 18:48:25,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:48:33,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 339.86618 ± 421.919
2025-09-14 18:48:33,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(411.45776), np.float32(609.40594), np.float32(-375.87143), np.float32(863.9308), np.float32(-81.2965), np.float32(308.8224), np.float32(536.0083), np.float32(770.537), np.float32(657.7574), np.float32(-302.08963)]
2025-09-14 18:48:33,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:48:33,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 7 hours, 32 minutes, 48 seconds)
2025-09-14 18:52:56,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:53:04,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 521.51697 ± 169.259
2025-09-14 18:53:04,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(366.02475), np.float32(488.86154), np.float32(526.4732), np.float32(686.5618), np.float32(687.1939), np.float32(752.6429), np.float32(645.235), np.float32(512.3612), np.float32(185.22162), np.float32(364.5936)]
2025-09-14 18:53:04,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:53:04,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 19 minutes, 3 seconds)
2025-09-14 18:57:31,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:57:39,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 394.53238 ± 281.598
2025-09-14 18:57:39,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(417.0253), np.float32(394.05557), np.float32(729.6102), np.float32(528.5482), np.float32(524.0757), np.float32(366.34113), np.float32(329.531), np.float32(454.68723), np.float32(582.47296), np.float32(-381.0234)]
2025-09-14 18:57:39,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:57:39,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 16 minutes, 13 seconds)
2025-09-14 19:02:09,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:02:17,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 543.74902 ± 299.056
2025-09-14 19:02:17,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(796.23364), np.float32(729.369), np.float32(765.34186), np.float32(625.7429), np.float32(352.4534), np.float32(631.42615), np.float32(210.05719), np.float32(-152.43935), np.float32(847.46515), np.float32(631.8402)]
2025-09-14 19:02:17,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:02:17,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 14 minutes, 45 seconds)
2025-09-14 19:06:37,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:06:45,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 555.95306 ± 166.266
2025-09-14 19:06:45,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(662.8621), np.float32(696.06226), np.float32(376.73505), np.float32(611.4739), np.float32(547.5283), np.float32(387.57147), np.float32(671.54205), np.float32(666.9577), np.float32(735.0028), np.float32(203.79475)]
2025-09-14 19:06:45,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:06:45,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 6 minutes, 48 seconds)
2025-09-14 19:10:56,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:11:04,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 516.31085 ± 231.595
2025-09-14 19:11:04,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(712.6183), np.float32(526.8889), np.float32(597.29395), np.float32(371.61963), np.float32(435.4264), np.float32(590.83765), np.float32(590.8108), np.float32(733.3027), np.float32(698.23706), np.float32(-93.9266)]
2025-09-14 19:11:04,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:11:04,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 6 hours, 16 seconds)
2025-09-14 19:15:10,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:15:18,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 559.10046 ± 218.278
2025-09-14 19:15:18,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(257.87933), np.float32(729.56995), np.float32(755.55286), np.float32(788.185), np.float32(198.93488), np.float32(539.1609), np.float32(294.74332), np.float32(688.95087), np.float32(558.1972), np.float32(779.8303)]
2025-09-14 19:15:18,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:15:18,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 51 minutes, 23 seconds)
2025-09-14 19:19:37,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:19:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 571.38489 ± 181.107
2025-09-14 19:19:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(603.55585), np.float32(748.64923), np.float32(691.58905), np.float32(503.09232), np.float32(191.97177), np.float32(739.16144), np.float32(805.4085), np.float32(361.57034), np.float32(571.84607), np.float32(497.00397)]
2025-09-14 19:19:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:19:45,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (571.38) for latency 21
2025-09-14 19:19:45,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 44 minutes, 44 seconds)
2025-09-14 19:23:57,967 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:24:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 533.68097 ± 241.150
2025-09-14 19:24:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(468.57706), np.float32(725.5393), np.float32(554.994), np.float32(465.27884), np.float32(654.951), np.float32(739.81415), np.float32(-111.51735), np.float32(558.1008), np.float32(779.17505), np.float32(501.89642)]
2025-09-14 19:24:06,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:24:06,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 35 minutes, 54 seconds)
2025-09-14 19:28:36,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:28:44,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 508.32471 ± 230.914
2025-09-14 19:28:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(207.40848), np.float32(704.6166), np.float32(257.14914), np.float32(425.49637), np.float32(260.56512), np.float32(727.2106), np.float32(270.91614), np.float32(784.0705), np.float32(708.80743), np.float32(737.0066)]
2025-09-14 19:28:44,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:28:44,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 34 minutes, 10 seconds)
2025-09-14 19:33:29,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:33:37,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 640.36438 ± 203.620
2025-09-14 19:33:37,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(617.57324), np.float32(203.40953), np.float32(332.8823), np.float32(837.7166), np.float32(730.5787), np.float32(722.23364), np.float32(641.13007), np.float32(657.17303), np.float32(796.83716), np.float32(864.10925)]
2025-09-14 19:33:37,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:33:37,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (640.36) for latency 21
2025-09-14 19:33:37,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 38 minutes, 14 seconds)
2025-09-14 19:38:02,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:38:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 687.25110 ± 106.661
2025-09-14 19:38:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(642.68396), np.float32(443.69446), np.float32(748.5911), np.float32(759.8947), np.float32(671.45636), np.float32(803.3102), np.float32(632.2726), np.float32(807.0845), np.float32(603.4765), np.float32(760.047)]
2025-09-14 19:38:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:38:10,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (687.25) for latency 21
2025-09-14 19:38:10,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 38 minutes, 21 seconds)
2025-09-14 19:42:28,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:42:37,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 519.21863 ± 289.134
2025-09-14 19:42:37,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(707.8486), np.float32(688.0351), np.float32(457.49338), np.float32(446.91473), np.float32(858.1873), np.float32(356.6999), np.float32(335.79675), np.float32(-165.501), np.float32(684.69867), np.float32(822.0128)]
2025-09-14 19:42:37,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:42:37,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 33 minutes, 43 seconds)
2025-09-14 19:46:48,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:46:56,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 713.36853 ± 112.693
2025-09-14 19:46:56,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(804.5971), np.float32(538.1026), np.float32(864.08954), np.float32(766.07935), np.float32(606.38403), np.float32(625.1235), np.float32(699.53827), np.float32(888.7877), np.float32(736.0228), np.float32(604.9607)]
2025-09-14 19:46:56,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:46:56,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (713.37) for latency 21
2025-09-14 19:46:56,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 28 minutes, 58 seconds)
2025-09-14 19:51:23,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:51:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 661.41241 ± 149.649
2025-09-14 19:51:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(435.51163), np.float32(516.4283), np.float32(844.12164), np.float32(479.306), np.float32(788.40283), np.float32(751.4828), np.float32(795.76605), np.float32(706.89355), np.float32(790.652), np.float32(505.55914)]
2025-09-14 19:51:31,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:51:31,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 23 minutes, 39 seconds)
2025-09-14 19:56:01,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 19:56:09,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 627.32666 ± 225.835
2025-09-14 19:56:09,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(471.40533), np.float32(794.12134), np.float32(582.0853), np.float32(452.6587), np.float32(78.04894), np.float32(762.9157), np.float32(707.5336), np.float32(792.3173), np.float32(828.7206), np.float32(803.46)]
2025-09-14 19:56:09,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 19:56:09,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 15 minutes, 27 seconds)
2025-09-14 20:00:44,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:00:52,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 705.52112 ± 132.157
2025-09-14 20:00:52,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(666.5821), np.float32(671.10834), np.float32(721.8799), np.float32(609.0665), np.float32(906.8428), np.float32(747.373), np.float32(416.91287), np.float32(810.9074), np.float32(855.53766), np.float32(649.00055)]
2025-09-14 20:00:52,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:00:52,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 13 minutes, 14 seconds)
2025-09-14 20:05:23,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:05:31,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 809.62354 ± 98.733
2025-09-14 20:05:31,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(790.276), np.float32(730.802), np.float32(873.907), np.float32(894.4324), np.float32(705.237), np.float32(909.575), np.float32(629.69257), np.float32(747.5975), np.float32(866.38666), np.float32(948.3293)]
2025-09-14 20:05:31,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:05:31,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (809.62) for latency 21
2025-09-14 20:05:31,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 11 minutes, 30 seconds)
2025-09-14 20:09:41,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:09:50,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 714.97424 ± 238.957
2025-09-14 20:09:50,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(596.36285), np.float32(866.5672), np.float32(686.54), np.float32(103.97727), np.float32(902.2948), np.float32(1056.3069), np.float32(700.41064), np.float32(688.86646), np.float32(786.8876), np.float32(761.52856)]
2025-09-14 20:09:50,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:09:50,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 6 minutes, 41 seconds)
2025-09-14 20:14:17,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:14:26,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 590.54688 ± 319.940
2025-09-14 20:14:26,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(610.0142), np.float32(910.43854), np.float32(-318.02728), np.float32(730.21735), np.float32(793.62933), np.float32(708.9411), np.float32(584.2661), np.float32(524.6742), np.float32(677.9499), np.float32(683.36566)]
2025-09-14 20:14:26,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:14:26,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 5 hours, 2 minutes, 18 seconds)
2025-09-14 20:18:51,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:18:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 668.14514 ± 250.786
2025-09-14 20:18:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(765.91455), np.float32(8.813047), np.float32(678.7953), np.float32(803.36475), np.float32(562.02747), np.float32(794.2003), np.float32(984.81116), np.float32(760.83185), np.float32(784.1201), np.float32(538.57275)]
2025-09-14 20:18:59,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:18:59,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 56 minutes, 47 seconds)
2025-09-14 20:23:52,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:24:00,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 692.89233 ± 175.009
2025-09-14 20:24:00,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(435.70612), np.float32(766.169), np.float32(771.90063), np.float32(714.24756), np.float32(679.5583), np.float32(955.70557), np.float32(607.25604), np.float32(981.2925), np.float32(521.6206), np.float32(495.46695)]
2025-09-14 20:24:00,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:24:00,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 56 minutes, 7 seconds)
2025-09-14 20:28:43,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:28:51,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 640.82385 ± 93.127
2025-09-14 20:28:51,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(706.7069), np.float32(720.691), np.float32(620.15094), np.float32(564.98553), np.float32(586.1833), np.float32(535.324), np.float32(590.26666), np.float32(524.34125), np.float32(757.8607), np.float32(801.728)]
2025-09-14 20:28:51,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:28:51,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 54 minutes, 2 seconds)
2025-09-14 20:33:31,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:33:39,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 644.48340 ± 164.501
2025-09-14 20:33:39,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(367.83298), np.float32(748.77435), np.float32(749.7193), np.float32(452.7876), np.float32(988.66455), np.float32(604.7039), np.float32(666.73804), np.float32(635.1451), np.float32(534.5072), np.float32(695.96106)]
2025-09-14 20:33:39,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:33:39,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 55 minutes, 21 seconds)
2025-09-14 20:38:21,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:38:29,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 768.15436 ± 141.244
2025-09-14 20:38:29,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(866.6091), np.float32(661.1429), np.float32(759.35974), np.float32(1072.1049), np.float32(775.50354), np.float32(695.8368), np.float32(866.84143), np.float32(777.0132), np.float32(694.70294), np.float32(512.42914)]
2025-09-14 20:38:29,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:38:29,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 53 minutes, 33 seconds)
2025-09-14 20:42:58,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:43:06,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 759.43506 ± 148.847
2025-09-14 20:43:06,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(941.62665), np.float32(877.1491), np.float32(890.1632), np.float32(895.46576), np.float32(786.0353), np.float32(624.20233), np.float32(678.8448), np.float32(842.71875), np.float32(515.8809), np.float32(542.2634)]
2025-09-14 20:43:06,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:43:06,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 49 minutes, 29 seconds)
2025-09-14 20:47:43,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:47:51,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 796.48193 ± 126.196
2025-09-14 20:47:51,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(787.01807), np.float32(705.0714), np.float32(794.2461), np.float32(891.4806), np.float32(924.3843), np.float32(894.44543), np.float32(844.4837), np.float32(940.4324), np.float32(658.2219), np.float32(525.0363)]
2025-09-14 20:47:51,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:47:51,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 41 minutes, 25 seconds)
2025-09-14 20:52:22,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:52:30,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 875.00818 ± 135.816
2025-09-14 20:52:30,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(848.02374), np.float32(1022.5164), np.float32(686.4376), np.float32(700.7954), np.float32(698.7785), np.float32(820.55865), np.float32(1020.40717), np.float32(1018.3467), np.float32(925.94196), np.float32(1008.27606)]
2025-09-14 20:52:30,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:52:30,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (875.01) for latency 21
2025-09-14 20:52:31,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 34 minutes, 24 seconds)
2025-09-14 20:56:57,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 20:57:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 795.95392 ± 57.776
2025-09-14 20:57:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(781.4971), np.float32(849.56635), np.float32(797.7101), np.float32(768.5928), np.float32(874.5375), np.float32(772.8598), np.float32(655.1359), np.float32(827.1584), np.float32(840.84454), np.float32(791.6369)]
2025-09-14 20:57:05,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 20:57:05,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 27 minutes, 7 seconds)
2025-09-14 21:01:28,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:01:36,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 738.81390 ± 110.569
2025-09-14 21:01:36,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(552.91626), np.float32(709.6161), np.float32(863.0106), np.float32(793.2291), np.float32(592.2074), np.float32(741.6655), np.float32(903.13226), np.float32(657.4271), np.float32(721.89764), np.float32(853.0375)]
2025-09-14 21:01:36,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:01:36,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 18 minutes, 52 seconds)
2025-09-14 21:05:58,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:06:06,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 915.55322 ± 126.241
2025-09-14 21:06:06,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(834.81573), np.float32(915.2451), np.float32(908.09155), np.float32(741.6796), np.float32(860.10785), np.float32(894.0421), np.float32(879.67737), np.float32(1036.2556), np.float32(854.68384), np.float32(1230.9333)]
2025-09-14 21:06:06,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:06:06,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (915.55) for latency 21
2025-09-14 21:06:06,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 12 minutes, 54 seconds)
2025-09-14 21:10:21,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:10:29,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 732.11798 ± 354.028
2025-09-14 21:10:29,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-283.33664), np.float32(860.6387), np.float32(823.7114), np.float32(759.8776), np.float32(770.17413), np.float32(1120.1754), np.float32(912.0815), np.float32(739.03503), np.float32(801.91364), np.float32(816.90894)]
2025-09-14 21:10:29,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:10:29,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 4 minutes, 29 seconds)
2025-09-14 21:14:40,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:14:48,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 848.62512 ± 320.998
2025-09-14 21:14:48,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(950.9944), np.float32(1003.48987), np.float32(855.9233), np.float32(27.135046), np.float32(863.9259), np.float32(1385.3888), np.float32(940.57556), np.float32(927.6883), np.float32(811.3031), np.float32(719.8264)]
2025-09-14 21:14:48,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:14:48,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 56 minutes, 19 seconds)
2025-09-14 21:19:30,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:19:38,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 912.92822 ± 218.159
2025-09-14 21:19:38,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(817.70056), np.float32(987.2259), np.float32(1099.7046), np.float32(1123.2899), np.float32(328.66187), np.float32(956.7803), np.float32(947.63306), np.float32(890.82184), np.float32(1102.1577), np.float32(875.30597)]
2025-09-14 21:19:38,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:19:38,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 54 minutes, 36 seconds)
2025-09-14 21:24:10,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:24:18,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 728.63068 ± 267.574
2025-09-14 21:24:18,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(713.0271), np.float32(1043.7424), np.float32(791.5449), np.float32(935.80646), np.float32(874.8641), np.float32(721.09125), np.float32(724.41364), np.float32(-4.689555), np.float32(813.5828), np.float32(672.9235)]
2025-09-14 21:24:18,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:24:18,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 51 minutes, 33 seconds)
2025-09-14 21:28:31,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:28:40,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 932.00537 ± 147.296
2025-09-14 21:28:40,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1199.0094), np.float32(858.49744), np.float32(794.8628), np.float32(1103.6277), np.float32(938.98645), np.float32(1037.5404), np.float32(891.32825), np.float32(903.5288), np.float32(940.4541), np.float32(652.21857)]
2025-09-14 21:28:40,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:28:40,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (932.01) for latency 21
2025-09-14 21:28:40,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 45 minutes, 37 seconds)
2025-09-14 21:32:57,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:33:06,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 959.31708 ± 89.324
2025-09-14 21:33:06,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(866.76715), np.float32(1048.7157), np.float32(996.3521), np.float32(909.43024), np.float32(1041.1027), np.float32(1026.7202), np.float32(1103.8328), np.float32(898.5782), np.float32(847.1424), np.float32(854.5296)]
2025-09-14 21:33:06,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:33:06,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (959.32) for latency 21
2025-09-14 21:33:06,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 41 minutes, 31 seconds)
2025-09-14 21:37:08,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:37:16,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 973.50940 ± 172.265
2025-09-14 21:37:16,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1190.404), np.float32(890.52814), np.float32(811.70776), np.float32(1375.0232), np.float32(833.70056), np.float32(843.0278), np.float32(962.18787), np.float32(1026.8367), np.float32(849.23846), np.float32(952.4395)]
2025-09-14 21:37:16,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:37:16,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (973.51) for latency 21
2025-09-14 21:37:16,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 35 minutes, 39 seconds)
2025-09-14 21:41:34,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:41:42,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 926.95947 ± 351.130
2025-09-14 21:41:42,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(950.2925), np.float32(1023.8142), np.float32(990.1324), np.float32(1079.5149), np.float32(1397.7231), np.float32(1234.6357), np.float32(775.26245), np.float32(957.2032), np.float32(4.7175984), np.float32(856.2987)]
2025-09-14 21:41:42,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:41:42,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 27 minutes, 23 seconds)
2025-09-14 21:46:25,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:46:33,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 958.14374 ± 87.367
2025-09-14 21:46:33,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1115.4044), np.float32(1010.9052), np.float32(887.32825), np.float32(998.34766), np.float32(773.90625), np.float32(896.43665), np.float32(938.43506), np.float32(968.39655), np.float32(1019.0819), np.float32(973.1967)]
2025-09-14 21:46:33,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:46:33,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 24 minutes, 39 seconds)
2025-09-14 21:51:05,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:51:13,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 914.23193 ± 104.854
2025-09-14 21:51:13,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(870.92706), np.float32(883.5158), np.float32(1125.6772), np.float32(891.7611), np.float32(1002.41125), np.float32(956.83606), np.float32(939.8076), np.float32(955.5539), np.float32(734.5801), np.float32(781.24866)]
2025-09-14 21:51:13,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:51:13,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 23 minutes, 3 seconds)
2025-09-14 21:56:07,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 21:56:15,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1014.90985 ± 181.841
2025-09-14 21:56:15,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1254.6472), np.float32(922.24146), np.float32(962.129), np.float32(1009.5523), np.float32(1014.25006), np.float32(770.76215), np.float32(795.7826), np.float32(1384.552), np.float32(924.02203), np.float32(1111.1589)]
2025-09-14 21:56:15,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 21:56:15,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1014.91) for latency 21
2025-09-14 21:56:15,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 23 minutes, 44 seconds)
2025-09-14 22:00:49,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:00:57,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 892.26483 ± 138.680
2025-09-14 22:00:57,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(965.80945), np.float32(670.31476), np.float32(810.1775), np.float32(753.59326), np.float32(890.7902), np.float32(838.119), np.float32(1178.5428), np.float32(961.2621), np.float32(1026.061), np.float32(827.97784)]
2025-09-14 22:00:57,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:00:57,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 23 minutes, 40 seconds)
2025-09-14 22:05:10,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:05:19,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 957.82880 ± 116.720
2025-09-14 22:05:19,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(857.97516), np.float32(997.2096), np.float32(937.41785), np.float32(1007.48865), np.float32(816.07745), np.float32(1117.1499), np.float32(891.55756), np.float32(1198.4597), np.float32(901.4127), np.float32(853.5391)]
2025-09-14 22:05:19,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:05:19,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 18 minutes, 19 seconds)
2025-09-14 22:09:21,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:09:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 950.20959 ± 130.950
2025-09-14 22:09:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(911.8828), np.float32(1035.8135), np.float32(1104.1042), np.float32(1140.2799), np.float32(868.0838), np.float32(862.37085), np.float32(982.6486), np.float32(745.4914), np.float32(776.9687), np.float32(1074.4526)]
2025-09-14 22:09:29,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:09:29,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 8 minutes)
2025-09-14 22:13:46,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:13:54,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1006.24396 ± 260.878
2025-09-14 22:13:54,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1180.5992), np.float32(779.5505), np.float32(1629.2574), np.float32(855.96625), np.float32(905.7414), np.float32(773.2097), np.float32(862.9064), np.float32(1000.8207), np.float32(1262.0928), np.float32(812.29504)]
2025-09-14 22:13:54,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:13:54,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 3 hours, 1 minute, 23 seconds)
2025-09-14 22:18:09,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:18:17,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 990.08429 ± 179.395
2025-09-14 22:18:17,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1214.5826), np.float32(856.2836), np.float32(1020.82965), np.float32(766.99506), np.float32(1004.69025), np.float32(984.7757), np.float32(885.31683), np.float32(1390.0669), np.float32(814.3953), np.float32(962.9074)]
2025-09-14 22:18:17,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:18:17,662 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 51 minutes, 54 seconds)
2025-09-14 22:22:38,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:22:46,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 876.87561 ± 114.525
2025-09-14 22:22:46,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(833.7912), np.float32(845.0977), np.float32(828.4051), np.float32(791.7528), np.float32(821.2126), np.float32(997.1028), np.float32(903.08954), np.float32(1149.9642), np.float32(716.0892), np.float32(882.2504)]
2025-09-14 22:22:46,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:22:46,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 45 minutes, 46 seconds)
2025-09-14 22:27:04,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:27:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 964.14093 ± 148.997
2025-09-14 22:27:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1350.1392), np.float32(1007.4672), np.float32(885.8589), np.float32(980.2561), np.float32(835.8544), np.float32(908.9182), np.float32(1083.8771), np.float32(852.76483), np.float32(862.7528), np.float32(873.5206)]
2025-09-14 22:27:13,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:27:13,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 42 minutes, 3 seconds)
2025-09-14 22:31:23,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:31:31,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 738.30420 ± 296.890
2025-09-14 22:31:31,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1078.0195), np.float32(763.24493), np.float32(-32.769367), np.float32(779.3273), np.float32(784.1286), np.float32(824.5968), np.float32(746.68506), np.float32(843.5937), np.float32(1063.9225), np.float32(532.29364)]
2025-09-14 22:31:31,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:31:31,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 38 minutes, 40 seconds)
2025-09-14 22:35:40,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:35:48,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1125.76270 ± 348.084
2025-09-14 22:35:48,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(761.3624), np.float32(906.0625), np.float32(1004.1407), np.float32(1905.8342), np.float32(964.8321), np.float32(1196.9766), np.float32(1000.50055), np.float32(916.2829), np.float32(1658.1909), np.float32(943.44385)]
2025-09-14 22:35:48,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:35:48,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1125.76) for latency 21
2025-09-14 22:35:48,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 33 minutes, 18 seconds)
2025-09-14 22:39:58,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:40:06,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1036.95874 ± 125.260
2025-09-14 22:40:06,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(919.1361), np.float32(989.3128), np.float32(1127.6926), np.float32(1020.412), np.float32(1175.1377), np.float32(902.4572), np.float32(1051.835), np.float32(879.4213), np.float32(1299.5809), np.float32(1004.60297)]
2025-09-14 22:40:06,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:40:06,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 28 minutes, 23 seconds)
2025-09-14 22:44:25,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:44:34,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 959.74286 ± 261.184
2025-09-14 22:44:34,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(737.4373), np.float32(769.3238), np.float32(1067.9993), np.float32(806.24994), np.float32(870.6671), np.float32(1386.5886), np.float32(656.87915), np.float32(946.25397), np.float32(1481.2349), np.float32(874.79407)]
2025-09-14 22:44:34,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:44:34,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 23 minutes, 51 seconds)
2025-09-14 22:48:54,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:49:02,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1031.20776 ± 217.468
2025-09-14 22:49:02,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1023.24365), np.float32(825.1904), np.float32(1030.8114), np.float32(1486.9083), np.float32(1059.874), np.float32(874.29553), np.float32(901.74994), np.float32(873.4908), np.float32(853.18994), np.float32(1383.3247)]
2025-09-14 22:49:02,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:49:02,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 19 minutes, 41 seconds)
2025-09-14 22:53:11,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:53:19,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1052.22278 ± 158.488
2025-09-14 22:53:19,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1304.9806), np.float32(1057.431), np.float32(1086.0287), np.float32(1101.1958), np.float32(792.7599), np.float32(1134.9839), np.float32(1254.4888), np.float32(883.4759), np.float32(1055.0732), np.float32(851.8099)]
2025-09-14 22:53:19,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:53:19,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 15 minutes, 9 seconds)
2025-09-14 22:57:37,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 22:57:46,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1080.24731 ± 292.294
2025-09-14 22:57:46,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(905.89594), np.float32(913.27936), np.float32(1067.9965), np.float32(1159.2373), np.float32(1922.0369), np.float32(877.3282), np.float32(937.7117), np.float32(969.0082), np.float32(1048.526), np.float32(1001.452)]
2025-09-14 22:57:46,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 22:57:46,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 11 minutes, 46 seconds)
2025-09-14 23:02:03,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:02:11,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1118.33862 ± 305.512
2025-09-14 23:02:11,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(917.3831), np.float32(1073.0614), np.float32(944.0484), np.float32(1029.5837), np.float32(1224.4851), np.float32(1222.0037), np.float32(968.1394), np.float32(918.3959), np.float32(915.3434), np.float32(1970.9417)]
2025-09-14 23:02:11,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:02:11,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 8 minutes, 3 seconds)
2025-09-14 23:06:41,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:06:50,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1073.87378 ± 263.461
2025-09-14 23:06:50,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(921.26514), np.float32(1146.3687), np.float32(843.382), np.float32(1213.9657), np.float32(1281.6466), np.float32(1477.8828), np.float32(1466.7972), np.float32(788.905), np.float32(797.6481), np.float32(800.876)]
2025-09-14 23:06:50,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:06:50,007 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 4 minutes, 41 seconds)
2025-09-14 23:11:08,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:11:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1184.31458 ± 260.450
2025-09-14 23:11:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1607.4398), np.float32(1290.0768), np.float32(1010.45746), np.float32(862.0099), np.float32(812.0847), np.float32(1086.6523), np.float32(1260.6582), np.float32(1622.2677), np.float32(1197.3121), np.float32(1094.1866)]
2025-09-14 23:11:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:11:16,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1184.31) for latency 21
2025-09-14 23:11:16,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 3 seconds)
2025-09-14 23:15:49,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:15:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1158.13647 ± 247.953
2025-09-14 23:15:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(941.1258), np.float32(1540.8296), np.float32(1070.2913), np.float32(1300.6501), np.float32(981.5952), np.float32(1066.8253), np.float32(1205.6857), np.float32(888.1082), np.float32(1641.0524), np.float32(945.20026)]
2025-09-14 23:15:57,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:15:57,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 57 minutes, 40 seconds)
2025-09-14 23:20:11,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:20:19,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1101.32874 ± 467.054
2025-09-14 23:20:19,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1879.19), np.float32(1167.6904), np.float32(880.6613), np.float32(942.11926), np.float32(1118.2773), np.float32(1441.3324), np.float32(1121.9814), np.float32(1169.7998), np.float32(1339.8196), np.float32(-47.5847)]
2025-09-14 23:20:19,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:20:19,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 52 minutes, 47 seconds)
2025-09-14 23:24:36,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:24:44,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1210.32544 ± 324.891
2025-09-14 23:24:44,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1056.774), np.float32(1414.9174), np.float32(1784.4862), np.float32(1360.2379), np.float32(1575.8519), np.float32(1136.6353), np.float32(789.0072), np.float32(1343.6399), np.float32(809.17236), np.float32(832.5318)]
2025-09-14 23:24:44,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:24:44,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1210.33) for latency 21
2025-09-14 23:24:44,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 48 minutes, 12 seconds)
2025-09-14 23:29:01,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:29:09,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1174.82458 ± 362.957
2025-09-14 23:29:09,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(918.23987), np.float32(1527.6855), np.float32(965.37445), np.float32(951.98157), np.float32(1291.7938), np.float32(1005.6824), np.float32(937.9587), np.float32(2093.85), np.float32(874.069), np.float32(1181.6107)]
2025-09-14 23:29:09,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:29:09,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 42 minutes, 43 seconds)
2025-09-14 23:33:15,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:33:23,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1042.05957 ± 233.483
2025-09-14 23:33:23,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(716.5242), np.float32(914.249), np.float32(896.2213), np.float32(961.91504), np.float32(967.6896), np.float32(892.30206), np.float32(1361.4667), np.float32(928.8134), np.float32(1458.637), np.float32(1322.7771)]
2025-09-14 23:33:23,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:33:23,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 37 minutes, 16 seconds)
2025-09-14 23:37:52,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:38:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1256.02039 ± 426.443
2025-09-14 23:38:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1886.7706), np.float32(854.65497), np.float32(945.96515), np.float32(1863.0963), np.float32(1015.236), np.float32(891.2786), np.float32(1060.6249), np.float32(1125.4558), np.float32(1941.5917), np.float32(975.53046)]
2025-09-14 23:38:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:38:00,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1256.02) for latency 21
2025-09-14 23:38:00,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 32 minutes, 37 seconds)
2025-09-14 23:42:30,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:42:38,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1119.98547 ± 355.042
2025-09-14 23:42:38,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1358.6293), np.float32(709.0046), np.float32(1072.1846), np.float32(920.7423), np.float32(2037.1908), np.float32(1286.8479), np.float32(1049.9962), np.float32(898.4682), np.float32(933.59265), np.float32(933.1981)]
2025-09-14 23:42:38,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:42:38,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 29 minutes, 14 seconds)
2025-09-14 23:46:56,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:47:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1312.06262 ± 266.757
2025-09-14 23:47:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1285.0801), np.float32(1161.0743), np.float32(1088.3131), np.float32(888.9179), np.float32(1641.4105), np.float32(1780.9917), np.float32(1577.034), np.float32(1289.2181), np.float32(1339.6239), np.float32(1068.9625)]
2025-09-14 23:47:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:47:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1312.06) for latency 21
2025-09-14 23:47:04,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 24 minutes, 53 seconds)
2025-09-14 23:51:42,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:51:50,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1244.46875 ± 423.671
2025-09-14 23:51:50,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2237.2097), np.float32(932.0364), np.float32(1282.5592), np.float32(1155.565), np.float32(1338.9492), np.float32(875.49506), np.float32(574.4457), np.float32(1157.056), np.float32(1346.0499), np.float32(1545.3212)]
2025-09-14 23:51:50,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:51:50,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 21 minutes, 39 seconds)
2025-09-14 23:56:25,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 23:56:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1100.35815 ± 525.682
2025-09-14 23:56:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(43.6762), np.float32(938.36896), np.float32(894.0379), np.float32(1235.8492), np.float32(1051.0845), np.float32(1787.6416), np.float32(2048.687), np.float32(872.1492), np.float32(1322.0974), np.float32(809.9895)]
2025-09-14 23:56:33,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 23:56:33,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 18 minutes, 47 seconds)
2025-09-15 00:01:17,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:01:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1345.45129 ± 491.531
2025-09-15 00:01:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(947.3773), np.float32(1569.3046), np.float32(2636.8147), np.float32(1422.1456), np.float32(1155.1006), np.float32(1484.8073), np.float32(959.2547), np.float32(964.47174), np.float32(1395.7821), np.float32(919.453)]
2025-09-15 00:01:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:01:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1345.45) for latency 21
2025-09-15 00:01:25,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 14 minutes, 56 seconds)
2025-09-15 00:06:15,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:06:23,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1776.34448 ± 528.878
2025-09-15 00:06:23,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1012.3962), np.float32(1124.4138), np.float32(2140.1055), np.float32(2080.6765), np.float32(2442.9373), np.float32(2243.65), np.float32(930.36914), np.float32(2185.3337), np.float32(1892.852), np.float32(1710.7109)]
2025-09-15 00:06:23,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:06:23,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1226 [INFO]: New best (1776.34) for latency 21
2025-09-15 00:06:23,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 11 minutes, 14 seconds)
2025-09-15 00:10:39,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:10:47,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1198.40747 ± 247.607
2025-09-15 00:10:47,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(940.6949), np.float32(1042.025), np.float32(1540.3704), np.float32(1336.9644), np.float32(1611.7008), np.float32(1101.3), np.float32(918.81305), np.float32(1426.115), np.float32(930.2621), np.float32(1135.8302)]
2025-09-15 00:10:47,228 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:10:47,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 6 minutes, 22 seconds)
2025-09-15 00:15:06,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:15:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1208.87915 ± 237.061
2025-09-15 00:15:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1035.8392), np.float32(1014.6569), np.float32(1099.6678), np.float32(1090.9525), np.float32(997.4551), np.float32(1658.853), np.float32(1672.0067), np.float32(1192.0117), np.float32(1128.7244), np.float32(1198.624)]
2025-09-15 00:15:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:15:14,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 1 hour, 48 seconds)
2025-09-15 00:19:22,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:19:30,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1340.00977 ± 352.520
2025-09-15 00:19:30,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1723.4814), np.float32(1332.9814), np.float32(1703.9227), np.float32(1194.6127), np.float32(770.95233), np.float32(1153.3961), np.float32(1391.896), np.float32(1969.5146), np.float32(1235.97), np.float32(923.3715)]
2025-09-15 00:19:30,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:19:30,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 55 minutes, 5 seconds)
2025-09-15 00:23:49,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:23:57,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 949.70148 ± 57.963
2025-09-15 00:23:57,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(898.47064), np.float32(941.5389), np.float32(919.311), np.float32(1012.4205), np.float32(910.74426), np.float32(996.2037), np.float32(872.1525), np.float32(1037.46), np.float32(1019.9675), np.float32(888.7453)]
2025-09-15 00:23:57,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:23:57,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 49 minutes, 33 seconds)
2025-09-15 00:28:20,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:28:28,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1084.12427 ± 200.399
2025-09-15 00:28:28,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(981.60626), np.float32(931.3618), np.float32(939.52435), np.float32(947.8176), np.float32(1451.5586), np.float32(1500.747), np.float32(1042.0519), np.float32(1034.2228), np.float32(965.7508), np.float32(1046.6017)]
2025-09-15 00:28:28,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:28:28,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 44 minutes, 11 seconds)
2025-09-15 00:32:57,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:33:05,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1483.69824 ± 443.227
2025-09-15 00:33:05,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1974.3267), np.float32(1239.6874), np.float32(1844.2678), np.float32(1375.0298), np.float32(1162.789), np.float32(1223.5094), np.float32(2107.7576), np.float32(896.79736), np.float32(951.0664), np.float32(2061.7515)]
2025-09-15 00:33:05,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:33:05,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 40 minutes, 9 seconds)
2025-09-15 00:37:25,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:37:33,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1305.24341 ± 797.331
2025-09-15 00:37:33,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(911.545), np.float32(-70.6678), np.float32(964.9832), np.float32(1721.129), np.float32(997.0113), np.float32(943.30396), np.float32(873.01135), np.float32(1439.9747), np.float32(2482.2642), np.float32(2789.8787)]
2025-09-15 00:37:33,175 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:37:33,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 35 minutes, 42 seconds)
2025-09-15 00:41:30,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:41:39,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1603.53296 ± 565.271
2025-09-15 00:41:39,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1444.1958), np.float32(2818.151), np.float32(1670.6903), np.float32(1310.4083), np.float32(1132.7285), np.float32(1330.6455), np.float32(1388.8453), np.float32(2455.8914), np.float32(1611.9447), np.float32(871.8285)]
2025-09-15 00:41:39,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:41:39,023 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 30 minutes, 59 seconds)
2025-09-15 00:46:06,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:46:14,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1105.55298 ± 339.483
2025-09-15 00:46:14,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(876.85156), np.float32(1690.3221), np.float32(960.0462), np.float32(979.4282), np.float32(948.0652), np.float32(870.38153), np.float32(781.26855), np.float32(1834.2543), np.float32(1068.6475), np.float32(1046.2645)]
2025-09-15 00:46:14,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:46:14,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 26 minutes, 44 seconds)
2025-09-15 00:50:52,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:51:00,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1424.90125 ± 388.077
2025-09-15 00:51:00,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1959.1696), np.float32(1564.4028), np.float32(853.1027), np.float32(2014.713), np.float32(1592.797), np.float32(960.7922), np.float32(1672.7894), np.float32(1396.6825), np.float32(1242.2107), np.float32(992.3529)]
2025-09-15 00:51:00,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:51:00,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 31 seconds)
2025-09-15 00:55:43,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 00:55:51,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1371.76868 ± 518.540
2025-09-15 00:55:51,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1280.3811), np.float32(1880.8542), np.float32(794.05365), np.float32(1011.0518), np.float32(1039.5269), np.float32(2075.4695), np.float32(1004.5957), np.float32(1287.9497), np.float32(2389.488), np.float32(954.31506)]
2025-09-15 00:55:51,293 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 00:55:51,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 18 minutes, 12 seconds)
2025-09-15 01:00:07,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 01:00:15,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1127.60742 ± 203.691
2025-09-15 01:00:15,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(863.3425), np.float32(1120.6643), np.float32(987.76013), np.float32(849.3686), np.float32(951.15704), np.float32(1134.4608), np.float32(1455.3214), np.float32(1326.6465), np.float32(1207.9805), np.float32(1379.3726)]
2025-09-15 01:00:15,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 01:00:15,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 37 seconds)
2025-09-15 01:04:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 01:05:03,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1047.49231 ± 150.055
2025-09-15 01:05:03,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(935.84924), np.float32(901.63257), np.float32(1026.6011), np.float32(980.666), np.float32(1144.6285), np.float32(983.6793), np.float32(1074.9185), np.float32(904.74567), np.float32(1086.9878), np.float32(1435.2141)]
2025-09-15 01:05:03,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 01:05:03,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 9 minutes, 21 seconds)
2025-09-15 01:09:33,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 01:09:41,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1537.42053 ± 521.595
2025-09-15 01:09:41,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1090.1322), np.float32(791.68384), np.float32(1917.893), np.float32(1215.1676), np.float32(1833.8456), np.float32(1632.3481), np.float32(2628.3364), np.float32(1038.5869), np.float32(1322.8047), np.float32(1903.4062)]
2025-09-15 01:09:41,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 01:09:41,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 41 seconds)
2025-09-15 01:14:25,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-15 01:14:33,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1221 [DEBUG]: Total Reward: 1615.22437 ± 638.310
2025-09-15 01:14:33,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2541.1257), np.float32(1100.9014), np.float32(1974.6312), np.float32(1331.9921), np.float32(2142.5466), np.float32(811.9115), np.float32(1813.8639), np.float32(854.3937), np.float32(2545.4487), np.float32(1035.4291)]
2025-09-15 01:14:33,233 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-15 01:14:33,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille200-halfcheetah):1251 [DEBUG]: Training session finished
