2025-09-14 13:59:53,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_21
2025-09-14 13:59:53,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_21
2025-09-14 13:59:53,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'21': <latency_env.delayed_mdp.ConstantDelay object at 0x7f77b7797cb0>}
2025-09-14 13:59:53,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 13:59:53,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 13:59:53,776 baseline-bpql-noisepromille75-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=143, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 13:59:53,776 baseline-bpql-noisepromille75-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 13:59:54,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 13:59:54,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 14:02:34,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:02:43,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -426.95883 ± 72.007
2025-09-14 14:02:43,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-418.3481), np.float32(-616.5432), np.float32(-431.4341), np.float32(-438.81317), np.float32(-325.83557), np.float32(-394.50705), np.float32(-447.15854), np.float32(-427.37427), np.float32(-370.1478), np.float32(-399.4266)]
2025-09-14 14:02:43,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:02:43,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-426.96) for latency 21
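Each "Total Reward: mean ± spread" line can be reproduced from the per-trajectory rewards logged immediately after it; re-deriving iteration 1's numbers shows the spread is the population standard deviation (ddof=0, NumPy's `np.std` default) rather than the sample one. A quick check in plain Python (the last digit of the mean differs from the logged -426.95883 only because the log accumulates in float32):

```python
import statistics

# Per-trajectory returns from iteration 1, copied from the log above
rewards = [-418.3481, -616.5432, -431.4341, -438.81317, -325.83557,
           -394.50705, -447.15854, -427.37427, -370.1478, -399.4266]

mean = statistics.fmean(rewards)
spread = statistics.pstdev(rewards)  # population std (ddof=0) matches the logged ±72.007

print(f"{mean:.5f} ± {spread:.3f}")
```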
2025-09-14 14:02:43,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 37 minutes, 28 seconds)
2025-09-14 14:05:24,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:05:32,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -206.36221 ± 39.000
2025-09-14 14:05:32,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-170.64304), np.float32(-169.63902), np.float32(-192.0288), np.float32(-161.66939), np.float32(-173.5943), np.float32(-226.333), np.float32(-241.24399), np.float32(-203.75436), np.float32(-288.96466), np.float32(-235.75148)]
2025-09-14 14:05:32,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:05:32,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-206.36) for latency 21
2025-09-14 14:05:32,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 35 minutes, 26 seconds)
2025-09-14 14:08:09,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:08:18,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -170.61916 ± 103.903
2025-09-14 14:08:18,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-130.39343), np.float32(-142.14014), np.float32(-174.64462), np.float32(-306.20834), np.float32(-30.362383), np.float32(-194.46262), np.float32(-170.64551), np.float32(11.372159), np.float32(-221.5219), np.float32(-347.1848)]
2025-09-14 14:08:18,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:08:18,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-170.62) for latency 21
2025-09-14 14:08:18,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 31 minutes, 6 seconds)
2025-09-14 14:10:57,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:11:05,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -73.82997 ± 62.925
2025-09-14 14:11:05,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-130.61671), np.float32(-68.60419), np.float32(-134.55905), np.float32(-138.37965), np.float32(63.09465), np.float32(9.67414), np.float32(-102.5872), np.float32(-113.84093), np.float32(-64.58899), np.float32(-57.891808)]
2025-09-14 14:11:05,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:11:05,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-73.83) for latency 21
2025-09-14 14:11:05,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 28 minutes, 18 seconds)
2025-09-14 14:13:44,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:13:53,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 59.95959 ± 120.266
2025-09-14 14:13:53,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(18.922066), np.float32(210.08115), np.float32(-242.1686), np.float32(46.187626), np.float32(-11.464521), np.float32(120.10387), np.float32(171.65916), np.float32(74.82181), np.float32(146.60611), np.float32(64.84732)]
2025-09-14 14:13:53,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:13:53,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (59.96) for latency 21
2025-09-14 14:13:53,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 25 minutes, 22 seconds)
2025-09-14 14:16:30,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:16:38,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 270.37387 ± 77.274
2025-09-14 14:16:38,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(158.963), np.float32(335.46152), np.float32(271.16766), np.float32(211.56245), np.float32(377.21243), np.float32(327.35413), np.float32(316.81628), np.float32(303.29218), np.float32(123.28818), np.float32(278.62103)]
2025-09-14 14:16:38,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:16:38,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (270.37) for latency 21
2025-09-14 14:16:38,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 21 minutes, 50 seconds)
2025-09-14 14:19:15,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:19:23,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 473.90045 ± 236.219
2025-09-14 14:19:23,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(771.6323), np.float32(239.47052), np.float32(188.4206), np.float32(507.08206), np.float32(373.0088), np.float32(57.82849), np.float32(774.28564), np.float32(628.0385), np.float32(561.5582), np.float32(637.6793)]
2025-09-14 14:19:23,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:19:23,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (473.90) for latency 21
2025-09-14 14:19:23,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 17 minutes, 48 seconds)
2025-09-14 14:22:03,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:22:11,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 563.06927 ± 222.229
2025-09-14 14:22:11,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(423.74948), np.float32(685.32526), np.float32(359.16815), np.float32(658.87665), np.float32(986.0611), np.float32(336.55798), np.float32(533.7054), np.float32(845.25397), np.float32(250.17715), np.float32(551.8172)]
2025-09-14 14:22:11,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:22:11,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (563.07) for latency 21
2025-09-14 14:22:11,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 15 minutes, 39 seconds)
2025-09-14 14:24:51,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:24:59,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 687.27820 ± 120.622
2025-09-14 14:24:59,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(712.49), np.float32(778.0103), np.float32(715.3662), np.float32(458.112), np.float32(558.1012), np.float32(836.71704), np.float32(581.93713), np.float32(828.25287), np.float32(621.56604), np.float32(782.2289)]
2025-09-14 14:24:59,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:24:59,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (687.28) for latency 21
2025-09-14 14:24:59,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 12 minutes, 56 seconds)
2025-09-14 14:27:39,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:27:47,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 861.21777 ± 91.616
2025-09-14 14:27:47,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(813.8191), np.float32(876.02136), np.float32(986.23883), np.float32(735.48114), np.float32(932.973), np.float32(810.78894), np.float32(970.3309), np.float32(959.0542), np.float32(733.29596), np.float32(794.1741)]
2025-09-14 14:27:47,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:27:47,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (861.22) for latency 21
2025-09-14 14:27:47,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 10 minutes, 22 seconds)
2025-09-14 14:30:12,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:30:20,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 902.69403 ± 102.991
2025-09-14 14:30:20,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(996.5683), np.float32(806.5196), np.float32(924.11743), np.float32(839.4274), np.float32(712.00165), np.float32(1105.0916), np.float32(934.31384), np.float32(912.24774), np.float32(948.7738), np.float32(847.87885)]
2025-09-14 14:30:20,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:30:20,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (902.69) for latency 21
2025-09-14 14:30:20,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 3 minutes, 51 seconds)
2025-09-14 14:32:43,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:32:51,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 835.71985 ± 132.566
2025-09-14 14:32:51,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(909.77893), np.float32(985.08075), np.float32(1003.01), np.float32(792.43604), np.float32(893.66565), np.float32(784.76575), np.float32(973.0724), np.float32(647.9586), np.float32(760.5278), np.float32(606.902)]
2025-09-14 14:32:51,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:32:51,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 56 minutes, 51 seconds)
2025-09-14 14:35:14,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:35:22,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 951.15302 ± 115.094
2025-09-14 14:35:22,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(790.2796), np.float32(1069.1105), np.float32(765.0735), np.float32(1093.3091), np.float32(989.623), np.float32(1047.3094), np.float32(964.2756), np.float32(801.036), np.float32(981.178), np.float32(1010.33606)]
2025-09-14 14:35:22,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:35:22,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (951.15) for latency 21
2025-09-14 14:35:22,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 49 minutes, 14 seconds)
2025-09-14 14:37:47,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:37:55,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 987.34894 ± 161.375
2025-09-14 14:37:55,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1126.2393), np.float32(1093.5634), np.float32(975.13904), np.float32(984.183), np.float32(884.7608), np.float32(1016.8944), np.float32(993.61536), np.float32(561.2541), np.float32(1086.7285), np.float32(1151.1115)]
2025-09-14 14:37:55,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:37:55,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (987.35) for latency 21
2025-09-14 14:37:55,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 42 minutes, 29 seconds)
2025-09-14 14:40:29,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:40:37,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 984.77374 ± 61.296
2025-09-14 14:40:37,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(856.70575), np.float32(971.50085), np.float32(964.33203), np.float32(987.3445), np.float32(978.1746), np.float32(1122.7765), np.float32(1005.3511), np.float32(963.5019), np.float32(989.0325), np.float32(1009.0178)]
2025-09-14 14:40:37,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:40:37,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 3 hours, 38 minutes, 6 seconds)
2025-09-14 14:43:07,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:43:15,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1033.77869 ± 66.300
2025-09-14 14:43:15,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(919.14374), np.float32(1023.4372), np.float32(972.55365), np.float32(1036.5752), np.float32(1117.6486), np.float32(1066.2487), np.float32(1136.3551), np.float32(1036.0737), np.float32(1076.6093), np.float32(953.14124)]
2025-09-14 14:43:15,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:43:15,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1033.78) for latency 21
2025-09-14 14:43:15,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 3 hours, 36 minutes, 54 seconds)
2025-09-14 14:45:41,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:45:49,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1012.49481 ± 146.421
2025-09-14 14:45:49,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1133.6864), np.float32(1092.5433), np.float32(1200.9178), np.float32(818.6469), np.float32(1145.2957), np.float32(902.53845), np.float32(880.2082), np.float32(1073.293), np.float32(1109.8801), np.float32(767.9387)]
2025-09-14 14:45:49,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:45:49,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 3 hours, 35 minutes, 14 seconds)
2025-09-14 14:48:15,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:48:23,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 975.13440 ± 154.534
2025-09-14 14:48:23,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(786.9425), np.float32(936.97406), np.float32(1071.8541), np.float32(1065.0331), np.float32(1036.9683), np.float32(828.3392), np.float32(1032.1467), np.float32(678.7185), np.float32(1127.773), np.float32(1186.594)]
2025-09-14 14:48:23,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:48:23,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 3 hours, 33 minutes, 36 seconds)
2025-09-14 14:50:52,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:51:00,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1132.32263 ± 154.232
2025-09-14 14:51:00,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1039.0282), np.float32(1122.2031), np.float32(1251.4113), np.float32(909.48193), np.float32(1513.4343), np.float32(1046.1938), np.float32(1082.2338), np.float32(1060.8849), np.float32(1102.358), np.float32(1195.9973)]
2025-09-14 14:51:00,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:51:00,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1132.32) for latency 21
2025-09-14 14:51:00,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 3 hours, 31 minutes, 59 seconds)
2025-09-14 14:53:41,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:53:49,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1166.37183 ± 136.681
2025-09-14 14:53:49,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(981.43085), np.float32(1233.018), np.float32(1038.206), np.float32(1041.5948), np.float32(1244.9332), np.float32(1422.4109), np.float32(1205.6414), np.float32(1328.2406), np.float32(1044.2749), np.float32(1123.9685)]
2025-09-14 14:53:49,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:53:49,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1166.37) for latency 21
2025-09-14 14:53:49,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 31 minutes, 18 seconds)
2025-09-14 14:56:24,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:56:32,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1031.84668 ± 107.628
2025-09-14 14:56:32,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(774.4029), np.float32(1100.9547), np.float32(1110.8114), np.float32(1064.158), np.float32(1095.4457), np.float32(1098.0066), np.float32(928.67957), np.float32(1101.5863), np.float32(940.3887), np.float32(1104.0332)]
2025-09-14 14:56:32,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:56:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 29 minutes, 47 seconds)
2025-09-14 14:58:57,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 14:59:05,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1232.55859 ± 144.912
2025-09-14 14:59:05,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1231.2577), np.float32(1119.5463), np.float32(1142.2512), np.float32(1172.8672), np.float32(1274.347), np.float32(1549.2758), np.float32(1135.5435), np.float32(1208.7611), np.float32(1054.6683), np.float32(1437.0686)]
2025-09-14 14:59:05,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:59:05,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1232.56) for latency 21
2025-09-14 14:59:05,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 26 minutes, 58 seconds)
2025-09-14 15:01:28,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:01:36,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1089.02454 ± 204.629
2025-09-14 15:01:36,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1139.239), np.float32(1090.6833), np.float32(1137.9729), np.float32(1136.0702), np.float32(1231.4745), np.float32(1221.2341), np.float32(1131.6106), np.float32(1146.7333), np.float32(487.0198), np.float32(1168.208)]
2025-09-14 15:01:36,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:01:36,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 23 minutes, 33 seconds)
2025-09-14 15:03:53,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:04:01,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1096.80078 ± 320.067
2025-09-14 15:04:01,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1129.068), np.float32(1266.8362), np.float32(963.8041), np.float32(1250.2188), np.float32(1194.067), np.float32(1099.0896), np.float32(1656.5066), np.float32(322.13068), np.float32(932.1463), np.float32(1154.1409)]
2025-09-14 15:04:01,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:04:01,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 17 minutes, 47 seconds)
2025-09-14 15:06:18,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:06:26,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1148.73035 ± 117.326
2025-09-14 15:06:26,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(939.11096), np.float32(1233.4379), np.float32(1219.7393), np.float32(1324.1715), np.float32(1217.0625), np.float32(983.2618), np.float32(1114.9653), np.float32(1072.4386), np.float32(1128.1649), np.float32(1254.9515)]
2025-09-14 15:06:26,330 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:06:26,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 9 minutes, 8 seconds)
2025-09-14 15:08:49,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:08:58,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1212.39062 ± 170.847
2025-09-14 15:08:58,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1134.3114), np.float32(1167.2468), np.float32(1467.2858), np.float32(1471.2684), np.float32(1100.8529), np.float32(1314.8986), np.float32(921.6296), np.float32(1089.8865), np.float32(1108.3427), np.float32(1348.1833)]
2025-09-14 15:08:58,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:08:58,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 3 minutes, 58 seconds)
2025-09-14 15:11:22,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:11:31,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1137.50952 ± 100.269
2025-09-14 15:11:31,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1030.0769), np.float32(1081.1652), np.float32(1171.4767), np.float32(1117.7618), np.float32(1206.75), np.float32(982.9835), np.float32(1322.9152), np.float32(1255.8816), np.float32(1053.8334), np.float32(1152.2512)]
2025-09-14 15:11:31,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:11:31,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 1 minute, 25 seconds)
2025-09-14 15:13:55,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:14:03,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1242.19568 ± 497.128
2025-09-14 15:14:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(180.30884), np.float32(1376.598), np.float32(1238.7667), np.float32(1808.6675), np.float32(1115.4496), np.float32(1778.9485), np.float32(1318.6055), np.float32(1208.6531), np.float32(610.71814), np.float32(1785.2418)]
2025-09-14 15:14:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:14:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1242.20) for latency 21
2025-09-14 15:14:03,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 59 minutes, 12 seconds)
2025-09-14 15:16:13,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:16:21,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1175.47290 ± 131.470
2025-09-14 15:16:21,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1102.213), np.float32(1037.5945), np.float32(1252.5707), np.float32(1007.5728), np.float32(1111.1162), np.float32(1300.4506), np.float32(1316.0509), np.float32(1229.8906), np.float32(1010.521), np.float32(1386.7478)]
2025-09-14 15:16:21,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:16:21,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 55 minutes, 10 seconds)
2025-09-14 15:18:32,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:18:40,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1231.71387 ± 204.680
2025-09-14 15:18:40,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1224.4902), np.float32(1157.831), np.float32(1225.6578), np.float32(1020.8599), np.float32(1084.7495), np.float32(1793.4846), np.float32(1086.9597), np.float32(1236.7284), np.float32(1170.7141), np.float32(1315.6637)]
2025-09-14 15:18:40,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:18:40,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 51 minutes, 16 seconds)
2025-09-14 15:20:50,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:20:58,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1428.29138 ± 360.710
2025-09-14 15:20:58,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1241.6785), np.float32(2246.3267), np.float32(1178.2068), np.float32(1694.808), np.float32(1269.0809), np.float32(1871.431), np.float32(1323.3615), np.float32(1196.4728), np.float32(1126.7604), np.float32(1134.7885)]
2025-09-14 15:20:58,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:20:58,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1428.29) for latency 21
2025-09-14 15:20:58,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 45 minutes, 44 seconds)
2025-09-14 15:23:09,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:23:17,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1328.08325 ± 169.739
2025-09-14 15:23:17,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1192.4203), np.float32(1078.562), np.float32(1586.768), np.float32(1153.6964), np.float32(1480.2762), np.float32(1324.7561), np.float32(1380.8811), np.float32(1430.1417), np.float32(1518.6802), np.float32(1134.6501)]
2025-09-14 15:23:17,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:23:17,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 40 minutes, 2 seconds)
2025-09-14 15:25:27,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:25:35,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1333.84314 ± 214.873
2025-09-14 15:25:35,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1209.5084), np.float32(1350.8298), np.float32(1143.1943), np.float32(1607.4159), np.float32(1715.8099), np.float32(1125.2834), np.float32(1297.848), np.float32(1225.5654), np.float32(1588.7261), np.float32(1074.25)]
2025-09-14 15:25:35,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:25:35,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 34 minutes, 36 seconds)
2025-09-14 15:27:48,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:27:56,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1430.62695 ± 264.563
2025-09-14 15:27:56,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1352.4397), np.float32(1725.8893), np.float32(1286.9973), np.float32(1372.9193), np.float32(1083.7604), np.float32(1521.3641), np.float32(1393.1355), np.float32(1097.6974), np.float32(2014.7244), np.float32(1457.3422)]
2025-09-14 15:27:56,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:27:56,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1430.63) for latency 21
2025-09-14 15:27:56,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 32 minutes, 53 seconds)
2025-09-14 15:30:18,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:30:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1339.28357 ± 221.092
2025-09-14 15:30:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1362.7079), np.float32(1651.4536), np.float32(1190.3682), np.float32(1158.683), np.float32(1126.9965), np.float32(1174.6982), np.float32(1834.8535), np.float32(1204.2617), np.float32(1344.6754), np.float32(1344.1372)]
2025-09-14 15:30:26,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:30:26,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 33 minutes, 5 seconds)
2025-09-14 15:32:53,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:33:01,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1350.95935 ± 326.843
2025-09-14 15:33:01,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1291.6652), np.float32(649.3956), np.float32(1389.231), np.float32(1774.094), np.float32(1920.4315), np.float32(1236.0781), np.float32(1156.893), np.float32(1355.3329), np.float32(1458.4509), np.float32(1278.0219)]
2025-09-14 15:33:01,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:33:01,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 34 minutes, 9 seconds)
2025-09-14 15:35:28,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:35:37,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1366.43823 ± 244.820
2025-09-14 15:35:37,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1180.2268), np.float32(1206.1697), np.float32(1636.9335), np.float32(1195.1069), np.float32(1345.3081), np.float32(1444.1968), np.float32(1911.3234), np.float32(1247.9644), np.float32(1034.948), np.float32(1462.2041)]
2025-09-14 15:35:37,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:35:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 35 minutes, 22 seconds)
2025-09-14 15:37:54,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:38:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1329.29651 ± 238.325
2025-09-14 15:38:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1173.9606), np.float32(1207.2507), np.float32(1607.1892), np.float32(1545.483), np.float32(1519.9414), np.float32(1018.1125), np.float32(1596.5697), np.float32(1030.861), np.float32(1063.1227), np.float32(1530.4738)]
2025-09-14 15:38:02,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:38:02,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 34 minutes, 19 seconds)
2025-09-14 15:40:17,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:40:25,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1333.47009 ± 140.387
2025-09-14 15:40:25,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1201.2522), np.float32(1286.1501), np.float32(1563.496), np.float32(1274.8794), np.float32(1139.9176), np.float32(1289.0393), np.float32(1220.0374), np.float32(1496.2806), np.float32(1324.9099), np.float32(1538.7378)]
2025-09-14 15:40:25,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:40:25,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 32 minutes, 14 seconds)
2025-09-14 15:42:40,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:42:48,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1361.92139 ± 148.893
2025-09-14 15:42:48,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1257.4033), np.float32(1426.2904), np.float32(1197.9493), np.float32(1196.4633), np.float32(1373.2587), np.float32(1311.5851), np.float32(1566.8832), np.float32(1497.9202), np.float32(1604.1849), np.float32(1187.275)]
2025-09-14 15:42:48,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:42:48,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 28 minutes, 18 seconds)
2025-09-14 15:45:03,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:45:11,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1364.94409 ± 104.802
2025-09-14 15:45:11,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1251.679), np.float32(1315.7108), np.float32(1585.2922), np.float32(1200.148), np.float32(1480.903), np.float32(1371.982), np.float32(1361.9941), np.float32(1380.3162), np.float32(1399.3083), np.float32(1302.1077)]
2025-09-14 15:45:11,097 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:45:11,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 23 minutes, 31 seconds)
2025-09-14 15:47:22,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:47:30,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1411.29883 ± 308.210
2025-09-14 15:47:30,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1470.1846), np.float32(1144.1285), np.float32(1629.0575), np.float32(1243.4392), np.float32(1203.6512), np.float32(1565.0847), np.float32(1187.672), np.float32(2197.8682), np.float32(1200.3811), np.float32(1271.5212)]
2025-09-14 15:47:30,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:47:30,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 17 minutes, 56 seconds)
2025-09-14 15:49:41,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:49:49,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1616.60645 ± 490.268
2025-09-14 15:49:49,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1188.6696), np.float32(2330.487), np.float32(1409.5961), np.float32(1146.9678), np.float32(2484.7153), np.float32(1660.5166), np.float32(2145.2783), np.float32(1414.555), np.float32(1114.85), np.float32(1270.4286)]
2025-09-14 15:49:49,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:49:49,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1616.61) for latency 21
2025-09-14 15:49:49,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 14 minutes, 23 seconds)
2025-09-14 15:52:00,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:52:09,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1427.99780 ± 313.113
2025-09-14 15:52:09,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2047.0541), np.float32(1196.7244), np.float32(1361.7626), np.float32(1073.0974), np.float32(1906.0189), np.float32(1466.0681), np.float32(1408.7559), np.float32(1528.0281), np.float32(1061.8875), np.float32(1230.5812)]
2025-09-14 15:52:09,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:52:09,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 11 minutes, 20 seconds)
2025-09-14 15:54:20,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:54:28,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1503.43127 ± 303.897
2025-09-14 15:54:28,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1248.9844), np.float32(1317.8583), np.float32(1488.4834), np.float32(1848.4606), np.float32(1310.5664), np.float32(2140.211), np.float32(1721.3563), np.float32(1571.4332), np.float32(1098.2362), np.float32(1288.7224)]
2025-09-14 15:54:28,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:54:28,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 8 minutes, 22 seconds)
2025-09-14 15:56:43,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:56:51,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1633.21460 ± 570.217
2025-09-14 15:56:51,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1884.2999), np.float32(1081.4921), np.float32(1718.3438), np.float32(1172.8126), np.float32(2362.9019), np.float32(2756.9807), np.float32(1152.1399), np.float32(838.8192), np.float32(1648.8057), np.float32(1715.5502)]
2025-09-14 15:56:51,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:56:51,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1633.21) for latency 21
2025-09-14 15:56:51,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 6 minutes)
2025-09-14 15:59:06,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 15:59:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1699.38342 ± 420.094
2025-09-14 15:59:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1314.5701), np.float32(1556.5646), np.float32(1437.362), np.float32(1916.9657), np.float32(1419.8135), np.float32(2838.0354), np.float32(1461.3486), np.float32(1745.9376), np.float32(1506.3048), np.float32(1796.9314)]
2025-09-14 15:59:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 15:59:14,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1699.38) for latency 21
2025-09-14 15:59:14,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 4 minutes, 26 seconds)
2025-09-14 16:01:29,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:01:37,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1676.20923 ± 417.401
2025-09-14 16:01:37,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1434.2203), np.float32(2116.089), np.float32(1848.7522), np.float32(1429.7196), np.float32(1577.3975), np.float32(1176.4421), np.float32(2687.101), np.float32(1548.1113), np.float32(1568.3435), np.float32(1375.9143)]
2025-09-14 16:01:37,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:01:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 2 minutes, 35 seconds)
2025-09-14 16:04:00,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:04:08,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1599.56592 ± 590.385
2025-09-14 16:04:08,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1079.383), np.float32(1345.401), np.float32(2648.799), np.float32(1347.376), np.float32(1376.6443), np.float32(1553.5166), np.float32(1309.2908), np.float32(1122.1207), np.float32(1362.3389), np.float32(2850.7893)]
2025-09-14 16:04:08,872 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:04:08,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 2 minutes, 21 seconds)
2025-09-14 16:06:26,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:06:34,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1437.87183 ± 341.667
2025-09-14 16:06:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1364.75), np.float32(1358.9039), np.float32(2069.138), np.float32(1203.0099), np.float32(1528.1454), np.float32(1206.8567), np.float32(1194.8396), np.float32(1099.2772), np.float32(2096.7979), np.float32(1257.0006)]
2025-09-14 16:06:34,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:06:34,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 59 seconds)
2025-09-14 16:08:59,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:09:07,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1808.45410 ± 566.784
2025-09-14 16:09:07,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1243.1576), np.float32(1189.328), np.float32(3020.4622), np.float32(1517.1976), np.float32(2407.407), np.float32(1814.1508), np.float32(1314.4734), np.float32(2237.5225), np.float32(1926.1207), np.float32(1414.721)]
2025-09-14 16:09:07,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:09:07,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1808.45) for latency 21
2025-09-14 16:09:07,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 12 seconds)
2025-09-14 16:11:31,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:11:39,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1455.27966 ± 179.843
2025-09-14 16:11:39,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1284.8629), np.float32(1598.7307), np.float32(1431.789), np.float32(1459.9381), np.float32(1542.6069), np.float32(1294.1876), np.float32(1669.182), np.float32(1189.1544), np.float32(1308.4357), np.float32(1773.9097)]
2025-09-14 16:11:39,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:11:39,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 59 minutes, 11 seconds)
2025-09-14 16:14:04,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:14:12,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1954.18652 ± 728.981
2025-09-14 16:14:12,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2316.825), np.float32(2115.234), np.float32(1392.82), np.float32(1208.1692), np.float32(2449.6753), np.float32(1346.5641), np.float32(3161.6033), np.float32(1275.7667), np.float32(1219.7496), np.float32(3055.4592)]
2025-09-14 16:14:12,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:14:12,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1954.19) for latency 21
2025-09-14 16:14:12,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 58 minutes, 20 seconds)
2025-09-14 16:16:36,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:16:44,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1945.03418 ± 805.519
2025-09-14 16:16:44,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2372.8447), np.float32(1169.022), np.float32(1722.5043), np.float32(3135.3315), np.float32(2376.123), np.float32(3466.7825), np.float32(1372.8099), np.float32(1495.7355), np.float32(1127.2538), np.float32(1211.9332)]
2025-09-14 16:16:44,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:16:44,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 55 minutes, 55 seconds)
2025-09-14 16:19:07,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:19:15,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2157.25610 ± 872.576
2025-09-14 16:19:15,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1273.5325), np.float32(2145.4126), np.float32(1301.9475), np.float32(1112.852), np.float32(1292.745), np.float32(3262.8518), np.float32(1985.0048), np.float32(3331.6987), np.float32(2470.9895), np.float32(3395.5256)]
2025-09-14 16:19:15,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:19:15,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2157.26) for latency 21
2025-09-14 16:19:15,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 54 minutes, 6 seconds)
2025-09-14 16:21:36,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:21:44,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1890.65234 ± 518.428
2025-09-14 16:21:44,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1664.1603), np.float32(1222.4169), np.float32(2376.404), np.float32(2697.9375), np.float32(1468.9027), np.float32(1365.7985), np.float32(2512.6997), np.float32(1925.0327), np.float32(2313.4797), np.float32(1359.6904)]
2025-09-14 16:21:44,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:21:44,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 51 minutes, 3 seconds)
2025-09-14 16:24:05,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:24:13,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2109.27490 ± 622.078
2025-09-14 16:24:13,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1736.3816), np.float32(1263.8689), np.float32(2043.0981), np.float32(2001.0831), np.float32(3372.2048), np.float32(1431.5353), np.float32(2535.5437), np.float32(1640.5973), np.float32(2836.8105), np.float32(2231.6248)]
2025-09-14 16:24:13,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:24:13,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 48 minutes, 2 seconds)
2025-09-14 16:26:39,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:26:47,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2190.10425 ± 490.734
2025-09-14 16:26:47,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3041.6035), np.float32(2411.9084), np.float32(2492.263), np.float32(2893.8242), np.float32(1991.701), np.float32(1428.6019), np.float32(1921.0326), np.float32(1860.6696), np.float32(2165.999), np.float32(1693.4418)]
2025-09-14 16:26:47,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:26:47,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2190.10) for latency 21
2025-09-14 16:26:47,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 45 minutes, 41 seconds)
2025-09-14 16:29:05,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:29:13,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2517.22119 ± 769.947
2025-09-14 16:29:13,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3325.6118), np.float32(3380.6738), np.float32(3367.0388), np.float32(1417.7323), np.float32(1861.9081), np.float32(2368.8647), np.float32(2136.377), np.float32(1290.4784), np.float32(2950.325), np.float32(3073.2014)]
2025-09-14 16:29:13,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:29:13,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2517.22) for latency 21
2025-09-14 16:29:13,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 42 minutes, 20 seconds)
2025-09-14 16:31:32,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:31:40,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2451.78931 ± 685.247
2025-09-14 16:31:40,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1940.2072), np.float32(2314.2427), np.float32(2332.8508), np.float32(2759.4932), np.float32(3512.4792), np.float32(2523.169), np.float32(2078.0508), np.float32(3648.7227), np.float32(1211.2274), np.float32(2197.451)]
2025-09-14 16:31:40,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:31:40,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 39 minutes, 17 seconds)
2025-09-14 16:33:58,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:34:06,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2111.48169 ± 795.474
2025-09-14 16:34:06,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1563.4209), np.float32(3500.4849), np.float32(1359.5878), np.float32(1460.405), np.float32(2899.6226), np.float32(1682.5872), np.float32(1656.195), np.float32(1882.3495), np.float32(1646.6852), np.float32(3463.4788)]
2025-09-14 16:34:06,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:34:06,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 36 minutes, 27 seconds)
2025-09-14 16:36:24,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:36:32,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2087.08325 ± 667.412
2025-09-14 16:36:32,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1355.8995), np.float32(2633.205), np.float32(1495.708), np.float32(2616.4966), np.float32(1514.9816), np.float32(3217.8833), np.float32(2622.2678), np.float32(1521.5864), np.float32(2554.558), np.float32(1338.247)]
2025-09-14 16:36:32,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:36:32,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 33 minutes, 39 seconds)
2025-09-14 16:38:52,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:39:00,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2114.55298 ± 1014.639
2025-09-14 16:39:00,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3464.5046), np.float32(3179.1165), np.float32(1510.5646), np.float32(1568.8033), np.float32(1131.1517), np.float32(1153.7261), np.float32(3299.7783), np.float32(1137.4164), np.float32(3426.2668), np.float32(1274.1986)]
2025-09-14 16:39:00,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:39:00,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 30 minutes, 26 seconds)
2025-09-14 16:41:21,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:41:29,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2433.20630 ± 635.227
2025-09-14 16:41:29,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3488.819), np.float32(2376.6726), np.float32(2551.1433), np.float32(3395.6606), np.float32(1920.414), np.float32(1452.0927), np.float32(1985.2908), np.float32(2971.3079), np.float32(1966.1927), np.float32(2224.4678)]
2025-09-14 16:41:29,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:41:29,129 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 28 minutes, 15 seconds)
2025-09-14 16:43:45,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:43:54,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1352.20642 ± 307.681
2025-09-14 16:43:54,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1174.0951), np.float32(1441.4414), np.float32(806.24506), np.float32(1177.319), np.float32(1322.7272), np.float32(2075.4294), np.float32(1388.4741), np.float32(1242.5269), np.float32(1554.4471), np.float32(1339.3593)]
2025-09-14 16:43:54,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:43:54,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 25 minutes, 37 seconds)
2025-09-14 16:46:08,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:46:16,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2080.35254 ± 1004.920
2025-09-14 16:46:16,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1501.7197), np.float32(969.43), np.float32(3152.9011), np.float32(2552.6643), np.float32(867.429), np.float32(1557.2833), np.float32(1506.4915), np.float32(1469.2585), np.float32(3543.8213), np.float32(3682.5256)]
2025-09-14 16:46:16,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:46:16,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 22 minutes, 44 seconds)
2025-09-14 16:48:30,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:48:39,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2255.96021 ± 745.992
2025-09-14 16:48:39,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1726.94), np.float32(1892.9701), np.float32(2388.5034), np.float32(1493.6329), np.float32(2208.1575), np.float32(2775.8547), np.float32(2421.0034), np.float32(1432.9365), np.float32(2080.3777), np.float32(4139.2266)]
2025-09-14 16:48:39,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:48:39,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 19 minutes, 51 seconds)
2025-09-14 16:50:53,672 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:51:01,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2621.52222 ± 950.345
2025-09-14 16:51:01,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3202.8945), np.float32(3495.456), np.float32(1977.0826), np.float32(1258.547), np.float32(3525.091), np.float32(1443.3516), np.float32(2107.132), np.float32(3795.5808), np.float32(1757.5939), np.float32(3652.4915)]
2025-09-14 16:51:01,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:51:01,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2621.52) for latency 21
2025-09-14 16:51:01,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 16 minutes, 55 seconds)
2025-09-14 16:53:16,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:53:24,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2624.20117 ± 954.486
2025-09-14 16:53:24,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4120.273), np.float32(2424.4792), np.float32(1626.7263), np.float32(3347.6196), np.float32(2694.6448), np.float32(1337.212), np.float32(3186.457), np.float32(1337.3511), np.float32(3876.8677), np.float32(2290.3835)]
2025-09-14 16:53:24,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:53:24,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2624.20) for latency 21
2025-09-14 16:53:24,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 13 minutes, 55 seconds)
2025-09-14 16:55:39,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:55:47,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3542.03979 ± 774.011
2025-09-14 16:55:47,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2740.975), np.float32(4247.0186), np.float32(4129.7734), np.float32(4136.6357), np.float32(4207.3706), np.float32(3481.061), np.float32(3684.7869), np.float32(4184.5186), np.float32(1980.9647), np.float32(2627.2917)]
2025-09-14 16:55:47,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:55:47,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3542.04) for latency 21
2025-09-14 16:55:47,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 11 minutes, 18 seconds)
2025-09-14 16:58:01,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 16:58:09,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2919.39697 ± 1048.371
2025-09-14 16:58:09,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2373.3323), np.float32(3116.0776), np.float32(4206.063), np.float32(3937.9094), np.float32(1194.5659), np.float32(2377.0864), np.float32(2425.7048), np.float32(4020.2588), np.float32(1508.047), np.float32(4034.9211)]
2025-09-14 16:58:09,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 16:58:09,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 8 minutes, 56 seconds)
2025-09-14 17:00:24,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:00:32,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2281.16235 ± 880.129
2025-09-14 17:00:32,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1520.749), np.float32(2391.5781), np.float32(2981.528), np.float32(1687.9579), np.float32(2184.6555), np.float32(1423.8524), np.float32(3928.414), np.float32(1219.6614), np.float32(3547.1892), np.float32(1926.0394)]
2025-09-14 17:00:32,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:00:32,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 6 minutes, 33 seconds)
2025-09-14 17:02:46,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:02:54,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3244.39941 ± 1273.948
2025-09-14 17:02:54,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4297.722), np.float32(4338.8384), np.float32(2893.1174), np.float32(1604.314), np.float32(4121.806), np.float32(1312.394), np.float32(4305.816), np.float32(1291.5652), np.float32(4319.2573), np.float32(3959.162)]
2025-09-14 17:02:54,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:02:54,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 4 minutes, 10 seconds)
2025-09-14 17:05:09,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:05:17,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3113.33813 ± 1104.837
2025-09-14 17:05:17,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4079.0098), np.float32(4130.7646), np.float32(4181.645), np.float32(2111.63), np.float32(3917.5784), np.float32(1268.6678), np.float32(1789.6487), np.float32(2255.8132), np.float32(4359.0166), np.float32(3039.6077)]
2025-09-14 17:05:17,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:05:17,948 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 1 minute, 49 seconds)
2025-09-14 17:07:32,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:07:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2386.47070 ± 978.203
2025-09-14 17:07:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1989.1906), np.float32(4325.2817), np.float32(2390.3176), np.float32(2030.268), np.float32(1543.6261), np.float32(2556.6626), np.float32(1396.7625), np.float32(1369.5504), np.float32(2219.9612), np.float32(4043.085)]
2025-09-14 17:07:40,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:07:40,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 59 minutes, 28 seconds)
2025-09-14 17:09:55,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:10:03,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3495.89209 ± 939.806
2025-09-14 17:10:03,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3375.3982), np.float32(4406.342), np.float32(4231.3975), np.float32(1612.7959), np.float32(4230.0728), np.float32(2550.942), np.float32(2394.9937), np.float32(3583.9368), np.float32(4290.607), np.float32(4282.4346)]
2025-09-14 17:10:03,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:10:03,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 57 minutes, 8 seconds)
2025-09-14 17:12:18,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:12:26,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3614.13599 ± 622.267
2025-09-14 17:12:26,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2486.1855), np.float32(3669.1965), np.float32(2470.4753), np.float32(3420.7773), np.float32(3674.451), np.float32(4017.7932), np.float32(4396.7075), np.float32(4114.39), np.float32(3910.9758), np.float32(3980.406)]
2025-09-14 17:12:26,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:12:26,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3614.14) for latency 21
2025-09-14 17:12:26,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 54 minutes, 44 seconds)
2025-09-14 17:14:40,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:14:48,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2516.65503 ± 1128.360
2025-09-14 17:14:48,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3651.1904), np.float32(1165.1581), np.float32(4247.741), np.float32(2159.5842), np.float32(2577.1086), np.float32(4336.4507), np.float32(1791.2751), np.float32(2478.7102), np.float32(1560.8871), np.float32(1198.4453)]
2025-09-14 17:14:48,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:14:48,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 52 minutes, 19 seconds)
2025-09-14 17:17:02,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:17:10,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3409.77490 ± 937.556
2025-09-14 17:17:10,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3793.9524), np.float32(2151.841), np.float32(4220.105), np.float32(4230.1997), np.float32(2806.3738), np.float32(4224.479), np.float32(3602.9653), np.float32(4178.5444), np.float32(3496.1377), np.float32(1393.1542)]
2025-09-14 17:17:10,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:17:10,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 49 minutes, 52 seconds)
2025-09-14 17:19:24,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:19:32,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2501.33105 ± 1039.121
2025-09-14 17:19:32,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4042.0693), np.float32(1741.4785), np.float32(2664.042), np.float32(3912.2517), np.float32(3007.88), np.float32(1376.0696), np.float32(3637.72), np.float32(1657.9832), np.float32(1837.4633), np.float32(1136.3527)]
2025-09-14 17:19:32,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:19:32,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 47 minutes, 26 seconds)
2025-09-14 17:21:46,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:21:54,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2711.76025 ± 1286.340
2025-09-14 17:21:54,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2255.063), np.float32(4383.454), np.float32(2119.3394), np.float32(1295.1954), np.float32(4409.266), np.float32(4379.273), np.float32(1653.0471), np.float32(1716.8114), np.float32(1172.6086), np.float32(3733.5432)]
2025-09-14 17:21:54,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:21:54,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 45 minutes, 1 second)
2025-09-14 17:24:08,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:24:16,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4050.05396 ± 373.502
2025-09-14 17:24:16,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3854.2104), np.float32(4125.006), np.float32(4062.5413), np.float32(4171.8486), np.float32(4253.901), np.float32(4303.4326), np.float32(4225.7085), np.float32(4255.699), np.float32(2993.898), np.float32(4254.292)]
2025-09-14 17:24:16,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:24:16,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4050.05) for latency 21
2025-09-14 17:24:16,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 42 minutes, 37 seconds)
2025-09-14 17:26:30,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:26:38,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3091.12842 ± 1139.246
2025-09-14 17:26:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4419.913), np.float32(1470.9437), np.float32(2888.433), np.float32(4282.026), np.float32(4416.4985), np.float32(2122.4746), np.float32(2063.8494), np.float32(4583.131), np.float32(2336.0842), np.float32(2327.9307)]
2025-09-14 17:26:38,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:26:38,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 40 minutes, 15 seconds)
2025-09-14 17:28:52,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:29:00,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3621.90186 ± 890.875
2025-09-14 17:29:00,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2138.879), np.float32(3812.3835), np.float32(4323.306), np.float32(3886.6003), np.float32(3081.011), np.float32(4403.7153), np.float32(4277.4824), np.float32(1865.5428), np.float32(4306.935), np.float32(4123.1616)]
2025-09-14 17:29:00,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:29:00,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 37 minutes, 53 seconds)
2025-09-14 17:31:14,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:31:22,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3656.29614 ± 1148.369
2025-09-14 17:31:22,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4255.1733), np.float32(3411.8018), np.float32(1539.0044), np.float32(4412.3643), np.float32(4199.903), np.float32(4392.8115), np.float32(1328.3047), np.float32(4493.101), np.float32(4352.215), np.float32(4178.282)]
2025-09-14 17:31:22,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:31:22,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 35 minutes, 30 seconds)
2025-09-14 17:33:36,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:33:44,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3020.06006 ± 1168.668
2025-09-14 17:33:44,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3355.6274), np.float32(1250.4263), np.float32(2211.4624), np.float32(4393.3984), np.float32(3342.2637), np.float32(4451.1294), np.float32(4012.6538), np.float32(1639.5758), np.float32(3935.638), np.float32(1608.4258)]
2025-09-14 17:33:44,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:33:44,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 33 minutes, 8 seconds)
2025-09-14 17:35:58,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:36:06,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3326.91406 ± 1125.290
2025-09-14 17:36:06,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4035.3015), np.float32(4195.285), np.float32(1755.1799), np.float32(4492.1265), np.float32(1226.535), np.float32(4140.2993), np.float32(2105.1047), np.float32(4209.402), np.float32(3299.979), np.float32(3809.9275)]
2025-09-14 17:36:06,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:36:06,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 30 minutes, 46 seconds)
2025-09-14 17:38:20,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:38:28,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4018.60889 ± 535.928
2025-09-14 17:38:28,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3973.7166), np.float32(3926.3489), np.float32(4037.0461), np.float32(4354.7197), np.float32(2491.4856), np.float32(4421.978), np.float32(4362.6333), np.float32(4165.504), np.float32(4343.186), np.float32(4109.4736)]
2025-09-14 17:38:28,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:38:28,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 28 minutes, 23 seconds)
2025-09-14 17:40:43,258 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:40:51,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3030.02295 ± 1121.644
2025-09-14 17:40:51,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1261.7573), np.float32(4429.452), np.float32(2690.8452), np.float32(2816.2966), np.float32(3333.6174), np.float32(1195.4027), np.float32(4606.5146), np.float32(4182.1494), np.float32(2777.631), np.float32(3006.562)]
2025-09-14 17:40:51,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:40:51,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 26 minutes, 2 seconds)
2025-09-14 17:43:05,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:43:13,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2158.28369 ± 696.827
2025-09-14 17:43:13,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1948.7478), np.float32(1225.7393), np.float32(1649.6296), np.float32(2625.3232), np.float32(1264.8387), np.float32(1954.7396), np.float32(3437.4133), np.float32(3012.838), np.float32(1879.3844), np.float32(2584.1829)]
2025-09-14 17:43:13,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:43:13,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 23 minutes, 41 seconds)
2025-09-14 17:45:27,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:45:35,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3351.62231 ± 1259.365
2025-09-14 17:45:35,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4122.3623), np.float32(3043.3489), np.float32(1516.4247), np.float32(1602.1678), np.float32(4629.7217), np.float32(1480.3451), np.float32(3952.465), np.float32(4472.3984), np.float32(4312.966), np.float32(4384.022)]
2025-09-14 17:45:35,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:45:35,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 21 minutes, 19 seconds)
2025-09-14 17:47:49,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:47:57,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2773.15576 ± 1288.691
2025-09-14 17:47:57,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1216.8229), np.float32(1970.9779), np.float32(2166.098), np.float32(4602.9175), np.float32(1718.8861), np.float32(2280.9922), np.float32(4424.7627), np.float32(1404.6506), np.float32(4677.2783), np.float32(3268.172)]
2025-09-14 17:47:57,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:47:57,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 18 minutes, 57 seconds)
2025-09-14 17:50:12,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:50:20,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3649.05933 ± 972.323
2025-09-14 17:50:20,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4049.944), np.float32(3445.7012), np.float32(4571.201), np.float32(4595.902), np.float32(1461.9349), np.float32(3695.706), np.float32(4156.582), np.float32(3998.13), np.float32(2249.322), np.float32(4266.1685)]
2025-09-14 17:50:20,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:50:20,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 16 minutes, 36 seconds)
2025-09-14 17:52:34,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:52:42,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 4128.38623 ± 580.591
2025-09-14 17:52:42,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3791.6377), np.float32(4508.762), np.float32(2528.2502), np.float32(4394.48), np.float32(4232.887), np.float32(4326.841), np.float32(4427.507), np.float32(4039.93), np.float32(4389.863), np.float32(4643.7075)]
2025-09-14 17:52:42,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:52:42,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (4128.39) for latency 21
2025-09-14 17:52:42,179 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 14 minutes, 13 seconds)
2025-09-14 17:54:56,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:55:04,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2146.64087 ± 824.480
2025-09-14 17:55:04,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1983.8575), np.float32(2546.775), np.float32(1600.2666), np.float32(4159.1655), np.float32(1300.2091), np.float32(1497.9773), np.float32(1844.888), np.float32(2743.6694), np.float32(2419.5547), np.float32(1370.0446)]
2025-09-14 17:55:04,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:55:04,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 11 minutes, 50 seconds)
2025-09-14 17:57:28,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 17:57:36,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3446.87891 ± 1105.220
2025-09-14 17:57:36,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2439.5396), np.float32(1732.8452), np.float32(3369.9255), np.float32(4400.1455), np.float32(4619.9414), np.float32(2003.4557), np.float32(4164.589), np.float32(2568.375), np.float32(4658.5215), np.float32(4511.453)]
2025-09-14 17:57:36,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 17:57:36,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 9 minutes, 36 seconds)
2025-09-14 18:00:02,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:00:10,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3427.13672 ± 1350.740
2025-09-14 18:00:10,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4613.857), np.float32(4574.604), np.float32(1251.4047), np.float32(4570.7163), np.float32(4444.4575), np.float32(4121.197), np.float32(4654.61), np.float32(1460.9409), np.float32(2321.5261), np.float32(2258.0544)]
2025-09-14 18:00:10,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:00:10,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 19 seconds)
2025-09-14 18:02:32,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:02:40,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2577.63794 ± 1036.646
2025-09-14 18:02:40,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3584.053), np.float32(1507.5856), np.float32(1829.9674), np.float32(2463.743), np.float32(4337.3247), np.float32(1257.2653), np.float32(1708.0797), np.float32(3476.609), np.float32(3674.2454), np.float32(1937.5045)]
2025-09-14 18:02:40,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:02:40,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 56 seconds)
2025-09-14 18:04:57,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:05:05,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2547.80811 ± 1309.021
2025-09-14 18:05:05,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1168.0627), np.float32(1344.0659), np.float32(1664.7594), np.float32(3500.4492), np.float32(4064.5613), np.float32(4325.3555), np.float32(4330.2207), np.float32(2611.3867), np.float32(1183.9204), np.float32(1285.2997)]
2025-09-14 18:05:05,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:05:05,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 28 seconds)
2025-09-14 18:07:27,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 21...
2025-09-14 18:07:35,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3439.27734 ± 1059.922
2025-09-14 18:07:35,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2312.1396), np.float32(4361.7163), np.float32(2289.778), np.float32(3087.0444), np.float32(2099.5464), np.float32(2273.1846), np.float32(4662.5537), np.float32(4307.433), np.float32(4583.03), np.float32(4416.3486)]
2025-09-14 18:07:35,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 18:07:35,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1251 [DEBUG]: Training session finished
