2025-09-14 08:43:01,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_12
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.075-delay_12
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'12': <latency_env.delayed_mdp.ConstantDelay object at 0x7f77bba96e10>}
2025-09-14 08:43:01,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,607 baseline-bpql-noisepromille75-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=89, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,607 baseline-bpql-noisepromille75-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:45:33,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 08:45:39,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -169.73917 ± 121.277
2025-09-14 08:45:39,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-311.12347), np.float32(-327.76263), np.float32(-190.19804), np.float32(-54.93959), np.float32(-250.9115), np.float32(-14.339247), np.float32(-329.13354), np.float32(-102.73856), np.float32(-19.565027), np.float32(-96.68019)]
2025-09-14 08:45:39,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:45:39,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-169.74) for latency 12
2025-09-14 08:45:39,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 18 minutes, 27 seconds)
2025-09-14 08:48:11,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 08:48:18,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -197.78024 ± 44.494
2025-09-14 08:48:18,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-267.12247), np.float32(-148.45937), np.float32(-233.41129), np.float32(-203.80554), np.float32(-237.10948), np.float32(-166.10669), np.float32(-238.77742), np.float32(-126.02652), np.float32(-154.75755), np.float32(-202.2259)]
2025-09-14 08:48:18,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:48:18,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 17 minutes, 45 seconds)
2025-09-14 08:50:59,955 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 08:51:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: -161.03194 ± 78.266
2025-09-14 08:51:07,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-163.64561), np.float32(-161.99628), np.float32(-60.545315), np.float32(-154.49886), np.float32(-95.98212), np.float32(-186.7459), np.float32(-237.16829), np.float32(-93.648026), np.float32(-112.07721), np.float32(-344.01172)]
2025-09-14 08:51:07,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:51:07,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (-161.03) for latency 12
2025-09-14 08:51:07,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 20 minutes, 41 seconds)
2025-09-14 08:53:47,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 08:53:54,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 211.91826 ± 171.590
2025-09-14 08:53:54,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(80.04903), np.float32(137.9937), np.float32(510.3984), np.float32(549.13696), np.float32(222.79437), np.float32(0.08584903), np.float32(172.51779), np.float32(218.05798), np.float32(153.06284), np.float32(75.085495)]
2025-09-14 08:53:54,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:53:54,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (211.92) for latency 12
2025-09-14 08:53:54,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 20 minutes, 28 seconds)
2025-09-14 08:56:41,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 08:56:50,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 433.51691 ± 284.320
2025-09-14 08:56:50,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(612.9072), np.float32(164.01407), np.float32(214.33444), np.float32(365.16306), np.float32(825.73267), np.float32(291.6928), np.float32(412.90408), np.float32(60.309032), np.float32(376.69644), np.float32(1011.4151)]
2025-09-14 08:56:50,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:56:50,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (433.52) for latency 12
2025-09-14 08:56:50,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 21 minutes, 54 seconds)
2025-09-14 09:00:03,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:00:12,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 685.98065 ± 216.394
2025-09-14 09:00:12,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(391.78796), np.float32(1027.9391), np.float32(820.01526), np.float32(794.0186), np.float32(802.6249), np.float32(466.3904), np.float32(460.82074), np.float32(657.90955), np.float32(474.58032), np.float32(963.7196)]
2025-09-14 09:00:12,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:00:12,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (685.98) for latency 12
2025-09-14 09:00:12,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 33 minutes, 20 seconds)
2025-09-14 09:03:28,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:03:37,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1227.06860 ± 506.375
2025-09-14 09:03:37,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(770.11523), np.float32(1769.7816), np.float32(611.5901), np.float32(592.17865), np.float32(1529.3147), np.float32(1716.9648), np.float32(1479.1388), np.float32(2019.1187), np.float32(1015.9868), np.float32(766.4961)]
2025-09-14 09:03:37,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:03:37,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1227.07) for latency 12
2025-09-14 09:03:37,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 44 minutes, 47 seconds)
2025-09-14 09:06:48,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:06:57,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 714.80530 ± 293.301
2025-09-14 09:06:57,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(414.30136), np.float32(983.11316), np.float32(1060.0153), np.float32(401.3623), np.float32(461.12622), np.float32(1260.3508), np.float32(900.84546), np.float32(568.3339), np.float32(611.46265), np.float32(487.14154)]
2025-09-14 09:06:57,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:06:57,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 51 minutes, 37 seconds)
2025-09-14 09:10:07,157 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:10:16,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1481.99146 ± 626.294
2025-09-14 09:10:16,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2223.7583), np.float32(2514.9265), np.float32(1546.9565), np.float32(2008.29), np.float32(516.68964), np.float32(1474.9059), np.float32(1783.0099), np.float32(819.1371), np.float32(1088.5344), np.float32(843.70703)]
2025-09-14 09:10:16,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:10:16,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (1481.99) for latency 12
2025-09-14 09:10:16,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 57 minutes, 47 seconds)
2025-09-14 09:13:25,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:13:34,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2056.38110 ± 192.852
2025-09-14 09:13:34,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1927.5167), np.float32(1873.6501), np.float32(2258.039), np.float32(2051.0422), np.float32(1694.3265), np.float32(2112.4907), np.float32(1919.756), np.float32(2132.4763), np.float32(2240.254), np.float32(2354.259)]
2025-09-14 09:13:34,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:13:34,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2056.38) for latency 12
2025-09-14 09:13:34,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 1 minute, 16 seconds)
2025-09-14 09:16:44,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:16:53,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1941.18225 ± 597.154
2025-09-14 09:16:53,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2157.284), np.float32(2624.0251), np.float32(2626.9941), np.float32(2162.958), np.float32(2614.4182), np.float32(1302.0437), np.float32(1231.5009), np.float32(865.22), np.float32(1980.1506), np.float32(1847.2261)]
2025-09-14 09:16:53,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:16:53,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 57 minutes)
2025-09-14 09:20:04,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:20:14,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2183.39404 ± 708.870
2025-09-14 09:20:14,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1536.5051), np.float32(2860.877), np.float32(2993.8442), np.float32(1637.822), np.float32(2836.2937), np.float32(1313.7933), np.float32(3108.6277), np.float32(2585.2246), np.float32(1529.7234), np.float32(1431.2305)]
2025-09-14 09:20:14,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:20:14,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2183.39) for latency 12
2025-09-14 09:20:14,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 52 minutes, 26 seconds)
2025-09-14 09:23:36,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:23:46,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1990.74280 ± 591.193
2025-09-14 09:23:46,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2661.1238), np.float32(2475.0896), np.float32(2716.3005), np.float32(2720.2898), np.float32(1565.6339), np.float32(1976.5433), np.float32(1769.1234), np.float32(1754.2174), np.float32(1147.0804), np.float32(1122.0247)]
2025-09-14 09:23:46,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:23:46,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 52 minutes, 35 seconds)
2025-09-14 09:27:08,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:27:18,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1714.45386 ± 627.825
2025-09-14 09:27:18,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2142.586), np.float32(2415.339), np.float32(1792.4796), np.float32(1690.6393), np.float32(1060.7727), np.float32(2444.3345), np.float32(919.59094), np.float32(1067.1897), np.float32(2590.9731), np.float32(1020.6348)]
2025-09-14 09:27:18,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:27:18,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 53 minutes, 11 seconds)
2025-09-14 09:30:41,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:30:51,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2000.87830 ± 886.691
2025-09-14 09:30:51,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2648.1191), np.float32(1433.8528), np.float32(439.28104), np.float32(3567.1807), np.float32(2345.3064), np.float32(923.8538), np.float32(1565.0995), np.float32(2525.3828), np.float32(2686.5876), np.float32(1874.119)]
2025-09-14 09:30:51,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:30:51,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 53 minutes, 48 seconds)
2025-09-14 09:34:00,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:34:08,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1604.74451 ± 531.427
2025-09-14 09:34:08,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1511.3029), np.float32(1541.9144), np.float32(1172.1836), np.float32(1434.4291), np.float32(1295.8444), np.float32(2898.0044), np.float32(1396.5215), np.float32(2266.8357), np.float32(1025.2917), np.float32(1505.1171)]
2025-09-14 09:34:08,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:34:08,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 49 minutes, 53 seconds)
2025-09-14 09:36:54,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:37:01,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1742.20081 ± 595.122
2025-09-14 09:37:01,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1602.5356), np.float32(1432.8456), np.float32(1776.6578), np.float32(1314.8956), np.float32(3164.3035), np.float32(1257.9833), np.float32(1243.9675), np.float32(2382.971), np.float32(1236.53), np.float32(2009.3181)]
2025-09-14 09:37:01,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:37:01,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 38 minutes, 38 seconds)
2025-09-14 09:39:33,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:39:41,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1663.40527 ± 346.455
2025-09-14 09:39:41,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1728.8771), np.float32(1333.2812), np.float32(1438.3213), np.float32(1605.594), np.float32(1632.9188), np.float32(1178.2726), np.float32(1827.345), np.float32(2178.6016), np.float32(2319.4856), np.float32(1391.3561)]
2025-09-14 09:39:41,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:39:41,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 20 minutes, 47 seconds)
2025-09-14 09:42:13,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:42:20,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1790.59900 ± 736.397
2025-09-14 09:42:20,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1224.6097), np.float32(3390.8608), np.float32(1898.1261), np.float32(2484.4983), np.float32(1127.0024), np.float32(1306.1039), np.float32(1234.7161), np.float32(2570.2722), np.float32(1478.5139), np.float32(1191.2845)]
2025-09-14 09:42:20,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:42:20,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 3 minutes, 22 seconds)
2025-09-14 09:45:15,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:45:25,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1611.55396 ± 843.319
2025-09-14 09:45:25,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1134.1907), np.float32(1490.1914), np.float32(3296.41), np.float32(1512.1067), np.float32(1596.781), np.float32(2456.5764), np.float32(2018.2946), np.float32(1514.7108), np.float32(1207.6223), np.float32(-111.34485)]
2025-09-14 09:45:25,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:45:25,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 3 hours, 53 minutes, 10 seconds)
2025-09-14 09:48:50,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:49:01,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1544.38977 ± 364.972
2025-09-14 09:49:01,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1225.4489), np.float32(1480.6501), np.float32(1692.6813), np.float32(1941.9238), np.float32(2264.2083), np.float32(1439.8271), np.float32(1102.3665), np.float32(1824.0582), np.float32(1072.3451), np.float32(1400.3887)]
2025-09-14 09:49:01,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:49:01,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 3 hours, 54 minutes, 57 seconds)
2025-09-14 09:52:26,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:52:36,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1942.16187 ± 908.594
2025-09-14 09:52:36,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(68.54452), np.float32(3683.4124), np.float32(1844.9249), np.float32(1528.9082), np.float32(2824.9343), np.float32(1647.3864), np.float32(2488.4307), np.float32(1764.0638), np.float32(2164.5986), np.float32(1406.4154)]
2025-09-14 09:52:36,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:52:36,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 3 minutes, 8 seconds)
2025-09-14 09:56:01,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:56:11,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2193.14014 ± 773.885
2025-09-14 09:56:11,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2932.9187), np.float32(2181.7244), np.float32(1281.1327), np.float32(2432.0808), np.float32(3288.794), np.float32(1346.2657), np.float32(2416.255), np.float32(1260.3999), np.float32(1492.2965), np.float32(3299.5334)]
2025-09-14 09:56:11,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:56:11,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2193.14) for latency 12
2025-09-14 09:56:11,459 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 14 minutes, 12 seconds)
2025-09-14 09:59:35,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 09:59:45,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1965.68689 ± 704.879
2025-09-14 09:59:45,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1996.1453), np.float32(1195.0527), np.float32(1193.9408), np.float32(1609.6501), np.float32(1964.7089), np.float32(2374.9768), np.float32(3555.1387), np.float32(1213.1146), np.float32(1963.0457), np.float32(2591.0962)]
2025-09-14 09:59:45,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:59:45,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 24 minutes, 49 seconds)
2025-09-14 10:03:10,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:03:20,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2013.90076 ± 891.943
2025-09-14 10:03:20,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1385.2095), np.float32(672.5908), np.float32(2825.1687), np.float32(1152.7341), np.float32(2498.1082), np.float32(1992.6633), np.float32(2165.359), np.float32(1062.2555), np.float32(3650.731), np.float32(2734.1875)]
2025-09-14 10:03:20,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:03:20,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 28 minutes, 40 seconds)
2025-09-14 10:06:46,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:06:56,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2476.14746 ± 760.529
2025-09-14 10:06:56,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3125.0398), np.float32(2060.2861), np.float32(3557.2622), np.float32(1529.0677), np.float32(3325.636), np.float32(2880.694), np.float32(2895.0889), np.float32(2311.9336), np.float32(1165.9879), np.float32(1910.4792)]
2025-09-14 10:06:56,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:06:56,396 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2476.15) for latency 12
2025-09-14 10:06:56,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 25 minutes, 15 seconds)
2025-09-14 10:10:21,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:10:31,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2131.86108 ± 796.385
2025-09-14 10:10:31,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2797.1018), np.float32(2571.9272), np.float32(1288.6938), np.float32(3782.2192), np.float32(1316.4319), np.float32(1382.8995), np.float32(1557.9634), np.float32(1975.4507), np.float32(1766.749), np.float32(2879.1736)]
2025-09-14 10:10:31,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:10:31,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 21 minutes, 35 seconds)
2025-09-14 10:13:56,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:14:06,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1926.87048 ± 768.436
2025-09-14 10:14:06,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2163.3472), np.float32(2293.611), np.float32(528.511), np.float32(1291.917), np.float32(1223.8416), np.float32(2713.3862), np.float32(3044.4312), np.float32(1197.6196), np.float32(2366.177), np.float32(2445.8628)]
2025-09-14 10:14:06,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:14:06,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 18 minutes, 5 seconds)
2025-09-14 10:17:31,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:17:41,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1503.55432 ± 294.508
2025-09-14 10:17:41,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1622.6913), np.float32(1208.5463), np.float32(1399.0516), np.float32(1379.5023), np.float32(1719.3125), np.float32(1074.5319), np.float32(1694.1912), np.float32(1675.6078), np.float32(2084.2349), np.float32(1177.8728)]
2025-09-14 10:17:41,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:17:41,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 14 minutes, 37 seconds)
2025-09-14 10:21:05,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:21:16,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2184.01416 ± 522.673
2025-09-14 10:21:16,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2348.9553), np.float32(1637.5406), np.float32(1379.5576), np.float32(2496.718), np.float32(3008.7815), np.float32(2652.2698), np.float32(1750.586), np.float32(2322.1458), np.float32(1598.8666), np.float32(2644.7212)]
2025-09-14 10:21:16,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:21:16,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 10 minutes, 55 seconds)
2025-09-14 10:24:41,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:24:51,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2254.33081 ± 728.680
2025-09-14 10:24:51,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2772.4275), np.float32(1917.0509), np.float32(1321.3688), np.float32(3029.047), np.float32(3404.502), np.float32(3057.4275), np.float32(2148.9517), np.float32(1356.2152), np.float32(1452.9951), np.float32(2083.3206)]
2025-09-14 10:24:51,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:24:51,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 7 minutes, 15 seconds)
2025-09-14 10:28:14,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:28:25,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2347.81909 ± 581.029
2025-09-14 10:28:25,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2807.4758), np.float32(1951.5104), np.float32(3219.0193), np.float32(2313.596), np.float32(2350.001), np.float32(2784.467), np.float32(2821.7344), np.float32(1450.2365), np.float32(1338.9583), np.float32(2441.1938)]
2025-09-14 10:28:25,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:28:25,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 3 minutes, 17 seconds)
2025-09-14 10:31:48,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:31:59,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2054.02588 ± 713.196
2025-09-14 10:31:59,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2886.454), np.float32(2048.4338), np.float32(1485.1162), np.float32(1456.4844), np.float32(3080.2397), np.float32(1307.7598), np.float32(2508.5684), np.float32(1401.499), np.float32(1335.1532), np.float32(3030.55)]
2025-09-14 10:31:59,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:31:59,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 59 minutes, 28 seconds)
2025-09-14 10:35:23,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:35:34,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2295.31421 ± 704.638
2025-09-14 10:35:34,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1209.0315), np.float32(1711.7455), np.float32(2752.4507), np.float32(2614.2444), np.float32(2381.1409), np.float32(3105.3723), np.float32(1921.0354), np.float32(3380.6057), np.float32(2608.879), np.float32(1268.6368)]
2025-09-14 10:35:34,446 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:35:34,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 56 minutes, 1 second)
2025-09-14 10:38:58,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:39:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2245.82788 ± 878.924
2025-09-14 10:39:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3406.9285), np.float32(3570.0278), np.float32(1744.1779), np.float32(1524.8645), np.float32(1963.7611), np.float32(1639.9696), np.float32(3691.3342), np.float32(1959.5563), np.float32(1648.143), np.float32(1309.5173)]
2025-09-14 10:39:09,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:39:09,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 52 minutes, 28 seconds)
2025-09-14 10:42:34,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:42:44,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2140.18579 ± 795.029
2025-09-14 10:42:44,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1700.7545), np.float32(2496.3413), np.float32(1717.3525), np.float32(1283.7607), np.float32(796.422), np.float32(3224.613), np.float32(2269.0322), np.float32(3571.8474), np.float32(2309.9453), np.float32(2031.7876)]
2025-09-14 10:42:44,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:42:44,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 49 minutes)
2025-09-14 10:46:08,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:46:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2496.95630 ± 825.202
2025-09-14 10:46:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3192.3281), np.float32(2929.825), np.float32(2943.117), np.float32(2461.8142), np.float32(1285.8994), np.float32(3500.3535), np.float32(1740.3529), np.float32(3643.331), np.float32(1416.3594), np.float32(1856.1818)]
2025-09-14 10:46:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:46:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2496.96) for latency 12
2025-09-14 10:46:19,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 45 minutes, 30 seconds)
2025-09-14 10:49:42,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:49:52,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2118.13525 ± 778.476
2025-09-14 10:49:52,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3326.112), np.float32(2004.4176), np.float32(1851.5193), np.float32(3206.2231), np.float32(1206.4048), np.float32(1435.1896), np.float32(2092.9392), np.float32(1369.9794), np.float32(1515.8008), np.float32(3172.7668)]
2025-09-14 10:49:52,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:49:52,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 41 minutes, 53 seconds)
2025-09-14 10:53:14,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:53:24,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2052.66064 ± 657.003
2025-09-14 10:53:24,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1924.3599), np.float32(2996.0583), np.float32(1582.8022), np.float32(2086.037), np.float32(1732.3986), np.float32(1345.9175), np.float32(1444.1409), np.float32(2587.6304), np.float32(3327.1533), np.float32(1500.1072)]
2025-09-14 10:53:24,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:53:24,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 37 minutes, 30 seconds)
2025-09-14 10:56:42,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 10:56:51,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1909.49976 ± 776.840
2025-09-14 10:56:51,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1441.6548), np.float32(1289.88), np.float32(1302.6133), np.float32(2677.8523), np.float32(2226.2712), np.float32(1266.5591), np.float32(3699.6843), np.float32(1273.581), np.float32(1542.7242), np.float32(2374.1775)]
2025-09-14 10:56:51,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:56:51,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 32 minutes, 33 seconds)
2025-09-14 11:00:07,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:00:16,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2230.34033 ± 842.648
2025-09-14 11:00:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2528.218), np.float32(3125.7556), np.float32(2043.9468), np.float32(3633.9275), np.float32(1183.0007), np.float32(1922.8666), np.float32(3371.8345), np.float32(1503.2709), np.float32(1720.6493), np.float32(1269.9349)]
2025-09-14 11:00:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:00:16,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 26 minutes, 53 seconds)
2025-09-14 11:03:20,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:03:29,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2136.84229 ± 805.911
2025-09-14 11:03:29,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1400.842), np.float32(1650.2806), np.float32(2140.0369), np.float32(1439.2588), np.float32(3281.7512), np.float32(2457.7969), np.float32(1480.1194), np.float32(3187.449), np.float32(3232.9937), np.float32(1097.8948)]
2025-09-14 11:03:29,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:03:29,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 19 minutes, 16 seconds)
2025-09-14 11:06:33,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:06:42,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2810.28857 ± 890.754
2025-09-14 11:06:42,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3768.7478), np.float32(3031.3584), np.float32(2419.2275), np.float32(3529.479), np.float32(1310.6608), np.float32(1781.5322), np.float32(3341.3284), np.float32(3912.5945), np.float32(1696.269), np.float32(3311.6897)]
2025-09-14 11:06:42,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:06:42,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2810.29) for latency 12
2025-09-14 11:06:42,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 11 minutes, 46 seconds)
2025-09-14 11:09:31,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:09:38,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2386.36670 ± 911.199
2025-09-14 11:09:38,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3229.5771), np.float32(1234.4971), np.float32(1774.603), np.float32(3203.7485), np.float32(3411.3408), np.float32(2895.4246), np.float32(847.1586), np.float32(1471.5359), np.float32(3200.8447), np.float32(2594.935)]
2025-09-14 11:09:38,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:09:38,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 1 minute, 52 seconds)
2025-09-14 11:12:08,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:12:15,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1833.61755 ± 562.981
2025-09-14 11:12:15,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1611.5745), np.float32(2112.213), np.float32(1401.3867), np.float32(1749.1589), np.float32(1305.7659), np.float32(2052.663), np.float32(1450.6948), np.float32(2245.6882), np.float32(3186.3457), np.float32(1220.686)]
2025-09-14 11:12:15,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:12:15,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 49 minutes, 19 seconds)
2025-09-14 11:14:45,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:14:52,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2174.10791 ± 599.607
2025-09-14 11:14:52,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1975.3596), np.float32(1608.7281), np.float32(1362.7853), np.float32(2331.0051), np.float32(3088.0842), np.float32(3266.4668), np.float32(1905.5901), np.float32(1611.35), np.float32(2065.395), np.float32(2526.3135)]
2025-09-14 11:14:52,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:14:52,555 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 37 minutes, 36 seconds)
2025-09-14 11:17:22,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:17:29,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2044.67285 ± 664.414
2025-09-14 11:17:29,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2721.119), np.float32(1333.7411), np.float32(1765.3932), np.float32(1631.3857), np.float32(2538.8833), np.float32(1335.6644), np.float32(2912.33), np.float32(1600.2535), np.float32(3131.8975), np.float32(1476.0602)]
2025-09-14 11:17:29,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:17:29,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 28 minutes, 20 seconds)
2025-09-14 11:19:59,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:20:06,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1863.76697 ± 722.093
2025-09-14 11:20:06,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1741.596), np.float32(1144.6476), np.float32(2455.776), np.float32(2253.3938), np.float32(1292.0417), np.float32(3443.131), np.float32(1154.372), np.float32(1464.3932), np.float32(1254.6365), np.float32(2433.6821)]
2025-09-14 11:20:06,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:20:06,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 19 minutes, 27 seconds)
2025-09-14 11:22:36,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:22:43,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1970.19373 ± 472.426
2025-09-14 11:22:43,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1450.4695), np.float32(1660.2737), np.float32(1614.6648), np.float32(2137.759), np.float32(1598.5929), np.float32(2614.8428), np.float32(1523.4595), np.float32(2906.6184), np.float32(1941.7118), np.float32(2253.546)]
2025-09-14 11:22:43,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:22:43,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 13 minutes, 31 seconds)
2025-09-14 11:25:13,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:25:20,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2088.70459 ± 634.263
2025-09-14 11:25:20,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1788.173), np.float32(1551.9448), np.float32(3322.681), np.float32(2978.8555), np.float32(1226.8506), np.float32(1803.8784), np.float32(2608.4797), np.float32(2054.6797), np.float32(1914.0823), np.float32(1637.4198)]
2025-09-14 11:25:20,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:25:20,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 10 minutes, 54 seconds)
2025-09-14 11:27:50,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:27:57,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2440.36670 ± 922.167
2025-09-14 11:27:57,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3138.5107), np.float32(1202.8884), np.float32(1887.176), np.float32(3154.4177), np.float32(2810.4175), np.float32(1598.2845), np.float32(3606.6694), np.float32(1823.3777), np.float32(3823.2249), np.float32(1358.7017)]
2025-09-14 11:27:57,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:27:57,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 8 minutes, 13 seconds)
2025-09-14 11:30:26,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:30:34,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2404.48291 ± 671.541
2025-09-14 11:30:34,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2188.1448), np.float32(2949.8542), np.float32(3253.881), np.float32(2060.0107), np.float32(3368.9824), np.float32(2243.6177), np.float32(1344.0986), np.float32(1865.1759), np.float32(1695.1781), np.float32(3075.8848)]
2025-09-14 11:30:34,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:30:34,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 5 minutes, 34 seconds)
2025-09-14 11:33:03,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:33:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2244.77808 ± 694.541
2025-09-14 11:33:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2552.3354), np.float32(1515.2268), np.float32(2579.8425), np.float32(2097.5112), np.float32(1427.1992), np.float32(2104.1726), np.float32(1393.5676), np.float32(3604.7104), np.float32(2054.062), np.float32(3119.1528)]
2025-09-14 11:33:10,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:33:10,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 2 minutes, 49 seconds)
2025-09-14 11:35:40,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:35:47,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1946.63647 ± 587.741
2025-09-14 11:35:47,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1216.409), np.float32(1655.4789), np.float32(2360.92), np.float32(1346.7208), np.float32(1621.9156), np.float32(2737.458), np.float32(2747.0125), np.float32(1986.9978), np.float32(2577.4653), np.float32(1215.989)]
2025-09-14 11:35:47,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:35:47,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 8 seconds)
2025-09-14 11:38:17,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:38:24,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2079.50659 ± 731.431
2025-09-14 11:38:24,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1445.985), np.float32(3376.9736), np.float32(3141.9287), np.float32(1800.4888), np.float32(1506.2053), np.float32(2841.8267), np.float32(1422.6858), np.float32(2255.5728), np.float32(1598.1604), np.float32(1405.2383)]
2025-09-14 11:38:24,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:38:24,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 57 minutes, 32 seconds)
2025-09-14 11:40:54,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:41:01,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1837.11523 ± 589.707
2025-09-14 11:41:01,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1220.1257), np.float32(1295.6674), np.float32(2203.2866), np.float32(1482.2137), np.float32(1839.8047), np.float32(2065.4604), np.float32(1866.7913), np.float32(1223.4299), np.float32(1884.0453), np.float32(3290.3281)]
2025-09-14 11:41:01,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:41:01,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 54 minutes, 58 seconds)
2025-09-14 11:43:31,332 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:43:38,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2072.80640 ± 508.281
2025-09-14 11:43:38,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1697.6454), np.float32(2533.6174), np.float32(2404.957), np.float32(1916.8384), np.float32(2036.1014), np.float32(3279.1626), np.float32(1866.2926), np.float32(1535.6814), np.float32(1539.9529), np.float32(1917.8138)]
2025-09-14 11:43:38,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:43:38,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 52 minutes, 25 seconds)
2025-09-14 11:46:08,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:46:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2378.74683 ± 820.634
2025-09-14 11:46:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2633.1477), np.float32(3068.72), np.float32(3170.3398), np.float32(1449.147), np.float32(3540.8813), np.float32(1606.658), np.float32(1334.3798), np.float32(2337.7917), np.float32(1412.8141), np.float32(3233.5889)]
2025-09-14 11:46:15,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:46:15,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 49 minutes, 53 seconds)
2025-09-14 11:48:45,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:48:52,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2572.44092 ± 703.701
2025-09-14 11:48:52,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1968.7528), np.float32(3148.979), np.float32(3425.8384), np.float32(2455.4607), np.float32(1473.588), np.float32(2684.5017), np.float32(3126.2676), np.float32(3609.777), np.float32(1761.5176), np.float32(2069.7256)]
2025-09-14 11:48:52,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:48:52,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 47 minutes, 17 seconds)
2025-09-14 11:51:22,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:51:29,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1634.94897 ± 366.976
2025-09-14 11:51:29,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1722.1057), np.float32(1404.1844), np.float32(2166.2097), np.float32(1384.7587), np.float32(2345.859), np.float32(1792.37), np.float32(1685.4045), np.float32(1230.9186), np.float32(1436.0801), np.float32(1181.6)]
2025-09-14 11:51:29,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:51:29,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 44 minutes, 40 seconds)
2025-09-14 11:53:59,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:54:06,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2284.06372 ± 662.445
2025-09-14 11:54:06,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2373.7297), np.float32(3438.3357), np.float32(2993.3418), np.float32(1460.3568), np.float32(2895.6404), np.float32(1773.961), np.float32(2749.415), np.float32(1613.3041), np.float32(1912.3112), np.float32(1630.243)]
2025-09-14 11:54:06,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:54:06,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 42 minutes, 2 seconds)
2025-09-14 11:56:35,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:56:43,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2125.94507 ± 675.807
2025-09-14 11:56:43,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2224.2397), np.float32(2553.228), np.float32(2432.952), np.float32(1187.0322), np.float32(2729.5007), np.float32(2621.4595), np.float32(2014.2147), np.float32(1114.1598), np.float32(1257.3752), np.float32(3125.291)]
2025-09-14 11:56:43,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:56:43,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 39 minutes, 21 seconds)
2025-09-14 11:59:12,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 11:59:19,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2325.49341 ± 673.685
2025-09-14 11:59:19,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2469.346), np.float32(1683.0703), np.float32(1305.3922), np.float32(2783.9011), np.float32(2969.7104), np.float32(2904.511), np.float32(1225.6727), np.float32(3199.5864), np.float32(2610.3235), np.float32(2103.4192)]
2025-09-14 11:59:19,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:59:19,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 36 minutes, 42 seconds)
2025-09-14 12:01:49,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:01:56,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2319.58960 ± 693.159
2025-09-14 12:01:56,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1734.863), np.float32(3552.4146), np.float32(2383.5269), np.float32(1672.9596), np.float32(2143.6255), np.float32(1508.9738), np.float32(1833.0227), np.float32(3071.243), np.float32(2009.0013), np.float32(3286.266)]
2025-09-14 12:01:56,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:01:56,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 34 minutes, 6 seconds)
2025-09-14 12:04:26,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:04:33,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1930.63904 ± 820.349
2025-09-14 12:04:33,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2936.8586), np.float32(1886.9929), np.float32(2949.4023), np.float32(1503.8492), np.float32(2476.716), np.float32(1358.235), np.float32(1436.3328), np.float32(284.96765), np.float32(2840.5742), np.float32(1632.4614)]
2025-09-14 12:04:33,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:04:33,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 31 minutes, 27 seconds)
2025-09-14 12:07:03,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:07:10,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2451.75342 ± 759.810
2025-09-14 12:07:10,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3206.6323), np.float32(2712.7295), np.float32(3289.6807), np.float32(3497.5964), np.float32(2082.926), np.float32(2120.1953), np.float32(1257.034), np.float32(1644.3486), np.float32(3058.39), np.float32(1648.0011)]
2025-09-14 12:07:10,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:07:10,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 28 minutes, 49 seconds)
2025-09-14 12:09:39,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:09:47,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1805.85327 ± 698.370
2025-09-14 12:09:47,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1243.7852), np.float32(2822.744), np.float32(2220.4104), np.float32(3305.102), np.float32(1419.6033), np.float32(1492.4324), np.float32(1286.2798), np.float32(1727.7855), np.float32(1341.9231), np.float32(1198.467)]
2025-09-14 12:09:47,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:09:47,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 26 minutes, 14 seconds)
2025-09-14 12:12:16,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:12:23,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2118.59180 ± 518.601
2025-09-14 12:12:23,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2038.445), np.float32(2175.519), np.float32(1915.5651), np.float32(2027.0535), np.float32(1277.2383), np.float32(2644.044), np.float32(2809.7388), np.float32(2152.7402), np.float32(1310.1659), np.float32(2835.4077)]
2025-09-14 12:12:23,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:12:23,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 23 minutes, 38 seconds)
2025-09-14 12:14:53,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:15:00,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2017.78162 ± 528.006
2025-09-14 12:15:00,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1634.2468), np.float32(2291.32), np.float32(1805.3949), np.float32(2818.943), np.float32(1830.6592), np.float32(1336.5167), np.float32(2783.804), np.float32(1622.2305), np.float32(1463.2386), np.float32(2591.4639)]
2025-09-14 12:15:00,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:15:00,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 21 minutes, 1 second)
2025-09-14 12:17:30,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:17:37,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2134.22437 ± 546.018
2025-09-14 12:17:37,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1878.6923), np.float32(1523.6149), np.float32(1776.3303), np.float32(2111.6052), np.float32(1918.7827), np.float32(2933.0227), np.float32(1696.0123), np.float32(2667.406), np.float32(1679.3647), np.float32(3157.4114)]
2025-09-14 12:17:37,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:17:37,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 18 minutes, 24 seconds)
2025-09-14 12:20:07,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:20:14,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2537.26221 ± 879.135
2025-09-14 12:20:14,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2722.4631), np.float32(2363.5178), np.float32(3265.144), np.float32(3105.4585), np.float32(1524.595), np.float32(3150.644), np.float32(3501.2302), np.float32(804.66064), np.float32(3324.4883), np.float32(1610.419)]
2025-09-14 12:20:14,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:20:14,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 15 minutes, 48 seconds)
2025-09-14 12:22:43,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:22:51,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2323.12231 ± 753.245
2025-09-14 12:22:51,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2358.4963), np.float32(1878.7659), np.float32(1253.2198), np.float32(2944.552), np.float32(1758.594), np.float32(2356.0195), np.float32(3243.1162), np.float32(2706.6018), np.float32(1219.7771), np.float32(3512.081)]
2025-09-14 12:22:51,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:22:51,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 13 minutes, 10 seconds)
2025-09-14 12:25:21,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:25:28,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2850.14209 ± 779.055
2025-09-14 12:25:28,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1634.1954), np.float32(3197.279), np.float32(1847.9585), np.float32(2269.8984), np.float32(3564.7993), np.float32(2072.4697), np.float32(3068.4204), np.float32(3958.4307), np.float32(3316.6743), np.float32(3571.296)]
2025-09-14 12:25:28,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:25:28,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2850.14) for latency 12
2025-09-14 12:25:28,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 10 minutes, 36 seconds)
2025-09-14 12:27:58,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:28:05,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2528.38770 ± 879.108
2025-09-14 12:28:05,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1424.2617), np.float32(2818.2822), np.float32(3416.7585), np.float32(1808.1477), np.float32(1244.097), np.float32(1909.8893), np.float32(2154.842), np.float32(3361.604), np.float32(3387.1433), np.float32(3758.8516)]
2025-09-14 12:28:05,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:28:05,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 8 minutes, 1 second)
2025-09-14 12:30:35,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:30:42,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2454.31592 ± 810.097
2025-09-14 12:30:42,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1867.4965), np.float32(3739.9019), np.float32(2033.4491), np.float32(1829.3344), np.float32(1720.4641), np.float32(3795.5464), np.float32(3065.9204), np.float32(1349.3523), np.float32(2432.7202), np.float32(2708.973)]
2025-09-14 12:30:42,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:30:42,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 5 minutes, 26 seconds)
2025-09-14 12:33:12,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:33:20,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2233.94678 ± 774.088
2025-09-14 12:33:20,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1741.8962), np.float32(2784.6973), np.float32(3311.2666), np.float32(2776.6528), np.float32(423.87405), np.float32(2163.6177), np.float32(2125.7446), np.float32(2511.6624), np.float32(1692.8597), np.float32(2807.195)]
2025-09-14 12:33:20,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:33:20,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 2 minutes, 51 seconds)
2025-09-14 12:35:50,204 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:35:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2702.25781 ± 733.107
2025-09-14 12:35:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3569.152), np.float32(1963.9647), np.float32(2053.8945), np.float32(3099.9482), np.float32(1455.9393), np.float32(2151.6284), np.float32(2698.842), np.float32(2819.8496), np.float32(3616.1912), np.float32(3593.169)]
2025-09-14 12:35:57,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:35:57,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 17 seconds)
2025-09-14 12:38:27,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:38:34,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2634.52393 ± 798.712
2025-09-14 12:38:34,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3086.3037), np.float32(2378.534), np.float32(1468.5052), np.float32(3746.9346), np.float32(3300.5774), np.float32(3596.1162), np.float32(2943.2463), np.float32(1786.1514), np.float32(1484.3142), np.float32(2554.5583)]
2025-09-14 12:38:34,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:38:34,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 57 minutes, 39 seconds)
2025-09-14 12:41:04,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:41:11,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2372.83350 ± 748.459
2025-09-14 12:41:11,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1764.0511), np.float32(2492.735), np.float32(3408.6677), np.float32(1695.5208), np.float32(4057.2305), np.float32(2120.0173), np.float32(2083.6086), np.float32(2252.561), np.float32(2306.7139), np.float32(1547.2306)]
2025-09-14 12:41:11,803 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:41:11,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 55 minutes, 2 seconds)
2025-09-14 12:43:41,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:43:48,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2366.15381 ± 770.522
2025-09-14 12:43:48,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3206.9934), np.float32(1189.0322), np.float32(1961.3756), np.float32(3229.8064), np.float32(3084.2625), np.float32(1935.3455), np.float32(1606.43), np.float32(1650.8616), np.float32(2361.44), np.float32(3435.9927)]
2025-09-14 12:43:48,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:43:48,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 52 minutes, 23 seconds)
2025-09-14 12:46:18,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:46:25,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2121.31836 ± 501.333
2025-09-14 12:46:25,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1890.9027), np.float32(1587.0035), np.float32(1908.4711), np.float32(2746.669), np.float32(1742.6813), np.float32(1180.0055), np.float32(2525.4675), np.float32(2568.257), np.float32(2598.1084), np.float32(2465.6177)]
2025-09-14 12:46:25,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:46:25,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 49 minutes, 45 seconds)
2025-09-14 12:48:55,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:49:02,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2958.03638 ± 766.955
2025-09-14 12:49:02,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3413.6948), np.float32(3619.9812), np.float32(3210.1401), np.float32(3845.5098), np.float32(3740.3066), np.float32(2478.9243), np.float32(2533.4177), np.float32(1692.5006), np.float32(3354.5813), np.float32(1691.3038)]
2025-09-14 12:49:02,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:49:02,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (2958.04) for latency 12
2025-09-14 12:49:02,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 47 minutes, 6 seconds)
2025-09-14 12:51:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:51:39,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1881.77734 ± 939.724
2025-09-14 12:51:39,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2050.0657), np.float32(1294.8138), np.float32(1932.6694), np.float32(1974.2961), np.float32(1284.3424), np.float32(-182.71725), np.float32(1874.8408), np.float32(3101.6914), np.float32(2090.797), np.float32(3396.9746)]
2025-09-14 12:51:39,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:51:39,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 44 minutes, 28 seconds)
2025-09-14 12:54:09,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:54:16,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3045.40234 ± 662.281
2025-09-14 12:54:16,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2842.0266), np.float32(3402.6453), np.float32(2384.8726), np.float32(3399.5806), np.float32(2542.563), np.float32(1571.0494), np.float32(3433.6897), np.float32(3677.0254), np.float32(3422.2825), np.float32(3778.2876)]
2025-09-14 12:54:16,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:54:16,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3045.40) for latency 12
2025-09-14 12:54:16,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 41 minutes, 52 seconds)
2025-09-14 12:56:46,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:56:54,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2448.09644 ± 797.663
2025-09-14 12:56:54,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2325.4985), np.float32(3722.5435), np.float32(3441.4385), np.float32(1346.1042), np.float32(3463.384), np.float32(2298.9124), np.float32(1478.4353), np.float32(2506.8962), np.float32(2065.282), np.float32(1832.47)]
2025-09-14 12:56:54,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:56:54,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 39 minutes, 16 seconds)
2025-09-14 12:59:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 12:59:30,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2164.51416 ± 786.469
2025-09-14 12:59:30,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1662.2983), np.float32(1627.2063), np.float32(1085.2173), np.float32(2568.9236), np.float32(3658.3755), np.float32(3190.2043), np.float32(1703.3344), np.float32(2210.5518), np.float32(2578.391), np.float32(1360.636)]
2025-09-14 12:59:30,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:59:30,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 36 minutes, 38 seconds)
2025-09-14 13:02:00,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:02:07,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2609.70166 ± 787.757
2025-09-14 13:02:07,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2091.0098), np.float32(2865.852), np.float32(2016.8062), np.float32(1293.0841), np.float32(1933.4523), np.float32(2089.2097), np.float32(3491.8389), np.float32(3737.7695), np.float32(3083.8965), np.float32(3494.0967)]
2025-09-14 13:02:07,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:02:07,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 34 minutes, 1 second)
2025-09-14 13:04:37,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:04:45,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2367.90210 ± 865.688
2025-09-14 13:04:45,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1205.8936), np.float32(1988.6414), np.float32(2488.724), np.float32(3533.117), np.float32(2558.8938), np.float32(3833.579), np.float32(1329.3779), np.float32(3185.1675), np.float32(1787.4735), np.float32(1768.1539)]
2025-09-14 13:04:45,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:04:45,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 31 minutes, 24 seconds)
2025-09-14 13:07:14,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:07:22,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2333.48853 ± 734.665
2025-09-14 13:07:22,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3559.405), np.float32(2313.4756), np.float32(2394.22), np.float32(1485.3253), np.float32(1284.4537), np.float32(1709.9595), np.float32(3491.9006), np.float32(2067.1716), np.float32(2195.6145), np.float32(2833.359)]
2025-09-14 13:07:22,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:07:22,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 28 minutes, 47 seconds)
2025-09-14 13:09:51,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:09:59,101 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 3069.76831 ± 828.396
2025-09-14 13:09:59,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2095.6135), np.float32(3386.3418), np.float32(1305.6053), np.float32(3798.0286), np.float32(3657.4956), np.float32(3649.1372), np.float32(3801.9714), np.float32(2875.608), np.float32(2387.3335), np.float32(3740.5483)]
2025-09-14 13:09:59,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:09:59,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1226 [INFO]: New best (3069.77) for latency 12
2025-09-14 13:09:59,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 26 minutes, 9 seconds)
2025-09-14 13:12:28,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:12:36,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2056.39233 ± 649.995
2025-09-14 13:12:36,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3134.2095), np.float32(2832.7312), np.float32(1967.5994), np.float32(1330.5969), np.float32(1975.8551), np.float32(1628.1395), np.float32(1623.9064), np.float32(1575.2649), np.float32(3036.5398), np.float32(1459.0825)]
2025-09-14 13:12:36,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:12:36,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 23 minutes, 33 seconds)
2025-09-14 13:15:06,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:15:13,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2463.18140 ± 725.728
2025-09-14 13:15:13,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2074.956), np.float32(3308.049), np.float32(1525.1932), np.float32(2267.9946), np.float32(3123.008), np.float32(2510.4797), np.float32(1610.5088), np.float32(3028.4604), np.float32(1599.8158), np.float32(3583.3481)]
2025-09-14 13:15:13,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:15:13,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 20 minutes, 57 seconds)
2025-09-14 13:17:43,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:17:50,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2153.78857 ± 737.882
2025-09-14 13:17:50,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1863.7325), np.float32(2978.1921), np.float32(3397.7441), np.float32(1382.9523), np.float32(2705.9417), np.float32(2067.3762), np.float32(1569.4089), np.float32(2890.203), np.float32(1447.1412), np.float32(1235.1929)]
2025-09-14 13:17:50,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:17:50,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 18 minutes, 20 seconds)
2025-09-14 13:20:20,547 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:20:27,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2893.28564 ± 565.388
2025-09-14 13:20:27,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2616.9285), np.float32(1909.3899), np.float32(3549.7449), np.float32(3094.463), np.float32(3577.6692), np.float32(2757.853), np.float32(2719.1917), np.float32(2033.9222), np.float32(3210.8035), np.float32(3462.8894)]
2025-09-14 13:20:27,856 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:20:27,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 15 minutes, 42 seconds)
2025-09-14 13:22:57,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:23:04,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2523.02002 ± 821.484
2025-09-14 13:23:04,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1866.8153), np.float32(1603.0778), np.float32(3376.2878), np.float32(2539.4983), np.float32(1967.2241), np.float32(1599.1812), np.float32(1744.7646), np.float32(3601.137), np.float32(3596.2983), np.float32(3335.9146)]
2025-09-14 13:23:04,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:23:04,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 13 minutes, 5 seconds)
2025-09-14 13:25:34,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:25:41,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2680.21997 ± 779.463
2025-09-14 13:25:41,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3357.754), np.float32(3468.759), np.float32(3446.7622), np.float32(2263.9163), np.float32(2492.5671), np.float32(3281.8552), np.float32(1700.8002), np.float32(3503.3015), np.float32(1486.1877), np.float32(1800.2971)]
2025-09-14 13:25:41,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:25:41,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 10 minutes, 28 seconds)
2025-09-14 13:28:11,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:28:19,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2300.55542 ± 865.435
2025-09-14 13:28:19,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3098.957), np.float32(1300.8337), np.float32(2035.0374), np.float32(3444.4915), np.float32(1383.2421), np.float32(1314.6835), np.float32(2339.9194), np.float32(2703.5803), np.float32(1646.1455), np.float32(3738.663)]
2025-09-14 13:28:19,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:28:19,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 7 minutes, 51 seconds)
2025-09-14 13:30:49,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:30:56,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2761.56812 ± 902.509
2025-09-14 13:30:56,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1730.9156), np.float32(2583.386), np.float32(3801.228), np.float32(2382.3254), np.float32(4002.058), np.float32(3471.239), np.float32(1663.6099), np.float32(3942.9795), np.float32(1819.0809), np.float32(2218.859)]
2025-09-14 13:30:56,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:30:56,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 5 minutes, 14 seconds)
2025-09-14 13:33:26,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:33:33,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 1878.11450 ± 734.999
2025-09-14 13:33:33,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1323.4746), np.float32(1502.1971), np.float32(3489.2935), np.float32(1365.0288), np.float32(1477.5522), np.float32(2337.37), np.float32(1216.0283), np.float32(1432.622), np.float32(1728.2136), np.float32(2909.3647)]
2025-09-14 13:33:33,510 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:33:33,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 37 seconds)
2025-09-14 13:35:52,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1214 [DEBUG]: Evaluating for latency 12...
2025-09-14 13:35:58,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1221 [DEBUG]: Total Reward: 2497.63062 ± 1009.473
2025-09-14 13:35:58,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2625.8723), np.float32(3970.166), np.float32(1437.8271), np.float32(2943.3862), np.float32(2408.3538), np.float32(855.1438), np.float32(1275.8041), np.float32(2387.9863), np.float32(3934.0718), np.float32(3137.6956)]
2025-09-14 13:35:58,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:35:58,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille75-halfcheetah):1251 [DEBUG]: Training session finished
