2025-09-14 08:43:01,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_6
2025-09-14 08:43:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.150-delay_6
2025-09-14 08:43:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'6': <latency_env.delayed_mdp.ConstantDelay object at 0x7f2b7f447dd0>}
2025-09-14 08:43:01,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1111 [DEBUG]: using device: cpu
2025-09-14 08:43:01,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-14 08:43:01,649 baseline-bpql-noisepromille150-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=53, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-14 08:43:01,649 baseline-bpql-noisepromille150-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-14 08:43:03,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-14 08:43:03,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-14 08:46:17,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:46:24,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -289.04321 ± 39.147
2025-09-14 08:46:24,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-272.85757), np.float32(-290.7578), np.float32(-282.67743), np.float32(-212.68289), np.float32(-257.67535), np.float32(-301.7613), np.float32(-276.96042), np.float32(-368.3528), np.float32(-298.78107), np.float32(-327.92548)]
2025-09-14 08:46:24,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:46:24,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-289.04) for latency 6
2025-09-14 08:46:24,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 5 hours, 32 minutes, 12 seconds)
2025-09-14 08:49:40,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:49:47,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -193.44861 ± 56.439
2025-09-14 08:49:47,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-213.9201), np.float32(-305.88113), np.float32(-146.80179), np.float32(-167.45483), np.float32(-151.72064), np.float32(-117.13269), np.float32(-192.13261), np.float32(-196.88086), np.float32(-279.64905), np.float32(-162.9124)]
2025-09-14 08:49:47,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:49:47,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-193.45) for latency 6
2025-09-14 08:49:47,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 5 hours, 29 minutes, 59 seconds)
2025-09-14 08:53:05,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:53:11,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: -106.70596 ± 103.811
2025-09-14 08:53:11,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(-188.31198), np.float32(-56.843243), np.float32(-254.90495), np.float32(19.47485), np.float32(-104.31417), np.float32(-40.286755), np.float32(-149.6496), np.float32(-286.33072), np.float32(-21.354162), np.float32(15.461007)]
2025-09-14 08:53:11,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:53:11,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (-106.71) for latency 6
2025-09-14 08:53:11,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 5 hours, 27 minutes, 54 seconds)
2025-09-14 08:56:30,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 08:56:38,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 131.25021 ± 72.841
2025-09-14 08:56:38,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(85.63121), np.float32(116.17762), np.float32(175.08342), np.float32(179.87802), np.float32(119.17208), np.float32(161.62213), np.float32(254.58572), np.float32(-40.487247), np.float32(111.85898), np.float32(148.98006)]
2025-09-14 08:56:38,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 08:56:38,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (131.25) for latency 6
2025-09-14 08:56:38,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 5 hours, 25 minutes, 53 seconds)
2025-09-14 08:59:54,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:00:02,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 361.98474 ± 85.858
2025-09-14 09:00:02,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(393.6941), np.float32(311.89172), np.float32(166.53975), np.float32(447.9108), np.float32(495.80557), np.float32(363.14124), np.float32(319.84262), np.float32(419.87787), np.float32(324.85663), np.float32(376.28708)]
2025-09-14 09:00:02,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:00:02,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (361.98) for latency 6
2025-09-14 09:00:02,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 5 hours, 22 minutes, 34 seconds)
2025-09-14 09:03:20,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:03:27,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 790.77032 ± 101.837
2025-09-14 09:03:27,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(717.6717), np.float32(823.9232), np.float32(962.90784), np.float32(745.9315), np.float32(892.6452), np.float32(888.0611), np.float32(647.72327), np.float32(825.91736), np.float32(765.2114), np.float32(637.7102)]
2025-09-14 09:03:27,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:03:27,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (790.77) for latency 6
2025-09-14 09:03:27,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 5 hours, 20 minutes, 35 seconds)
2025-09-14 09:06:42,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:06:49,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1280.06274 ± 220.666
2025-09-14 09:06:49,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1208.9092), np.float32(1653.0382), np.float32(1180.5475), np.float32(1353.0594), np.float32(1114.9832), np.float32(1026.2585), np.float32(1450.2076), np.float32(1323.6245), np.float32(926.75726), np.float32(1563.2433)]
2025-09-14 09:06:49,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:06:49,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1280.06) for latency 6
2025-09-14 09:06:49,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 5 hours, 16 minutes, 48 seconds)
2025-09-14 09:10:04,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:10:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1311.86194 ± 259.942
2025-09-14 09:10:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1079.6661), np.float32(1735.5514), np.float32(1274.1085), np.float32(1215.2615), np.float32(1332.273), np.float32(1830.0905), np.float32(1247.7484), np.float32(1040.7169), np.float32(1341.2457), np.float32(1021.95654)]
2025-09-14 09:10:12,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:10:12,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1311.86) for latency 6
2025-09-14 09:10:12,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 5 hours, 13 minutes, 7 seconds)
2025-09-14 09:13:23,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:13:30,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1130.29224 ± 247.929
2025-09-14 09:13:30,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(761.2471), np.float32(898.30585), np.float32(1162.9044), np.float32(881.61774), np.float32(1200.2316), np.float32(1357.5934), np.float32(1148.8145), np.float32(997.43353), np.float32(1654.9025), np.float32(1239.8724)]
2025-09-14 09:13:30,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:13:30,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 7 minutes, 9 seconds)
2025-09-14 09:16:42,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:16:50,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1361.26184 ± 494.040
2025-09-14 09:16:50,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1371.0854), np.float32(1551.7189), np.float32(2209.5115), np.float32(1534.3905), np.float32(1163.6079), np.float32(1095.8206), np.float32(1058.4364), np.float32(994.5976), np.float32(2127.6643), np.float32(505.78568)]
2025-09-14 09:16:50,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:16:50,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (1361.26) for latency 6
2025-09-14 09:16:50,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 2 minutes, 28 seconds)
2025-09-14 09:20:03,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:20:10,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2258.76025 ± 434.973
2025-09-14 09:20:10,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1102.4246), np.float32(2652.7114), np.float32(2613.4553), np.float32(2388.735), np.float32(2072.1035), np.float32(2712.247), np.float32(2229.936), np.float32(2265.0784), np.float32(2188.614), np.float32(2362.2986)]
2025-09-14 09:20:10,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:20:10,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2258.76) for latency 6
2025-09-14 09:20:10,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 57 minutes, 32 seconds)
2025-09-14 09:23:24,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:23:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1787.16541 ± 704.393
2025-09-14 09:23:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2522.954), np.float32(2599.7068), np.float32(2025.0714), np.float32(506.53915), np.float32(2658.1455), np.float32(1241.6833), np.float32(1302.6559), np.float32(2187.758), np.float32(1005.55634), np.float32(1821.5834)]
2025-09-14 09:23:32,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:23:32,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 54 minutes, 11 seconds)
2025-09-14 09:26:43,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:26:51,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1952.98804 ± 553.341
2025-09-14 09:26:51,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2823.7493), np.float32(1737.0526), np.float32(2423.7935), np.float32(1944.2422), np.float32(2721.6897), np.float32(1425.2057), np.float32(2232.0586), np.float32(1459.4014), np.float32(1072.513), np.float32(1690.1736)]
2025-09-14 09:26:51,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:26:51,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 49 minutes, 34 seconds)
2025-09-14 09:30:03,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:30:11,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2188.28394 ± 649.446
2025-09-14 09:30:11,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1540.944), np.float32(1877.6401), np.float32(2491.3516), np.float32(1012.484), np.float32(2822.211), np.float32(2777.761), np.float32(1385.3108), np.float32(2456.0732), np.float32(2525.621), np.float32(2993.444)]
2025-09-14 09:30:11,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:30:11,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 46 minutes, 44 seconds)
2025-09-14 09:33:24,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:33:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2334.93872 ± 656.498
2025-09-14 09:33:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2593.553), np.float32(2949.9402), np.float32(1110.5289), np.float32(1705.0365), np.float32(2368.6912), np.float32(1334.2141), np.float32(2715.5337), np.float32(2837.0168), np.float32(2900.6182), np.float32(2834.2537)]
2025-09-14 09:33:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:33:32,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2334.94) for latency 6
2025-09-14 09:33:32,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 43 minutes, 55 seconds)
2025-09-14 09:36:45,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:36:52,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2406.69897 ± 584.699
2025-09-14 09:36:52,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2849.9316), np.float32(2434.0005), np.float32(1482.2068), np.float32(2898.0134), np.float32(1897.5659), np.float32(3134.167), np.float32(3050.9531), np.float32(2674.0986), np.float32(1532.2369), np.float32(2113.8164)]
2025-09-14 09:36:52,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:36:52,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2406.70) for latency 6
2025-09-14 09:36:52,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 40 minutes, 28 seconds)
2025-09-14 09:40:03,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:40:12,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 1980.51990 ± 464.861
2025-09-14 09:40:12,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1145.2965), np.float32(2659.2222), np.float32(1736.2639), np.float32(1544.8662), np.float32(1947.7238), np.float32(2383.1833), np.float32(2137.0464), np.float32(2551.7769), np.float32(2170.1462), np.float32(1529.6748)]
2025-09-14 09:40:12,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:40:12,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 36 minutes, 35 seconds)
2025-09-14 09:43:24,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:43:32,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2509.64404 ± 658.796
2025-09-14 09:43:32,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3204.9348), np.float32(2819.2844), np.float32(1761.5427), np.float32(3123.134), np.float32(2537.7905), np.float32(3038.9048), np.float32(961.9557), np.float32(2322.3772), np.float32(2845.2961), np.float32(2481.2185)]
2025-09-14 09:43:32,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:43:32,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2509.64) for latency 6
2025-09-14 09:43:32,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 33 minutes, 40 seconds)
2025-09-14 09:46:45,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:46:53,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2140.77954 ± 838.641
2025-09-14 09:46:53,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3448.704), np.float32(1343.0443), np.float32(2104.7083), np.float32(2849.5554), np.float32(1352.227), np.float32(1148.6993), np.float32(2628.412), np.float32(3248.0579), np.float32(2196.9019), np.float32(1087.4851)]
2025-09-14 09:46:53,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:46:53,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 30 minutes, 35 seconds)
2025-09-14 09:50:05,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:50:13,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2674.56494 ± 510.952
2025-09-14 09:50:13,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2847.764), np.float32(2931.8816), np.float32(2612.1711), np.float32(2177.7249), np.float32(3182.2336), np.float32(2945.5635), np.float32(1525.0248), np.float32(3320.5466), np.float32(2903.9863), np.float32(2298.7537)]
2025-09-14 09:50:13,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:50:13,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2674.56) for latency 6
2025-09-14 09:50:13,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 26 minutes, 53 seconds)
2025-09-14 09:53:28,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:53:36,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2716.82666 ± 667.405
2025-09-14 09:53:36,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1232.614), np.float32(3050.797), np.float32(2954.2305), np.float32(3193.477), np.float32(3023.1194), np.float32(3275.8677), np.float32(3109.8464), np.float32(2844.6904), np.float32(1598.4276), np.float32(2885.1958)]
2025-09-14 09:53:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:53:36,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2716.83) for latency 6
2025-09-14 09:53:36,747 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 24 minutes, 25 seconds)
2025-09-14 09:56:56,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 09:57:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2619.82275 ± 748.824
2025-09-14 09:57:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3141.2327), np.float32(2843.2073), np.float32(2171.715), np.float32(1257.1847), np.float32(1434.9623), np.float32(3299.4968), np.float32(3159.4746), np.float32(2287.1777), np.float32(3424.0823), np.float32(3179.694)]
2025-09-14 09:57:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 09:57:04,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 4 hours, 23 minutes, 19 seconds)
2025-09-14 10:00:21,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:00:29,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2544.19653 ± 828.986
2025-09-14 10:00:29,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3496.1106), np.float32(3253.7148), np.float32(1443.7157), np.float32(2636.285), np.float32(3589.9448), np.float32(3230.6548), np.float32(1824.2292), np.float32(1575.7905), np.float32(1496.0165), np.float32(2895.5034)]
2025-09-14 10:00:29,684 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:00:29,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 4 hours, 21 minutes)
2025-09-14 10:03:46,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:03:55,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2720.76904 ± 815.367
2025-09-14 10:03:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2781.0806), np.float32(3430.0518), np.float32(3536.862), np.float32(2305.5386), np.float32(1344.3829), np.float32(3196.4014), np.float32(3167.7551), np.float32(3499.7878), np.float32(1180.0735), np.float32(2765.7563)]
2025-09-14 10:03:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:03:55,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (2720.77) for latency 6
2025-09-14 10:03:55,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 18 minutes, 57 seconds)
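The "New best (2720.77) for latency 6" line above fires only when an evaluation's mean reward exceeds the best seen so far for that latency (note the 2619.82 and 2544.20 evaluations before it produced no such line). A minimal sketch of that bookkeeping, with hypothetical names not taken from the `latency_env` code:

```python
# Best mean reward seen so far, keyed by evaluation latency.
best_per_latency: dict[int, float] = {}

def update_best(latency: int, mean_reward: float) -> bool:
    """Record mean_reward if it beats the best for this latency.

    Returns True when a new best was set, i.e. when the caller
    would log "New best" and checkpoint the policy.
    """
    prev = best_per_latency.get(latency, float("-inf"))
    if mean_reward > prev:
        best_per_latency[latency] = mean_reward
        return True
    return False
```

In the log, the best for latency 6 ratchets up through 2720.77, 3257.30, 3355.19, 3479.40, 3535.28, 3550.12, 3553.92, and 3650.83, while lower evaluations in between leave it unchanged.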
2025-09-14 10:07:15,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:07:23,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3257.29688 ± 127.485
2025-09-14 10:07:23,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3105.6047), np.float32(3247.9802), np.float32(3289.9863), np.float32(3141.565), np.float32(3297.1416), np.float32(3035.8357), np.float32(3446.203), np.float32(3291.532), np.float32(3450.6658), np.float32(3266.4526)]
2025-09-14 10:07:23,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:07:23,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3257.30) for latency 6
2025-09-14 10:07:23,188 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 17 minutes, 28 seconds)
2025-09-14 10:10:42,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:10:50,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2945.90674 ± 892.187
2025-09-14 10:10:50,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3710.6116), np.float32(3478.0017), np.float32(2225.4187), np.float32(3672.3286), np.float32(3566.7146), np.float32(1139.4849), np.float32(3180.3582), np.float32(3509.7021), np.float32(1592.4149), np.float32(3384.0325)]
2025-09-14 10:10:50,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:10:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 15 minutes)
2025-09-14 10:14:08,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:14:17,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2863.19604 ± 836.764
2025-09-14 10:14:17,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3340.6885), np.float32(3212.9617), np.float32(3524.9094), np.float32(1424.271), np.float32(3458.6575), np.float32(3316.0276), np.float32(3689.6558), np.float32(3217.8062), np.float32(2020.7804), np.float32(1426.2029)]
2025-09-14 10:14:17,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:14:17,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 11 minutes, 18 seconds)
2025-09-14 10:17:36,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:17:44,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2632.28369 ± 835.509
2025-09-14 10:17:44,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1415.0417), np.float32(1765.935), np.float32(3645.4773), np.float32(1559.646), np.float32(2320.2212), np.float32(2222.2498), np.float32(3593.8928), np.float32(3391.8508), np.float32(3478.072), np.float32(2930.4492)]
2025-09-14 10:17:44,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:17:44,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 8 minutes, 16 seconds)
2025-09-14 10:21:01,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:21:09,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3277.87744 ± 524.343
2025-09-14 10:21:09,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3028.0996), np.float32(3576.4067), np.float32(3547.4727), np.float32(3283.4617), np.float32(3496.5156), np.float32(1788.589), np.float32(3495.3625), np.float32(3528.6877), np.float32(3385.36), np.float32(3648.819)]
2025-09-14 10:21:09,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:21:09,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3277.88) for latency 6
2025-09-14 10:21:09,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 4 minutes, 43 seconds)
2025-09-14 10:24:29,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:24:38,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3355.18872 ± 343.122
2025-09-14 10:24:38,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3189.8306), np.float32(3434.3662), np.float32(3247.5686), np.float32(3896.4473), np.float32(3093.2114), np.float32(3657.189), np.float32(2582.5461), np.float32(3600.9192), np.float32(3346.6216), np.float32(3503.188)]
2025-09-14 10:24:38,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:24:38,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3355.19) for latency 6
2025-09-14 10:24:38,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 1 minute, 28 seconds)
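The ETA in each iteration header presumably extrapolates the average time per completed iteration over the remaining iterations, and its formatting drops zero-valued components ("4 hours, 21 minutes") and uses singular units ("1 minute, 28 seconds" above). A sketch consistent with those logged strings, assuming a simple linear extrapolation since the actual estimator is not shown:

```python
def estimate_remaining_s(elapsed_s: float, done: int, total: int) -> float:
    """Linear ETA: average seconds per finished iteration times iterations left."""
    return elapsed_s / done * (total - done)

def fmt_remaining(seconds: float) -> str:
    """Format like the log: omit zero components, pluralize correctly."""
    seconds = int(round(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    for value, unit in ((hours, "hour"), (minutes, "minute"), (secs, "second")):
        if value:
            parts.append(f"{value} {unit}" + ("s" if value != 1 else ""))
    return ", ".join(parts)
```

At roughly 3 minutes 27 seconds per iteration (the spacing of the headers above), 69 remaining iterations gives about 4 hours, consistent with the logged estimates.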
2025-09-14 10:27:58,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:28:06,043 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 2883.09424 ± 835.815
2025-09-14 10:28:06,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3236.0398), np.float32(1356.0375), np.float32(2525.513), np.float32(3426.5938), np.float32(1258.5947), np.float32(3535.502), np.float32(3513.9036), np.float32(3414.58), np.float32(3428.2375), np.float32(3135.941)]
2025-09-14 10:28:06,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:28:06,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 58 minutes, 9 seconds)
2025-09-14 10:31:23,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:31:31,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3341.54370 ± 607.297
2025-09-14 10:31:31,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3113.2534), np.float32(3662.655), np.float32(3793.3018), np.float32(3779.385), np.float32(3444.0435), np.float32(1612.6241), np.float32(3618.7717), np.float32(3361.7952), np.float32(3507.6042), np.float32(3522.0056)]
2025-09-14 10:31:31,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:31:31,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 54 minutes, 26 seconds)
2025-09-14 10:34:49,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:34:58,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3029.15552 ± 826.126
2025-09-14 10:34:58,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3390.6829), np.float32(3572.2937), np.float32(3690.3691), np.float32(3611.7864), np.float32(2908.0464), np.float32(3659.0874), np.float32(1104.4646), np.float32(3243.969), np.float32(3266.6147), np.float32(1844.2411)]
2025-09-14 10:34:58,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:34:58,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 50 minutes, 56 seconds)
2025-09-14 10:38:17,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:38:25,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3479.39648 ± 93.249
2025-09-14 10:38:25,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3694.6694), np.float32(3518.1755), np.float32(3563.1453), np.float32(3475.5857), np.float32(3339.683), np.float32(3417.3008), np.float32(3455.774), np.float32(3471.5454), np.float32(3467.853), np.float32(3390.232)]
2025-09-14 10:38:25,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:38:25,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3479.40) for latency 6
2025-09-14 10:38:25,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 47 minutes, 55 seconds)
2025-09-14 10:41:45,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:41:54,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3306.57227 ± 457.653
2025-09-14 10:41:54,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3618.495), np.float32(3605.842), np.float32(2627.9924), np.float32(3317.775), np.float32(3635.1704), np.float32(3389.6926), np.float32(3186.182), np.float32(3817.941), np.float32(3556.81), np.float32(2309.8206)]
2025-09-14 10:41:54,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:41:54,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 44 minutes, 29 seconds)
2025-09-14 10:45:14,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:45:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3479.30786 ± 177.123
2025-09-14 10:45:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3227.0266), np.float32(3582.6292), np.float32(3797.8599), np.float32(3660.053), np.float32(3581.9197), np.float32(3260.556), np.float32(3276.814), np.float32(3408.1074), np.float32(3502.5813), np.float32(3495.5325)]
2025-09-14 10:45:22,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:45:22,053 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 41 minutes)
2025-09-14 10:48:38,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:48:45,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3107.42236 ± 905.787
2025-09-14 10:48:45,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1285.5521), np.float32(3667.64), np.float32(3666.514), np.float32(3805.9727), np.float32(1461.8169), np.float32(3523.743), np.float32(3767.4465), np.float32(3765.4788), np.float32(2972.0708), np.float32(3157.9924)]
2025-09-14 10:48:45,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:48:45,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 37 minutes, 8 seconds)
2025-09-14 10:52:05,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:52:14,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3535.28125 ± 208.939
2025-09-14 10:52:14,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3353.826), np.float32(3723.9126), np.float32(3350.021), np.float32(3854.1672), np.float32(3605.0215), np.float32(3486.2554), np.float32(3566.6584), np.float32(3532.9976), np.float32(3760.6128), np.float32(3119.341)]
2025-09-14 10:52:14,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:52:14,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3535.28) for latency 6
2025-09-14 10:52:14,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 34 minutes, 5 seconds)
2025-09-14 10:55:33,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:55:41,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3550.12427 ± 137.992
2025-09-14 10:55:41,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3225.0842), np.float32(3638.5017), np.float32(3595.8206), np.float32(3699.7253), np.float32(3701.2063), np.float32(3625.2751), np.float32(3541.3125), np.float32(3418.5461), np.float32(3474.8074), np.float32(3580.9624)]
2025-09-14 10:55:41,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:55:41,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3550.12) for latency 6
2025-09-14 10:55:41,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 30 minutes, 37 seconds)
2025-09-14 10:59:01,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 10:59:10,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3553.92041 ± 249.359
2025-09-14 10:59:10,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3777.2014), np.float32(3110.2693), np.float32(3663.4639), np.float32(3342.1064), np.float32(3671.9722), np.float32(3323.824), np.float32(3336.39), np.float32(3896.8938), np.float32(3569.6035), np.float32(3847.4785)]
2025-09-14 10:59:10,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 10:59:10,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3553.92) for latency 6
2025-09-14 10:59:10,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 27 minutes, 10 seconds)
2025-09-14 11:02:29,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:02:37,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3337.79150 ± 493.097
2025-09-14 11:02:37,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3829.0664), np.float32(2940.5217), np.float32(3524.892), np.float32(3504.3044), np.float32(2075.1333), np.float32(3575.1426), np.float32(3533.955), np.float32(3073.5996), np.float32(3693.409), np.float32(3627.8936)]
2025-09-14 11:02:37,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:02:37,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 23 minutes, 41 seconds)
2025-09-14 11:05:47,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:05:54,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3543.00000 ± 176.817
2025-09-14 11:05:54,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3731.0234), np.float32(3886.3035), np.float32(3461.268), np.float32(3699.1926), np.float32(3461.0269), np.float32(3318.0186), np.float32(3408.3123), np.float32(3565.1084), np.float32(3582.2463), np.float32(3317.4985)]
2025-09-14 11:05:54,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:05:54,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 18 minutes, 54 seconds)
2025-09-14 11:09:01,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:09:09,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3410.92969 ± 206.144
2025-09-14 11:09:09,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3423.5078), np.float32(3725.9617), np.float32(3243.5664), np.float32(3521.3801), np.float32(3407.2559), np.float32(3590.5742), np.float32(2938.791), np.float32(3282.1016), np.float32(3459.4531), np.float32(3516.7031)]
2025-09-14 11:09:09,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:09:09,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 12 minutes, 52 seconds)
2025-09-14 11:12:18,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:12:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3439.22412 ± 662.467
2025-09-14 11:12:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3591.6538), np.float32(3872.1921), np.float32(3868.5854), np.float32(3743.8564), np.float32(3704.2224), np.float32(3591.5635), np.float32(1519.9337), np.float32(3692.2385), np.float32(3234.928), np.float32(3573.0667)]
2025-09-14 11:12:25,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:12:25,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 7 minutes, 27 seconds)
2025-09-14 11:15:34,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:15:42,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3537.17773 ± 224.629
2025-09-14 11:15:42,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3664.784), np.float32(3521.764), np.float32(3319.0603), np.float32(3048.1323), np.float32(3641.7), np.float32(3621.149), np.float32(3667.6333), np.float32(3529.5789), np.float32(3930.0337), np.float32(3427.9443)]
2025-09-14 11:15:42,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:15:42,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 1 minute, 54 seconds)
2025-09-14 11:18:50,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:18:58,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3536.39502 ± 206.744
2025-09-14 11:18:58,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3561.316), np.float32(3511.2056), np.float32(3460.5852), np.float32(3376.5793), np.float32(3310.7593), np.float32(3525.9216), np.float32(3451.393), np.float32(3843.478), np.float32(3986.2483), np.float32(3336.4631)]
2025-09-14 11:18:58,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:18:58,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 56 minutes, 30 seconds)
2025-09-14 11:22:06,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:22:14,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3650.82544 ± 143.923
2025-09-14 11:22:14,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3667.382), np.float32(3512.235), np.float32(3811.487), np.float32(3878.8372), np.float32(3753.4019), np.float32(3788.5627), np.float32(3458.2754), np.float32(3488.7607), np.float32(3626.6985), np.float32(3522.615)]
2025-09-14 11:22:14,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:22:14,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3650.83) for latency 6
2025-09-14 11:22:14,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 53 minutes, 3 seconds)
2025-09-14 11:25:21,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:25:29,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3557.96289 ± 213.195
2025-09-14 11:25:29,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3834.2158), np.float32(3043.2173), np.float32(3549.2615), np.float32(3637.578), np.float32(3624.76), np.float32(3344.8762), np.float32(3690.7825), np.float32(3515.7903), np.float32(3749.031), np.float32(3590.1184)]
2025-09-14 11:25:29,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:25:29,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 49 minutes, 51 seconds)
2025-09-14 11:28:38,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:28:45,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3474.42896 ± 558.317
2025-09-14 11:28:45,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3521.806), np.float32(3608.2031), np.float32(3697.7097), np.float32(1925.6862), np.float32(3274.8757), np.float32(3621.4253), np.float32(3949.5095), np.float32(3796.6077), np.float32(3376.0676), np.float32(3972.3975)]
2025-09-14 11:28:45,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:28:45,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 46 minutes, 39 seconds)
2025-09-14 11:31:53,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:32:00,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3533.71826 ± 324.023
2025-09-14 11:32:00,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3633.069), np.float32(3642.0107), np.float32(3566.2366), np.float32(3515.7124), np.float32(3777.9668), np.float32(2637.1172), np.float32(3937.4668), np.float32(3539.4465), np.float32(3536.3467), np.float32(3551.8086)]
2025-09-14 11:32:00,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:32:00,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 43 minutes, 3 seconds)
2025-09-14 11:35:09,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:35:17,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3695.63208 ± 174.106
2025-09-14 11:35:17,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3865.1143), np.float32(3741.4272), np.float32(3636.8125), np.float32(3838.4055), np.float32(3661.4429), np.float32(3843.505), np.float32(3470.2974), np.float32(3961.63), np.float32(3446.1418), np.float32(3491.5437)]
2025-09-14 11:35:17,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:35:17,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3695.63) for latency 6
2025-09-14 11:35:17,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 39 minutes, 56 seconds)
2025-09-14 11:38:24,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:38:32,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3399.09888 ± 490.747
2025-09-14 11:38:32,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1976.5085), np.float32(3661.38), np.float32(3746.0103), np.float32(3275.9216), np.float32(3488.0613), np.float32(3584.7942), np.float32(3444.8691), np.float32(3669.7278), np.float32(3609.4724), np.float32(3534.2432)]
2025-09-14 11:38:32,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:38:32,148 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 36 minutes, 27 seconds)
2025-09-14 11:41:39,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:41:46,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3828.14307 ± 137.863
2025-09-14 11:41:46,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3992.3396), np.float32(3588.2852), np.float32(3728.329), np.float32(3675.7805), np.float32(3715.5273), np.float32(3962.5317), np.float32(3853.8022), np.float32(3938.325), np.float32(3997.998), np.float32(3828.5098)]
2025-09-14 11:41:46,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:41:46,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3828.14) for latency 6
2025-09-14 11:41:46,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 33 minutes, 9 seconds)
2025-09-14 11:44:56,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:45:03,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3669.71094 ± 211.478
2025-09-14 11:45:03,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3951.2188), np.float32(3787.742), np.float32(3607.5168), np.float32(3804.6716), np.float32(3435.077), np.float32(3673.782), np.float32(3371.6035), np.float32(3378.245), np.float32(3989.5452), np.float32(3697.7053)]
2025-09-14 11:45:03,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:45:03,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 29 minutes, 54 seconds)
2025-09-14 11:48:11,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:48:19,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3485.56006 ± 244.058
2025-09-14 11:48:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3588.6191), np.float32(3660.3994), np.float32(3665.3608), np.float32(3457.5454), np.float32(3705.449), np.float32(3470.8674), np.float32(3610.462), np.float32(3496.1306), np.float32(3383.524), np.float32(2817.2424)]
2025-09-14 11:48:19,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:48:19,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 26 minutes, 45 seconds)
2025-09-14 11:51:27,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:51:35,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3573.77466 ± 231.707
2025-09-14 11:51:35,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3682.837), np.float32(3578.5173), np.float32(3706.647), np.float32(3536.405), np.float32(3697.5042), np.float32(3368.4917), np.float32(2966.4902), np.float32(3767.3513), np.float32(3742.3428), np.float32(3691.1594)]
2025-09-14 11:51:35,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:51:35,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 23 minutes, 23 seconds)
2025-09-14 11:54:42,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:54:50,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3486.21216 ± 682.868
2025-09-14 11:54:50,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3657.844), np.float32(3527.009), np.float32(3988.7803), np.float32(3556.017), np.float32(3767.4548), np.float32(3707.241), np.float32(3492.4253), np.float32(4099.433), np.float32(3546.145), np.float32(1519.7714)]
2025-09-14 11:54:50,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:54:50,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 20 minutes, 10 seconds)
2025-09-14 11:58:00,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 11:58:07,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3537.39722 ± 770.268
2025-09-14 11:58:07,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3755.7324), np.float32(1261.4852), np.float32(3490.6313), np.float32(3837.9487), np.float32(3787.0454), np.float32(3991.8286), np.float32(3976.569), np.float32(3765.2405), np.float32(3699.2766), np.float32(3808.2153)]
2025-09-14 11:58:07,725 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 11:58:07,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 17 minutes, 20 seconds)
2025-09-14 12:01:14,903 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:01:22,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3625.77197 ± 102.039
2025-09-14 12:01:22,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3761.7332), np.float32(3631.5117), np.float32(3471.5095), np.float32(3525.3613), np.float32(3631.6794), np.float32(3470.3516), np.float32(3753.4565), np.float32(3717.0476), np.float32(3616.3164), np.float32(3678.7505)]
2025-09-14 12:01:22,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:01:22,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 13 minutes, 46 seconds)
2025-09-14 12:04:31,630 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:04:39,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3413.24878 ± 727.753
2025-09-14 12:04:39,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(2108.9106), np.float32(3614.1223), np.float32(3947.8362), np.float32(3962.5884), np.float32(3471.4304), np.float32(3909.0103), np.float32(3800.797), np.float32(3642.479), np.float32(3800.08), np.float32(1875.2343)]
2025-09-14 12:04:39,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:04:39,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 10 minutes, 42 seconds)
2025-09-14 12:07:48,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:07:56,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3640.39697 ± 116.083
2025-09-14 12:07:56,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3643.8838), np.float32(3616.385), np.float32(3615.8174), np.float32(3825.9053), np.float32(3809.897), np.float32(3556.1877), np.float32(3452.2756), np.float32(3656.925), np.float32(3727.7961), np.float32(3498.8945)]
2025-09-14 12:07:56,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:07:56,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 7 minutes, 31 seconds)
2025-09-14 12:11:03,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:11:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3685.39502 ± 171.916
2025-09-14 12:11:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3788.0305), np.float32(3916.4954), np.float32(3537.2073), np.float32(3548.0554), np.float32(3339.0422), np.float32(3717.2407), np.float32(3680.6204), np.float32(3681.2734), np.float32(3696.6838), np.float32(3949.3013)]
2025-09-14 12:11:10,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:11:10,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 4 minutes, 12 seconds)
2025-09-14 12:14:17,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:14:24,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3688.78662 ± 153.192
2025-09-14 12:14:24,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3784.8884), np.float32(3839.2615), np.float32(3726.0806), np.float32(3506.8647), np.float32(3609.8572), np.float32(3351.9614), np.float32(3865.5024), np.float32(3819.7563), np.float32(3681.106), np.float32(3702.5881)]
2025-09-14 12:14:24,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:14:24,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 31 seconds)
2025-09-14 12:17:32,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:17:40,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3321.63599 ± 498.834
2025-09-14 12:17:40,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3376.486), np.float32(3604.3362), np.float32(3378.6736), np.float32(3398.954), np.float32(3416.2424), np.float32(3612.4153), np.float32(3531.8523), np.float32(3699.274), np.float32(1867.0569), np.float32(3331.0667)]
2025-09-14 12:17:40,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:17:40,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 57 minutes, 18 seconds)
2025-09-14 12:20:48,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:20:56,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3505.78516 ± 731.382
2025-09-14 12:20:56,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3789.8518), np.float32(3639.7026), np.float32(4078.9385), np.float32(3725.885), np.float32(3540.4424), np.float32(3480.9373), np.float32(3928.1077), np.float32(1380.0029), np.float32(3917.4841), np.float32(3576.5007)]
2025-09-14 12:20:56,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:20:56,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2025-09-14 12:24:00,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:24:07,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3683.20361 ± 239.092
2025-09-14 12:24:07,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3764.7524), np.float32(3927.0845), np.float32(3688.4463), np.float32(3857.21), np.float32(3684.9314), np.float32(3755.4038), np.float32(3961.524), np.float32(3308.714), np.float32(3181.2925), np.float32(3702.6755)]
2025-09-14 12:24:07,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:24:07,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 50 minutes, 3 seconds)
2025-09-14 12:27:07,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:27:14,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3814.09448 ± 72.310
2025-09-14 12:27:14,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3935.8225), np.float32(3949.615), np.float32(3820.867), np.float32(3766.1157), np.float32(3768.905), np.float32(3797.4128), np.float32(3804.262), np.float32(3756.2668), np.float32(3831.0613), np.float32(3710.617)]
2025-09-14 12:27:14,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:27:14,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 46 minutes, 1 second)
2025-09-14 12:30:14,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:30:21,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3716.84912 ± 143.218
2025-09-14 12:30:21,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3712.0981), np.float32(3836.358), np.float32(3626.2905), np.float32(3799.9675), np.float32(3677.5898), np.float32(3737.198), np.float32(3651.1582), np.float32(3990.6238), np.float32(3727.5427), np.float32(3409.6655)]
2025-09-14 12:30:21,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:30:21,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 42 minutes, 1 second)
2025-09-14 12:33:23,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:33:30,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3749.50513 ± 100.933
2025-09-14 12:33:30,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3938.0554), np.float32(3630.6123), np.float32(3618.1838), np.float32(3648.2705), np.float32(3745.308), np.float32(3803.7666), np.float32(3877.3938), np.float32(3748.5486), np.float32(3793.8237), np.float32(3691.0864)]
2025-09-14 12:33:30,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:33:30,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 38 minutes, 13 seconds)
2025-09-14 12:36:32,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:36:39,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3673.75073 ± 252.082
2025-09-14 12:36:39,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3789.2986), np.float32(3529.279), np.float32(3152.4749), np.float32(3713.8005), np.float32(3749.3887), np.float32(3828.608), np.float32(3711.0732), np.float32(4085.1292), np.float32(3827.2239), np.float32(3351.2314)]
2025-09-14 12:36:39,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:36:39,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 34 minutes, 21 seconds)
2025-09-14 12:39:42,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:39:49,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3642.81787 ± 168.277
2025-09-14 12:39:49,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3428.938), np.float32(3633.426), np.float32(3816.438), np.float32(3579.7744), np.float32(3855.1255), np.float32(3359.513), np.float32(3865.11), np.float32(3740.9897), np.float32(3505.588), np.float32(3643.2769)]
2025-09-14 12:39:49,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:39:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 31 minutes, 5 seconds)
2025-09-14 12:42:49,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:42:56,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3686.94971 ± 194.940
2025-09-14 12:42:56,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3945.568), np.float32(3993.3757), np.float32(3377.9553), np.float32(3910.1826), np.float32(3526.151), np.float32(3611.3794), np.float32(3571.6914), np.float32(3632.788), np.float32(3541.9443), np.float32(3758.4622)]
2025-09-14 12:42:56,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:42:56,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 27 minutes, 55 seconds)
2025-09-14 12:45:56,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:46:04,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3662.87622 ± 102.320
2025-09-14 12:46:04,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3464.8867), np.float32(3637.8347), np.float32(3849.5383), np.float32(3616.8013), np.float32(3762.0298), np.float32(3726.6416), np.float32(3570.012), np.float32(3723.4395), np.float32(3622.1855), np.float32(3655.3955)]
2025-09-14 12:46:04,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:46:04,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 24 minutes, 50 seconds)
2025-09-14 12:49:05,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:49:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3783.53467 ± 107.787
2025-09-14 12:49:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3751.3074), np.float32(3710.6375), np.float32(3906.3782), np.float32(3655.8057), np.float32(3806.5715), np.float32(3968.7678), np.float32(3643.6343), np.float32(3911.8083), np.float32(3783.8135), np.float32(3696.6257)]
2025-09-14 12:49:12,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:49:12,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 21 minutes, 37 seconds)
2025-09-14 12:52:14,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:52:22,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3686.69385 ± 118.317
2025-09-14 12:52:22,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3755.2285), np.float32(3724.5662), np.float32(3578.9243), np.float32(3675.3972), np.float32(3502.7866), np.float32(3771.3044), np.float32(3541.3813), np.float32(3853.2783), np.float32(3850.689), np.float32(3613.381)]
2025-09-14 12:52:22,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:52:22,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 18 minutes, 32 seconds)
2025-09-14 12:55:23,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:55:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3471.63135 ± 559.601
2025-09-14 12:55:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3555.058), np.float32(2481.3547), np.float32(3898.214), np.float32(3693.8306), np.float32(3910.6301), np.float32(3641.332), np.float32(3601.7388), np.float32(2332.7783), np.float32(3517.3223), np.float32(4084.0544)]
2025-09-14 12:55:30,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:55:30,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 15 minutes, 16 seconds)
2025-09-14 12:58:31,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 12:58:38,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3649.30396 ± 218.212
2025-09-14 12:58:38,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3828.097), np.float32(3707.954), np.float32(3882.4204), np.float32(3681.5168), np.float32(3394.1448), np.float32(3748.2551), np.float32(3945.7273), np.float32(3494.2998), np.float32(3205.9087), np.float32(3604.7168)]
2025-09-14 12:58:38,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 12:58:38,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 12 minutes, 13 seconds)
2025-09-14 13:01:38,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:01:45,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3663.63989 ± 193.588
2025-09-14 13:01:45,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3824.9084), np.float32(3688.841), np.float32(3480.7786), np.float32(3577.1702), np.float32(3914.4438), np.float32(3535.8394), np.float32(3930.9878), np.float32(3583.2551), np.float32(3799.3303), np.float32(3300.8457)]
2025-09-14 13:01:45,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:01:45,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 9 minutes, 2 seconds)
2025-09-14 13:04:47,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:04:54,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3763.81641 ± 145.354
2025-09-14 13:04:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4024.1511), np.float32(3694.3892), np.float32(3841.475), np.float32(3804.4937), np.float32(3699.0356), np.float32(3930.9875), np.float32(3552.4836), np.float32(3557.294), np.float32(3842.8335), np.float32(3691.0227)]
2025-09-14 13:04:54,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:04:54,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 5 minutes, 55 seconds)
2025-09-14 13:07:54,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:08:02,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3752.40771 ± 142.090
2025-09-14 13:08:02,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3929.838), np.float32(3733.3618), np.float32(3523.853), np.float32(3803.9438), np.float32(3953.6992), np.float32(3809.301), np.float32(3721.617), np.float32(3576.7366), np.float32(3596.1392), np.float32(3875.5898)]
2025-09-14 13:08:02,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:08:02,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 2 minutes, 39 seconds)
2025-09-14 13:11:03,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:11:10,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3593.62354 ± 364.423
2025-09-14 13:11:10,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3645.524), np.float32(3681.2515), np.float32(3691.0593), np.float32(3612.124), np.float32(3992.3994), np.float32(3957.549), np.float32(3735.9958), np.float32(3759.4397), np.float32(2719.5054), np.float32(3141.3857)]
2025-09-14 13:11:10,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:11:10,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 59 minutes, 32 seconds)
2025-09-14 13:14:11,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:14:18,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3716.70435 ± 239.169
2025-09-14 13:14:18,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3822.625), np.float32(3319.1624), np.float32(3304.858), np.float32(4056.62), np.float32(3912.423), np.float32(3882.0142), np.float32(3582.0312), np.float32(3649.8896), np.float32(3753.3088), np.float32(3884.1077)]
2025-09-14 13:14:18,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:14:18,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 56 minutes, 25 seconds)
2025-09-14 13:17:19,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:17:26,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3725.69800 ± 181.797
2025-09-14 13:17:26,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4033.8494), np.float32(3877.4204), np.float32(3692.633), np.float32(3906.1157), np.float32(3779.265), np.float32(3539.9792), np.float32(3503.664), np.float32(3424.105), np.float32(3743.4915), np.float32(3756.4553)]
2025-09-14 13:17:26,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:17:26,857 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 53 minutes, 19 seconds)
2025-09-14 13:20:26,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:20:33,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3600.61060 ± 159.963
2025-09-14 13:20:33,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3494.6973), np.float32(3598.937), np.float32(3265.978), np.float32(3766.3577), np.float32(3804.2793), np.float32(3620.3801), np.float32(3493.539), np.float32(3596.292), np.float32(3821.2722), np.float32(3544.3716)]
2025-09-14 13:20:33,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:20:33,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 50 minutes, 5 seconds)
2025-09-14 13:23:35,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:23:42,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3754.40771 ± 195.341
2025-09-14 13:23:42,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3720.607), np.float32(3693.3035), np.float32(4033.5583), np.float32(3818.0415), np.float32(3799.429), np.float32(3842.9724), np.float32(3755.0076), np.float32(3244.3477), np.float32(3727.5627), np.float32(3909.25)]
2025-09-14 13:23:42,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:23:42,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 47 minutes, 1 second)
2025-09-14 13:26:44,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:26:51,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3621.16455 ± 140.312
2025-09-14 13:26:51,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3687.1748), np.float32(3585.1738), np.float32(3656.4663), np.float32(3784.095), np.float32(3419.4548), np.float32(3601.9065), np.float32(3608.6162), np.float32(3851.8584), np.float32(3656.075), np.float32(3360.824)]
2025-09-14 13:26:51,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:26:51,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 43 minutes, 54 seconds)
2025-09-14 13:29:53,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:30:01,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3531.25342 ± 601.458
2025-09-14 13:30:01,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3922.4548), np.float32(3594.0486), np.float32(3688.7454), np.float32(3415.519), np.float32(3835.1077), np.float32(3842.9216), np.float32(1789.6064), np.float32(3536.288), np.float32(3888.5776), np.float32(3799.264)]
2025-09-14 13:30:01,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:30:01,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 40 minutes, 49 seconds)
2025-09-14 13:33:01,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:33:08,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3600.41016 ± 472.969
2025-09-14 13:33:08,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3683.8943), np.float32(3844.527), np.float32(3931.404), np.float32(3664.757), np.float32(3636.3281), np.float32(3795.2434), np.float32(3554.5168), np.float32(3811.5852), np.float32(2220.9446), np.float32(3860.9023)]
2025-09-14 13:33:08,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:33:08,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 37 minutes, 40 seconds)
2025-09-14 13:36:10,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:36:17,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3354.75342 ± 838.430
2025-09-14 13:36:17,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(1624.5529), np.float32(3585.3235), np.float32(3635.0293), np.float32(3644.7585), np.float32(3868.739), np.float32(3963.54), np.float32(3780.4282), np.float32(3917.101), np.float32(1766.5106), np.float32(3761.5532)]
2025-09-14 13:36:17,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:36:17,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 34 minutes, 36 seconds)
2025-09-14 13:39:21,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:39:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3799.69141 ± 124.570
2025-09-14 13:39:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3740.8796), np.float32(3881.9321), np.float32(3696.4358), np.float32(3630.9536), np.float32(3765.1328), np.float32(3853.8477), np.float32(3804.913), np.float32(4109.8267), np.float32(3786.0735), np.float32(3726.917)]
2025-09-14 13:39:29,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:39:29,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 31 minutes, 32 seconds)
2025-09-14 13:42:32,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:42:40,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3599.88037 ± 376.084
2025-09-14 13:42:40,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3992.9736), np.float32(3772.8613), np.float32(3688.213), np.float32(3581.9358), np.float32(3463.7593), np.float32(3756.8726), np.float32(3475.3652), np.float32(2588.4282), np.float32(3952.245), np.float32(3726.147)]
2025-09-14 13:42:40,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:42:40,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 28 minutes, 27 seconds)
2025-09-14 13:45:42,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:45:50,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3376.14526 ± 891.982
2025-09-14 13:45:50,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3824.149), np.float32(774.3276), np.float32(3923.4912), np.float32(3745.516), np.float32(3107.474), np.float32(3596.9146), np.float32(3683.8713), np.float32(3628.699), np.float32(3802.2073), np.float32(3674.8013)]
2025-09-14 13:45:50,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:45:50,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 25 minutes, 18 seconds)
2025-09-14 13:48:52,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:49:00,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3754.22144 ± 144.161
2025-09-14 13:49:00,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3537.816), np.float32(3454.2083), np.float32(3750.9841), np.float32(3743.139), np.float32(3884.9573), np.float32(3906.5579), np.float32(3830.75), np.float32(3908.2412), np.float32(3783.136), np.float32(3742.4272)]
2025-09-14 13:49:00,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:49:00,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 22 minutes, 12 seconds)
2025-09-14 13:52:02,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:52:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3551.97266 ± 627.953
2025-09-14 13:52:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3685.056), np.float32(3974.889), np.float32(3977.5928), np.float32(3647.378), np.float32(2588.9448), np.float32(3792.8066), np.float32(2089.2363), np.float32(4043.071), np.float32(3860.4534), np.float32(3860.2998)]
2025-09-14 13:52:09,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:52:09,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 19 minutes, 2 seconds)
2025-09-14 13:55:12,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:55:19,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3583.35083 ± 794.811
2025-09-14 13:55:19,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4017.4905), np.float32(3889.462), np.float32(1241.251), np.float32(3787.9602), np.float32(3516.9941), np.float32(3890.4944), np.float32(3806.4172), np.float32(3680.22), np.float32(4003.9158), np.float32(3999.3042)]
2025-09-14 13:55:19,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:55:19,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 50 seconds)
2025-09-14 13:58:21,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 13:58:29,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3392.47852 ± 670.758
2025-09-14 13:58:29,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3745.5168), np.float32(1452.2573), np.float32(3727.8235), np.float32(3408.0645), np.float32(3315.2268), np.float32(3668.9775), np.float32(3817.1309), np.float32(3496.8928), np.float32(3428.8186), np.float32(3864.0747)]
2025-09-14 13:58:29,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 13:58:29,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 39 seconds)
2025-09-14 14:01:31,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:01:38,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3783.27588 ± 228.903
2025-09-14 14:01:38,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3847.625), np.float32(3537.3125), np.float32(4052.1667), np.float32(3697.0251), np.float32(3831.5862), np.float32(3941.7812), np.float32(3434.4785), np.float32(4021.527), np.float32(3443.9614), np.float32(4025.2932)]
2025-09-14 14:01:38,572 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:01:38,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 29 seconds)
2025-09-14 14:04:43,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:04:50,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3483.30981 ± 753.769
2025-09-14 14:04:50,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3750.7808), np.float32(3272.0232), np.float32(3957.822), np.float32(3692.7163), np.float32(3856.4963), np.float32(3419.4402), np.float32(3785.825), np.float32(1310.837), np.float32(3945.2717), np.float32(3841.8838)]
2025-09-14 14:04:50,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:04:50,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 20 seconds)
2025-09-14 14:07:53,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:08:00,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3841.50513 ± 100.050
2025-09-14 14:08:00,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(3777.5557), np.float32(3806.5464), np.float32(3954.9482), np.float32(3780.6897), np.float32(3969.692), np.float32(3749.357), np.float32(3998.8506), np.float32(3868.2551), np.float32(3674.8528), np.float32(3834.3054)]
2025-09-14 14:08:00,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:08:00,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1226 [INFO]: New best (3841.51) for latency 6
2025-09-14 14:08:00,236 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 10 seconds)
2025-09-14 14:11:01,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1214 [DEBUG]: Evaluating for latency 6...
2025-09-14 14:11:08,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1221 [DEBUG]: Total Reward: 3809.15771 ± 176.504
2025-09-14 14:11:08,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1222 [DEBUG]: All rewards: [np.float32(4114.5156), np.float32(3828.7444), np.float32(3980.621), np.float32(3902.4507), np.float32(3759.4517), np.float32(3671.5142), np.float32(3852.9128), np.float32(3420.646), np.float32(3736.333), np.float32(3824.3914)]
2025-09-14 14:11:08,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1223 [DEBUG]: All trajectory lengths: [np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0), np.float32(1000.0)]
2025-09-14 14:11:08,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille150-halfcheetah):1251 [DEBUG]: Training session finished
