2025-09-16 09:07:45,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.050-delay_15
2025-09-16 09:07:45,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.050-delay_15
2025-09-16 09:07:45,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'15': <latency_env.delayed_mdp.ConstantDelay object at 0x147deb880990>}
2025-09-16 09:07:45,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-16 09:07:45,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-16 09:07:45,604 baseline-bpql-noisepromille50-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=107, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-16 09:07:45,604 baseline-bpql-noisepromille50-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 09:07:46,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-16 09:07:46,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-16 09:09:25,296 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:09:36,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -410.07242 ± 23.848
2025-09-16 09:09:36,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [-403.5955, -426.4072, -388.01053, -436.1333, -407.82556, -453.6165, -378.63995, -381.71503, -428.33997, -396.4407]
2025-09-16 09:09:36,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:09:36,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-410.07) for latency 15
2025-09-16 09:09:36,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 10 seconds)
2025-09-16 09:11:19,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:11:30,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -181.41315 ± 66.915
2025-09-16 09:11:30,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [-148.88873, -177.1007, -112.734406, -50.931362, -278.0821, -182.54726, -164.2377, -207.13622, -205.92058, -286.5525]
2025-09-16 09:11:30,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:11:30,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-181.41) for latency 15
2025-09-16 09:11:30,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 2 minutes, 31 seconds)
2025-09-16 09:13:13,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:13:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: -155.94865 ± 84.704
2025-09-16 09:13:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [-95.702324, -166.53287, -182.77753, -294.80838, -72.59569, -224.46802, -127.804245, 15.391808, -231.37148, -178.81793]
2025-09-16 09:13:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:13:24,635 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (-155.95) for latency 15
2025-09-16 09:13:24,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 1 minute, 59 seconds)
2025-09-16 09:15:08,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:15:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 43.45084 ± 164.829
2025-09-16 09:15:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [-190.8221, -39.42982, 208.14838, 239.53722, -145.49315, 42.052757, 156.29118, -123.81891, -10.43964, 298.48245]
2025-09-16 09:15:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:15:18,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (43.45) for latency 15
2025-09-16 09:15:18,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 48 seconds)
2025-09-16 09:17:02,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:17:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 141.47888 ± 249.927
2025-09-16 09:17:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [129.66226, 34.34871, 191.62553, -48.80578, -173.67336, 435.82098, 576.41864, 446.41342, -74.547, -102.47474]
2025-09-16 09:17:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:17:12,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (141.48) for latency 15
2025-09-16 09:17:12,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 59 minutes, 14 seconds)
2025-09-16 09:18:56,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:19:06,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 272.05167 ± 282.255
2025-09-16 09:19:06,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [475.74286, 169.58607, 477.77277, 297.42557, 683.1846, -359.62122, 540.9534, 54.880806, 210.83154, 169.7603]
2025-09-16 09:19:06,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:19:06,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (272.05) for latency 15
2025-09-16 09:19:06,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 58 minutes, 51 seconds)
2025-09-16 09:20:50,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:21:00,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 749.18103 ± 70.258
2025-09-16 09:21:00,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [657.1935, 712.7526, 747.55023, 774.46704, 777.47046, 862.5946, 628.8564, 702.47345, 805.64417, 822.80804]
2025-09-16 09:21:00,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:21:00,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (749.18) for latency 15
2025-09-16 09:21:01,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 56 minutes, 53 seconds)
2025-09-16 09:22:44,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:22:55,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 868.56396 ± 159.514
2025-09-16 09:22:55,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [878.1944, 907.4047, 665.2182, 810.2189, 1108.2195, 648.65436, 767.1845, 1136.743, 782.5469, 981.2551]
2025-09-16 09:22:55,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:22:55,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (868.56) for latency 15
2025-09-16 09:22:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 54 minutes, 59 seconds)
2025-09-16 09:24:38,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:24:49,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1081.11792 ± 115.691
2025-09-16 09:24:49,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [899.8367, 1186.9501, 1178.1134, 1108.3656, 1000.2682, 1068.2842, 1147.7346, 1276.0844, 1031.9353, 913.60724]
2025-09-16 09:24:49,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:24:49,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1081.12) for latency 15
2025-09-16 09:24:49,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 53 minutes, 3 seconds)
2025-09-16 09:26:33,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:26:44,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1106.87024 ± 128.932
2025-09-16 09:26:44,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1037.9386, 1073.0466, 1133.8652, 1330.331, 1009.189, 1076.8478, 998.21783, 1352.6544, 942.6516, 1113.9602]
2025-09-16 09:26:44,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:26:44,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1106.87) for latency 15
2025-09-16 09:26:44,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 51 minutes, 21 seconds)
2025-09-16 09:28:28,032 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:28:38,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1211.08191 ± 280.066
2025-09-16 09:28:38,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1198.5188, 1062.2394, 1068.1451, 1294.1256, 1998.9913, 1121.1395, 1150.775, 945.0967, 1236.2653, 1035.5228]
2025-09-16 09:28:38,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:28:38,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1211.08) for latency 15
2025-09-16 09:28:38,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 49 minutes, 37 seconds)
2025-09-16 09:30:22,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:30:33,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1261.04053 ± 320.301
2025-09-16 09:30:33,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1011.8, 1081.7063, 1093.9603, 1238.0441, 1319.4768, 1178.4572, 835.52466, 1440.8442, 2082.6882, 1327.9036]
2025-09-16 09:30:33,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:30:33,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1261.04) for latency 15
2025-09-16 09:30:33,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 47 minutes, 55 seconds)
2025-09-16 09:32:17,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:32:28,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1332.06323 ± 319.022
2025-09-16 09:32:28,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1440.4244, 1374.5208, 1214.362, 2109.0403, 1610.491, 959.18677, 1263.547, 1012.78986, 1237.86, 1098.4098]
2025-09-16 09:32:28,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:32:28,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1332.06) for latency 15
2025-09-16 09:32:28,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 46 minutes, 14 seconds)
2025-09-16 09:34:12,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:34:23,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1430.27905 ± 233.577
2025-09-16 09:34:23,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1335.9841, 1474.4899, 1211.5457, 1237.1854, 1193.1833, 1568.106, 1416.3812, 2023.1799, 1527.4622, 1315.2732]
2025-09-16 09:34:23,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:34:23,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1430.28) for latency 15
2025-09-16 09:34:23,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 44 minutes, 28 seconds)
2025-09-16 09:36:07,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:36:18,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1556.70386 ± 307.010
2025-09-16 09:36:18,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2032.9298, 1610.5165, 1513.7523, 1491.3864, 1089.8138, 1473.7108, 2037.6802, 1261.7356, 1243.278, 1812.2352]
2025-09-16 09:36:18,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:36:18,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (1556.70) for latency 15
2025-09-16 09:36:18,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 42 minutes, 37 seconds)
2025-09-16 09:38:03,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:38:13,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1534.07983 ± 265.642
2025-09-16 09:38:13,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2082.648, 1234.3748, 1230.7788, 1406.9512, 1868.3917, 1273.5868, 1473.2296, 1581.4073, 1511.534, 1677.8967]
2025-09-16 09:38:13,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:38:13,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 41 minutes, 2 seconds)
2025-09-16 09:40:00,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:40:10,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1390.92603 ± 223.000
2025-09-16 09:40:10,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1198.0878, 1653.4292, 1503.3384, 1880.0023, 1177.418, 1468.4641, 1292.164, 1154.2153, 1259.1493, 1322.9912]
2025-09-16 09:40:10,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:40:10,915 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 39 minutes, 45 seconds)
2025-09-16 09:41:56,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:42:07,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2037.02246 ± 439.152
2025-09-16 09:42:07,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2328.6365, 1336.3899, 1852.1879, 2494.7732, 2051.7095, 2237.3306, 1914.4938, 2227.9807, 1254.0464, 2672.6758]
2025-09-16 09:42:07,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:42:07,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2037.02) for latency 15
2025-09-16 09:42:07,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 38 minutes, 17 seconds)
2025-09-16 09:43:53,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:44:04,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1334.99243 ± 104.882
2025-09-16 09:44:04,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1276.9469, 1395.9354, 1221.3694, 1342.9519, 1319.7128, 1238.5074, 1292.9015, 1414.5432, 1591.7906, 1255.2638]
2025-09-16 09:44:04,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:44:04,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 50 seconds)
2025-09-16 09:45:49,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:46:00,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1737.08960 ± 529.142
2025-09-16 09:46:00,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2156.4548, 1604.2507, 1132.0487, 3005.0408, 2049.2832, 1556.0632, 1306.9944, 1230.749, 1505.887, 1824.1251]
2025-09-16 09:46:00,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:46:00,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 35 minutes, 21 seconds)
2025-09-16 09:47:46,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:47:56,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1721.55347 ± 408.758
2025-09-16 09:47:56,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1920.5101, 2370.3918, 1321.9023, 1755.1263, 1337.3271, 1290.9357, 1715.1262, 2417.723, 1833.131, 1253.3622]
2025-09-16 09:47:56,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:47:56,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 33 minutes, 30 seconds)
2025-09-16 09:49:39,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:49:50,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1623.92896 ± 440.021
2025-09-16 09:49:50,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1284.016, 1267.8676, 1291.8453, 1195.5753, 2728.5107, 1634.7869, 1542.8596, 1981.3772, 1803.6279, 1508.8221]
2025-09-16 09:49:50,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:49:50,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 30 minutes, 40 seconds)
2025-09-16 09:51:33,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:51:44,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1496.51978 ± 352.326
2025-09-16 09:51:44,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1157.3741, 2295.8835, 1815.4694, 1265.6014, 1196.2589, 1223.323, 1581.0016, 1702.7823, 1566.9169, 1160.5852]
2025-09-16 09:51:44,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:51:44,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 28 minutes, 1 second)
2025-09-16 09:53:27,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:53:38,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1784.63940 ± 521.637
2025-09-16 09:53:38,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1154.3392, 1352.6608, 2176.601, 1961.3594, 1332.3035, 2263.5156, 2942.53, 1456.1069, 1648.7816, 1558.1963]
2025-09-16 09:53:38,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:53:38,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 25 minutes, 27 seconds)
2025-09-16 09:55:21,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:55:32,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1622.73535 ± 358.478
2025-09-16 09:55:32,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1531.6045, 1754.5205, 1411.4131, 1255.2665, 1549.8472, 1174.2317, 1818.9884, 1551.0385, 2533.2173, 1647.2263]
2025-09-16 09:55:32,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:55:32,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 22 minutes, 52 seconds)
2025-09-16 09:57:15,456 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:57:26,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1594.87817 ± 227.179
2025-09-16 09:57:26,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1436.0757, 1392.7776, 1636.2778, 1337.6567, 1425.5983, 2064.9207, 1562.1619, 1861.4935, 1443.3844, 1788.4332]
2025-09-16 09:57:26,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:57:26,223 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 20 minutes, 27 seconds)
2025-09-16 09:59:09,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 09:59:20,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1490.78088 ± 232.476
2025-09-16 09:59:20,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1927.0836, 1254.7184, 1886.1428, 1365.2556, 1631.3829, 1436.8132, 1462.5323, 1358.3247, 1254.9597, 1330.5942]
2025-09-16 09:59:20,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:59:20,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 18 minutes, 37 seconds)
2025-09-16 10:01:03,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:01:14,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1701.54065 ± 608.882
2025-09-16 10:01:14,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1203.831, 1196.4542, 2700.591, 1376.2104, 2735.426, 1937.4889, 2214.6548, 1189.1337, 1289.9092, 1171.706]
2025-09-16 10:01:14,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:01:14,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 16 minutes, 47 seconds)
2025-09-16 10:02:57,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:03:08,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1939.44373 ± 533.665
2025-09-16 10:03:08,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1439.2627, 1346.975, 1647.2135, 2575.4373, 1840.8531, 1492.3655, 2645.0796, 2921.7717, 1802.0085, 1683.4697]
2025-09-16 10:03:08,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:03:08,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 14 minutes, 55 seconds)
2025-09-16 10:04:51,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:05:02,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1587.70605 ± 442.193
2025-09-16 10:05:02,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1395.2875, 1629.9927, 1448.4236, 2180.1477, 1514.5072, 1261.716, 1205.8767, 1289.8914, 2646.0908, 1305.1271]
2025-09-16 10:05:02,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:05:02,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 12 minutes, 59 seconds)
2025-09-16 10:06:45,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:06:56,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1831.97192 ± 482.755
2025-09-16 10:06:56,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2272.6055, 2509.1934, 2547.7793, 1430.5077, 1744.317, 1460.3738, 1511.3566, 2253.9792, 1309.425, 1280.1802]
2025-09-16 10:06:56,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:06:56,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 11 minutes, 4 seconds)
2025-09-16 10:08:39,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:08:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1765.03552 ± 386.148
2025-09-16 10:08:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1498.9617, 2045.462, 1719.9907, 1597.9147, 2491.1504, 1199.0215, 1324.2648, 2182.5264, 2011.9323, 1579.13]
2025-09-16 10:08:50,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:08:50,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 9 minutes, 12 seconds)
2025-09-16 10:10:33,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:10:43,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1501.40161 ± 322.132
2025-09-16 10:10:43,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1256.7415, 2029.3206, 2155.4365, 1514.4078, 1124.2877, 1309.7451, 1246.4609, 1442.0432, 1364.1646, 1571.4077]
2025-09-16 10:10:43,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:10:43,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 7 minutes, 14 seconds)
2025-09-16 10:12:27,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:12:37,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1904.96899 ± 458.814
2025-09-16 10:12:37,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1209.0422, 2632.3584, 1272.4015, 1515.7258, 2196.3386, 1880.2006, 1822.6893, 2038.345, 2560.466, 1922.1218]
2025-09-16 10:12:37,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:12:37,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 5 minutes, 18 seconds)
2025-09-16 10:14:21,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:14:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1835.08167 ± 565.717
2025-09-16 10:14:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2559.582, 1385.1412, 1593.1534, 2648.558, 1591.8074, 2710.8943, 1993.2769, 1307.5287, 1217.5408, 1343.3335]
2025-09-16 10:14:31,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:14:31,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 3 minutes, 23 seconds)
2025-09-16 10:16:14,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:16:25,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2203.49121 ± 746.181
2025-09-16 10:16:25,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2434.7385, 3066.1675, 1205.8713, 2894.3335, 1418.4612, 2083.4548, 1476.6948, 2114.1882, 1750.0175, 3590.985]
2025-09-16 10:16:25,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:16:25,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2203.49) for latency 15
2025-09-16 10:16:25,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 1 minute, 27 seconds)
2025-09-16 10:18:08,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:18:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1868.78052 ± 453.672
2025-09-16 10:18:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1458.763, 1438.5311, 1855.1595, 2400.4526, 2577.5642, 1678.6161, 1602.5505, 2558.9978, 1290.4121, 1826.7584]
2025-09-16 10:18:19,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:18:19,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 59 minutes, 33 seconds)
2025-09-16 10:20:03,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:20:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1808.79529 ± 539.466
2025-09-16 10:20:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2610.2793, 1411.3737, 2678.835, 1436.6373, 1337.4829, 1254.6509, 2318.2012, 1566.3164, 1314.7999, 2159.378]
2025-09-16 10:20:14,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:20:14,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 57 minutes, 53 seconds)
2025-09-16 10:22:00,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:22:10,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2091.67529 ± 534.649
2025-09-16 10:22:10,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3084.197, 1747.8196, 1910.3949, 1856.1631, 2026.12, 1586.2837, 2529.8345, 2805.6123, 1252.1987, 2118.1296]
2025-09-16 10:22:10,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:22:10,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 56 minutes, 31 seconds)
2025-09-16 10:23:56,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:24:07,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 1951.18335 ± 537.607
2025-09-16 10:24:07,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2871.697, 2629.685, 2155.5352, 1347.1537, 1777.676, 1831.9973, 1223.7968, 2310.5593, 1276.7305, 2087.0034]
2025-09-16 10:24:07,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:24:07,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 55 minutes, 9 seconds)
2025-09-16 10:25:53,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:26:04,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2139.63110 ± 551.264
2025-09-16 10:26:04,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1602.4891, 2820.7695, 1584.588, 2872.1912, 1785.9705, 1307.7965, 2434.1987, 2765.884, 1842.7651, 2379.6587]
2025-09-16 10:26:04,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:26:04,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 53 minutes, 49 seconds)
2025-09-16 10:27:50,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:28:01,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2092.43823 ± 791.427
2025-09-16 10:28:01,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1430.6815, 3078.4827, 851.63806, 1562.992, 2555.3367, 1858.3146, 3077.0527, 3257.0308, 1580.084, 1672.7719]
2025-09-16 10:28:01,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:28:01,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes, 25 seconds)
2025-09-16 10:29:46,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:29:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2280.45801 ± 888.037
2025-09-16 10:29:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1701.04, 1644.5142, 2330.942, 2292.103, 3569.077, 4058.455, 1131.358, 1396.8574, 2666.575, 2013.6577]
2025-09-16 10:29:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:29:57,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2280.46) for latency 15
2025-09-16 10:29:57,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 45 seconds)
2025-09-16 10:31:40,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:31:51,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2921.44385 ± 733.373
2025-09-16 10:31:51,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2240.844, 4056.2742, 2513.1382, 2626.1875, 2770.8896, 3870.7917, 2919.6401, 2233.1638, 3983.1846, 2000.3262]
2025-09-16 10:31:51,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:31:51,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2921.44) for latency 15
2025-09-16 10:31:51,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 20 seconds)
2025-09-16 10:33:34,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:33:45,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2975.66650 ± 697.847
2025-09-16 10:33:45,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3533.0771, 3012.5862, 2681.191, 3562.4202, 3618.1865, 3591.0762, 3610.3337, 1706.0796, 2521.8137, 1919.9025]
2025-09-16 10:33:45,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:33:45,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (2975.67) for latency 15
2025-09-16 10:33:45,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 45 minutes, 57 seconds)
2025-09-16 10:35:28,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:35:39,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2950.67114 ± 1055.641
2025-09-16 10:35:39,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3978.4827, 3936.607, 3999.3335, 3492.3962, 3453.8496, 1914.8239, 3739.94, 1292.9244, 2396.1472, 1302.2089]
2025-09-16 10:35:39,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:35:39,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 43 minutes, 34 seconds)
2025-09-16 10:37:22,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:37:33,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 2729.46533 ± 917.002
2025-09-16 10:37:33,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2437.0662, 4053.0623, 3869.2195, 3232.2021, 2680.579, 2229.2769, 1344.1786, 1589.2543, 3786.5598, 2073.2517]
2025-09-16 10:37:33,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:37:33,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 41 minutes, 9 seconds)
2025-09-16 10:39:16,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:39:27,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3056.11719 ± 1202.614
2025-09-16 10:39:27,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1807.7632, 4264.455, 1457.0106, 4327.0376, 2608.9773, 4311.254, 2526.8936, 1307.6803, 3489.9373, 4460.165]
2025-09-16 10:39:27,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:39:27,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3056.12) for latency 15
2025-09-16 10:39:27,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 38 minutes, 51 seconds)
2025-09-16 10:41:11,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:41:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3592.40747 ± 932.298
2025-09-16 10:41:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3510.469, 4458.092, 4483.092, 3790.7327, 4285.0654, 1421.7737, 4199.2275, 3227.6274, 2496.9758, 4051.021]
2025-09-16 10:41:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:41:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3592.41) for latency 15
2025-09-16 10:41:21,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 36 minutes, 58 seconds)
2025-09-16 10:43:05,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:43:15,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3711.15234 ± 1257.624
2025-09-16 10:43:15,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4650.8237, 3805.6392, 4266.15, 1237.8088, 3517.1875, 4676.6523, 4657.136, 4461.2705, 1367.3652, 4471.4907]
2025-09-16 10:43:15,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:43:15,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (3711.15) for latency 15
2025-09-16 10:43:15,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 35 minutes, 4 seconds)
2025-09-16 10:44:59,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:45:09,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4501.06299 ± 445.888
2025-09-16 10:45:09,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4796.305, 4804.977, 4673.0156, 3972.1077, 4472.6694, 4683.861, 4879.5654, 3381.3813, 4607.3633, 4739.384]
2025-09-16 10:45:09,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:45:09,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4501.06) for latency 15
2025-09-16 10:45:09,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 33 minutes, 9 seconds)
2025-09-16 10:46:53,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:47:03,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3570.76294 ± 1433.702
2025-09-16 10:47:03,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1364.4907, 4749.5215, 4869.998, 4579.949, 4308.828, 3084.921, 1881.317, 4838.943, 4707.465, 1322.1948]
2025-09-16 10:47:03,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:47:03,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 31 minutes, 14 seconds)
2025-09-16 10:48:47,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:48:57,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3947.99292 ± 996.282
2025-09-16 10:48:57,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4584.4487, 4484.9365, 4633.7627, 4712.1885, 2296.7761, 4598.8896, 2739.241, 4722.834, 4416.3105, 2290.539]
2025-09-16 10:48:57,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:48:57,831 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 29 minutes, 19 seconds)
2025-09-16 10:50:40,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:50:51,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4360.70215 ± 898.816
2025-09-16 10:50:51,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4500.271, 4895.597, 2780.2456, 4905.9756, 4485.9536, 4880.776, 2419.8335, 4960.6763, 4947.912, 4829.7773]
2025-09-16 10:50:51,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:50:51,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 27 minutes, 22 seconds)
2025-09-16 10:52:34,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:52:45,481 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3880.63672 ± 979.664
2025-09-16 10:52:45,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4412.4185, 3008.6628, 4869.951, 4788.3096, 4336.2373, 4228.593, 2358.1206, 4683.986, 4087.8767, 2032.2162]
2025-09-16 10:52:45,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:52:45,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 25 minutes, 26 seconds)
2025-09-16 10:54:28,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:54:39,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3985.45630 ± 1109.618
2025-09-16 10:54:39,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4735.024, 4168.3516, 4315.2114, 1851.5858, 4395.4824, 4336.8564, 4728.6543, 4669.2183, 1765.5687, 4888.6113]
2025-09-16 10:54:39,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:54:39,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 23 minutes, 29 seconds)
2025-09-16 10:56:22,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:56:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3803.12744 ± 1220.773
2025-09-16 10:56:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4674.0093, 2219.4346, 4800.096, 4629.8896, 4531.0967, 1706.861, 4662.2607, 1963.9609, 4694.195, 4149.4688]
2025-09-16 10:56:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:56:33,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 21 minutes, 34 seconds)
2025-09-16 10:58:16,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 10:58:26,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3537.87891 ± 1199.348
2025-09-16 10:58:26,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3134.8572, 2238.6538, 1366.699, 4609.2627, 3197.4746, 4930.5986, 2283.1172, 4644.1494, 4578.1763, 4395.8027]
2025-09-16 10:58:26,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:58:26,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 19 minutes, 40 seconds)
2025-09-16 11:00:10,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:00:21,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3954.73096 ± 1015.721
2025-09-16 11:00:21,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4700.26, 2083.2026, 4672.5303, 2964.3179, 4576.248, 4827.6963, 2613.0728, 4796.5073, 3441.2976, 4872.1753]
2025-09-16 11:00:21,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:00:21,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 17 minutes, 53 seconds)
2025-09-16 11:02:04,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:02:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4030.45654 ± 1038.872
2025-09-16 11:02:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3125.2107, 4307.257, 4903.4453, 4424.571, 4903.825, 1371.229, 3543.5728, 4468.777, 4646.473, 4610.2046]
2025-09-16 11:02:15,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:02:15,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 15 minutes, 58 seconds)
2025-09-16 11:03:57,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:04:08,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3747.47998 ± 1233.699
2025-09-16 11:04:08,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2074.9524, 4770.031, 4522.05, 3149.0254, 1894.5242, 2049.0466, 4932.588, 4644.555, 4763.6064, 4674.421]
2025-09-16 11:04:08,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:04:08,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 14 minutes)
2025-09-16 11:05:50,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:06:01,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3432.64380 ± 1491.060
2025-09-16 11:06:01,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1516.8649, 4813.889, 4611.517, 2465.1191, 1378.7054, 4607.3945, 1267.5299, 4782.2627, 4137.5005, 4745.654]
2025-09-16 11:06:01,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:06:01,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 11 minutes, 59 seconds)
2025-09-16 11:07:43,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:07:54,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3476.78711 ± 1426.926
2025-09-16 11:07:54,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3692.3923, 2556.194, 4792.7773, 1829.2432, 1299.7739, 1579.6282, 4862.8115, 4453.394, 4976.4673, 4725.19]
2025-09-16 11:07:54,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:07:54,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 10 minutes)
2025-09-16 11:09:36,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:09:47,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3925.84180 ± 1245.693
2025-09-16 11:09:47,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4453.4478, 4298.6157, 5016.031, 1444.2762, 4323.19, 5093.4907, 4508.5225, 3449.4817, 4911.9453, 1759.4197]
2025-09-16 11:09:47,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:09:47,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 7 minutes, 52 seconds)
2025-09-16 11:11:28,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:11:39,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3753.67651 ± 950.405
2025-09-16 11:11:39,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4257.283, 4249.294, 4200.1235, 2259.4236, 1981.4717, 4616.1416, 3952.417, 5007.1836, 3014.7722, 3998.6582]
2025-09-16 11:11:39,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:11:39,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 5 minutes, 46 seconds)
2025-09-16 11:13:20,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:13:31,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4381.75586 ± 601.727
2025-09-16 11:13:31,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4096.829, 4803.9087, 4692.4814, 4886.16, 4553.1167, 4710.0645, 4866.518, 4614.0293, 2956.73, 3637.7183]
2025-09-16 11:13:31,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:13:31,026 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 3 minutes, 45 seconds)
2025-09-16 11:15:12,206 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:15:22,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3541.70654 ± 1441.905
2025-09-16 11:15:22,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2488.3545, 4668.7056, 4740.877, 1611.6798, 4775.383, 1591.3324, 4690.8496, 4568.6206, 1524.2334, 4757.0317]
2025-09-16 11:15:22,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:15:22,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 1 minute, 44 seconds)
2025-09-16 11:17:02,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:17:13,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4399.90918 ± 487.199
2025-09-16 11:17:13,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4386.1865, 3116.4211, 4549.0767, 4484.582, 4111.6772, 4840.023, 4551.2856, 4618.92, 4352.164, 4988.758]
2025-09-16 11:17:13,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:17:13,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 59 minutes, 37 seconds)
2025-09-16 11:18:53,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:19:03,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3483.00195 ± 1325.897
2025-09-16 11:19:03,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4369.467, 4670.085, 1379.8744, 1428.1736, 4439.7354, 4600.624, 3293.523, 1822.8757, 4577.276, 4248.3867]
2025-09-16 11:19:03,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:19:03,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 57 minutes, 30 seconds)
2025-09-16 11:20:43,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:20:54,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4192.57324 ± 860.257
2025-09-16 11:20:54,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4862.498, 4662.821, 2909.3997, 4498.6104, 4635.204, 4960.774, 5002.7896, 4408.76, 3549.402, 2435.4778]
2025-09-16 11:20:54,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:20:54,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 55 minutes, 31 seconds)
2025-09-16 11:22:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:22:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4791.85059 ± 200.968
2025-09-16 11:22:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4854.2915, 4319.5283, 4633.718, 4829.822, 5069.517, 4924.3047, 4711.612, 4877.5796, 4715.9004, 4982.2354]
2025-09-16 11:22:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:22:44,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4791.85) for latency 15
2025-09-16 11:22:44,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes, 31 seconds)
2025-09-16 11:24:24,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:24:35,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4357.33496 ± 664.549
2025-09-16 11:24:35,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [2857.505, 4136.138, 4751.8174, 3447.1123, 4842.4185, 4591.6084, 4848.8643, 4962.432, 4839.065, 4296.3813]
2025-09-16 11:24:35,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:24:35,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 51 minutes, 35 seconds)
2025-09-16 11:26:15,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:26:26,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3895.68604 ± 1207.060
2025-09-16 11:26:26,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4662.655, 1358.3081, 4866.5913, 2600.4968, 4362.259, 4878.825, 4925.507, 2470.409, 4645.9014, 4185.908]
2025-09-16 11:26:26,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:26:26,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 49 minutes, 46 seconds)
2025-09-16 11:28:06,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:28:17,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4835.21631 ± 85.561
2025-09-16 11:28:17,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4800.581, 4899.179, 4759.6147, 4759.9536, 4857.5317, 4840.0522, 4907.446, 4682.769, 5000.9854, 4844.0557]
2025-09-16 11:28:17,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:28:17,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4835.22) for latency 15
2025-09-16 11:28:17,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 47 minutes, 59 seconds)
2025-09-16 11:29:57,055 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:30:07,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3847.49854 ± 1173.733
2025-09-16 11:30:07,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3787.8462, 3707.5396, 4857.9116, 4783.2183, 3509.869, 4605.417, 1396.4314, 4984.2295, 4745.1113, 2097.4116]
2025-09-16 11:30:07,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:30:07,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 46 minutes, 6 seconds)
2025-09-16 11:31:46,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:31:57,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4283.91260 ± 1158.495
2025-09-16 11:31:57,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4567.7964, 2738.2656, 4451.849, 4729.051, 5103.2676, 4987.0225, 1417.4912, 4974.6636, 4838.7856, 5030.9316]
2025-09-16 11:31:57,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:31:57,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 44 minutes, 13 seconds)
2025-09-16 11:33:37,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:33:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4564.90479 ± 878.460
2025-09-16 11:33:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [1959.7924, 4854.852, 5007.2305, 4637.6636, 4628.8184, 4860.7993, 4768.1294, 4987.1753, 4939.7954, 5004.787]
2025-09-16 11:33:47,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:33:47,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 42 minutes, 19 seconds)
2025-09-16 11:35:26,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:35:37,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3992.63989 ± 1197.830
2025-09-16 11:35:37,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [5012.2695, 4422.095, 5043.043, 1478.5953, 2920.0808, 2359.605, 4634.175, 4709.638, 4796.4976, 4550.399]
2025-09-16 11:35:37,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:35:37,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 40 minutes, 24 seconds)
2025-09-16 11:37:16,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:37:27,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3908.89062 ± 1062.345
2025-09-16 11:37:27,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4846.314, 4675.4824, 3177.4424, 2041.5226, 3010.1992, 2410.5767, 4760.48, 4570.471, 4800.0728, 4796.3438]
2025-09-16 11:37:27,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:37:27,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 38 minutes, 28 seconds)
2025-09-16 11:39:06,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:39:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4634.10303 ± 872.747
2025-09-16 11:39:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4596.514, 4977.6646, 2045.4622, 4929.7036, 5050.786, 4800.3174, 4906.8774, 5070.726, 4955.702, 5007.277]
2025-09-16 11:39:16,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:39:16,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 37 seconds)
2025-09-16 11:40:56,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:41:06,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4213.18652 ± 787.887
2025-09-16 11:41:06,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4870.274, 4922.8843, 4453.218, 4848.0234, 2496.4495, 4745.0547, 4934.1147, 3561.479, 3689.692, 3610.6794]
2025-09-16 11:41:06,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:41:06,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 34 minutes, 47 seconds)
2025-09-16 11:42:46,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:42:56,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4298.35645 ± 963.591
2025-09-16 11:42:56,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3656.1914, 5054.067, 3117.4407, 2069.4788, 5076.988, 4795.816, 4842.642, 4616.249, 4971.319, 4783.37]
2025-09-16 11:42:56,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:42:56,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 57 seconds)
2025-09-16 11:44:36,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:44:46,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4653.71973 ± 576.255
2025-09-16 11:44:46,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3645.0977, 3511.027, 5137.367, 5059.338, 4651.593, 4941.7183, 4996.1694, 4447.0596, 5034.3203, 5113.506]
2025-09-16 11:44:46,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:44:46,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 7 seconds)
2025-09-16 11:46:26,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:46:36,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3850.13159 ± 1590.365
2025-09-16 11:46:36,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4714.3457, 1176.9773, 5077.5454, 4758.353, 1279.4651, 5065.645, 4776.791, 1864.7689, 4847.321, 4940.1055]
2025-09-16 11:46:36,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:46:36,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 18 seconds)
2025-09-16 11:48:16,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:48:26,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4046.44409 ± 1041.621
2025-09-16 11:48:26,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4720.1357, 3062.9265, 4817.4927, 3730.538, 4834.1978, 4620.4424, 4675.9014, 3391.6628, 5017.516, 1593.6276]
2025-09-16 11:48:26,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:48:26,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 29 seconds)
2025-09-16 11:50:06,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:50:16,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4696.21191 ± 763.167
2025-09-16 11:50:16,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [5186.682, 5177.3247, 3389.8887, 5120.2583, 5093.1694, 5074.488, 2987.6123, 5043.1484, 4913.189, 4976.3584]
2025-09-16 11:50:16,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:50:16,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 39 seconds)
2025-09-16 11:51:55,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:52:06,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4468.70703 ± 1029.068
2025-09-16 11:52:06,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [5105.14, 4977.143, 4838.8687, 4856.095, 4757.004, 4859.4053, 4390.9224, 4561.7065, 1435.824, 4904.961]
2025-09-16 11:52:06,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:52:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 49 seconds)
2025-09-16 11:53:45,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:53:56,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4351.98291 ± 1122.609
2025-09-16 11:53:56,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4860.3037, 4613.28, 4881.803, 3849.1006, 1303.5308, 3746.6904, 4893.7793, 5085.778, 5149.057, 5136.504]
2025-09-16 11:53:56,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:53:56,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 58 seconds)
2025-09-16 11:55:35,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:55:45,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4004.03516 ± 1084.997
2025-09-16 11:55:45,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [3080.3635, 4760.803, 4965.548, 5065.23, 4268.6885, 4185.293, 4587.7515, 2782.4023, 4749.406, 1594.8668]
2025-09-16 11:55:45,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:55:45,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 8 seconds)
2025-09-16 11:57:24,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:57:35,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4211.12402 ± 1057.856
2025-09-16 11:57:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4977.3354, 5048.619, 3950.821, 4790.994, 3362.9292, 5092.694, 2469.4272, 5010.4707, 2322.916, 5085.037]
2025-09-16 11:57:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:57:35,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 17 seconds)
2025-09-16 11:59:14,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 11:59:24,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4915.54297 ± 303.418
2025-09-16 11:59:24,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [5005.452, 5163.311, 4895.261, 4832.318, 5028.7754, 5081.195, 5140.351, 5100.402, 4837.018, 4071.344]
2025-09-16 11:59:24,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:59:24,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4915.54) for latency 15
2025-09-16 11:59:25,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 26 seconds)
2025-09-16 12:01:04,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:01:14,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4828.17090 ± 355.329
2025-09-16 12:01:14,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4943.7803, 4988.1924, 4956.2446, 5031.523, 5012.0254, 3929.8357, 5011.5625, 5068.696, 4972.8115, 4367.037]
2025-09-16 12:01:14,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:01:14,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 37 seconds)
2025-09-16 12:02:54,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:03:04,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4984.12988 ± 176.259
2025-09-16 12:03:04,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [5097.7217, 5085.6123, 5105.6997, 4987.1724, 5064.6216, 5113.1, 4731.304, 5103.628, 4570.1646, 4982.2725]
2025-09-16 12:03:04,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:03:04,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4984.13) for latency 15
2025-09-16 12:03:04,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 48 seconds)
2025-09-16 12:04:44,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:04:54,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 3827.31982 ± 1419.219
2025-09-16 12:04:54,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4871.466, 4901.9043, 4626.3887, 4772.696, 1280.2511, 4904.9067, 1346.3396, 5066.5005, 3676.5771, 2826.1702]
2025-09-16 12:04:54,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:04:54,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 58 seconds)
2025-09-16 12:06:33,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:06:44,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4992.79541 ± 110.716
2025-09-16 12:06:44,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4944.5474, 4836.4077, 4797.101, 5097.5996, 4956.411, 5090.788, 5136.55, 5101.076, 5022.688, 4944.783]
2025-09-16 12:06:44,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:06:44,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1226 [INFO]: New best (4992.80) for latency 15
2025-09-16 12:06:44,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 8 seconds)
2025-09-16 12:08:23,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:08:34,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4447.15576 ± 1080.847
2025-09-16 12:08:34,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4300.0273, 5119.9985, 4781.7515, 5050.2305, 4343.6616, 4415.948, 4932.9556, 5151.235, 1341.621, 5034.128]
2025-09-16 12:08:34,235 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:08:34,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 19 seconds)
2025-09-16 12:10:13,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:10:23,997 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4369.53809 ± 1067.167
2025-09-16 12:10:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4750.7427, 5065.807, 4970.254, 5140.1914, 4746.004, 2225.2563, 4439.412, 5086.2935, 2318.8433, 4952.5737]
2025-09-16 12:10:23,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:10:24,010 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 29 seconds)
2025-09-16 12:12:03,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:12:13,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4749.06152 ± 509.113
2025-09-16 12:12:13,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [4791.9434, 4962.978, 3251.4841, 4839.0254, 5032.6475, 4983.281, 4739.042, 5056.1724, 4856.152, 4977.888]
2025-09-16 12:12:13,704 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:12:13,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 39 seconds)
2025-09-16 12:13:52,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:14:03,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4811.41406 ± 507.340
2025-09-16 12:14:03,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [5050.8633, 5054.5312, 4991.718, 5010.495, 5087.201, 3325.357, 4777.3735, 4894.1797, 4806.22, 5116.202]
2025-09-16 12:14:03,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:14:03,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 49 seconds)
2025-09-16 12:15:41,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1214 [DEBUG]: Evaluating for latency 15...
2025-09-16 12:15:52,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1221 [DEBUG]: Total Reward: 4838.61133 ± 300.133
2025-09-16 12:15:52,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1222 [DEBUG]: All rewards: [5214.0786, 4159.6367, 4975.9844, 4733.576, 5139.2314, 5025.769, 5026.341, 4843.1167, 4745.262, 4523.116]
2025-09-16 12:15:52,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:15:52,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisepromille50-halfcheetah):1251 [DEBUG]: Training session finished
