2025-09-16 09:12:37,375 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1108 [DEBUG]: logdir: _logs/noise-eval-v2/halfcheetah/bpql-noise_0.000-delay_18
2025-09-16 09:12:37,375 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1109 [DEBUG]: trainer_prefix: noise-eval-v2/halfcheetah/bpql-noise_0.000-delay_18
2025-09-16 09:12:37,375 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'18': <latency_env.delayed_mdp.ConstantDelay object at 0x15014a258890>}
2025-09-16 09:12:37,375 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-16 09:12:37,379 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-16 09:12:37,397 baseline-bpql-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=125, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-16 09:12:37,397 baseline-bpql-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-16 09:12:38,344 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-16 09:12:38,344 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-16 09:14:12,855 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:14:24,089 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -355.45978 ± 44.102
2025-09-16 09:14:24,089 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-350.02698, -376.11032, -338.27847, -325.05084, -304.38922, -272.82025, -409.65698, -366.42538, -396.253, -415.5864]
2025-09-16 09:14:24,089 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:14:24,089 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-355.46) for latency 18
2025-09-16 09:14:24,094 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 54 minutes, 29 seconds)
2025-09-16 09:16:04,083 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:16:15,287 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -319.07251 ± 51.388
2025-09-16 09:16:15,288 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-306.9736, -284.468, -292.94672, -257.21756, -332.97107, -318.5972, -381.38602, -247.11127, -347.57153, -421.48224]
2025-09-16 09:16:15,288 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:16:15,288 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-319.07) for latency 18
2025-09-16 09:16:15,305 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 57 minutes, 11 seconds)
2025-09-16 09:17:55,381 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:18:06,602 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -120.50047 ± 135.346
2025-09-16 09:18:06,602 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-237.66968, -123.99177, -13.093946, -24.97223, -252.89055, -38.146927, 171.90619, -185.68732, -227.15817, -273.30026]
2025-09-16 09:18:06,602 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:18:06,602 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-120.50) for latency 18
2025-09-16 09:18:06,605 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 56 minutes, 53 seconds)
2025-09-16 09:19:46,761 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:19:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: -47.20266 ± 147.569
2025-09-16 09:19:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [88.704414, 160.62955, 102.49795, -140.70547, -44.798717, -136.10066, -50.10938, -325.78812, 76.30369, -202.65985]
2025-09-16 09:19:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:19:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (-47.20) for latency 18
2025-09-16 09:19:57,821 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 55 minutes, 47 seconds)
2025-09-16 09:21:37,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:21:49,078 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 25.13198 ± 107.107
2025-09-16 09:21:49,079 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [-33.284634, 107.91189, 164.70172, 38.018063, 60.274384, 46.504364, 191.04398, -81.17185, -147.6231, -95.055]
2025-09-16 09:21:49,079 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:21:49,079 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (25.13) for latency 18
2025-09-16 09:21:49,097 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 54 minutes, 24 seconds)
2025-09-16 09:23:29,456 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:23:40,488 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 251.93375 ± 101.596
2025-09-16 09:23:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [391.558, 151.11388, 212.89719, 269.39044, 367.67432, 354.24747, 155.84703, 217.62683, 73.08877, 325.89368]
2025-09-16 09:23:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:23:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (251.93) for latency 18
2025-09-16 09:23:40,497 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 54 minutes, 20 seconds)
2025-09-16 09:25:19,772 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:25:30,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 523.54901 ± 96.336
2025-09-16 09:25:30,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [562.7742, 607.23816, 602.1193, 385.98233, 462.81375, 570.22424, 536.0009, 633.1196, 322.79227, 552.4251]
2025-09-16 09:25:30,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:25:30,994 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (523.55) for latency 18
2025-09-16 09:25:31,000 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 52 minutes, 15 seconds)
2025-09-16 09:27:10,086 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:27:21,315 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 472.10580 ± 136.333
2025-09-16 09:27:21,315 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [374.5839, 236.81396, 417.1062, 599.7449, 459.91498, 582.662, 633.2225, 375.17783, 677.7416, 364.09018]
2025-09-16 09:27:21,315 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:27:21,323 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 50 minutes, 6 seconds)
2025-09-16 09:29:00,299 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:29:11,502 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 722.87299 ± 121.187
2025-09-16 09:29:11,503 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [745.8441, 501.9694, 856.5662, 609.99316, 818.9495, 897.97314, 787.54486, 644.5314, 600.583, 764.7752]
2025-09-16 09:29:11,503 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:29:11,503 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (722.87) for latency 18
2025-09-16 09:29:11,518 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 47 minutes, 57 seconds)
2025-09-16 09:30:50,502 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:31:01,654 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 846.96326 ± 134.719
2025-09-16 09:31:01,654 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [715.5089, 1073.3611, 988.8493, 784.22363, 987.06384, 705.81586, 782.9332, 882.3555, 904.29504, 645.22644]
2025-09-16 09:31:01,654 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:31:01,654 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (846.96) for latency 18
2025-09-16 09:31:01,674 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 45 minutes, 46 seconds)
2025-09-16 09:32:40,655 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:32:51,939 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 778.05835 ± 104.605
2025-09-16 09:32:51,939 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [746.4202, 555.9866, 705.2939, 748.33765, 715.3298, 922.74884, 912.33026, 871.8457, 810.26294, 792.0278]
2025-09-16 09:32:51,939 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:32:51,941 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 43 minutes, 35 seconds)
2025-09-16 09:34:30,856 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:34:42,005 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1027.87793 ± 135.689
2025-09-16 09:34:42,005 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1015.6479, 1110.1532, 838.43317, 1334.8983, 962.14966, 1026.1846, 962.1147, 975.1058, 1160.4832, 893.6081]
2025-09-16 09:34:42,005 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:34:42,005 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1027.88) for latency 18
2025-09-16 09:34:42,010 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 41 minutes, 37 seconds)
2025-09-16 09:36:20,950 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:36:32,117 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1012.28973 ± 88.978
2025-09-16 09:36:32,117 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1026.5967, 874.89685, 969.6551, 1065.2131, 938.72455, 896.2933, 1163.63, 1023.05225, 1125.7125, 1039.1228]
2025-09-16 09:36:32,117 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:36:32,119 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 39 minutes, 43 seconds)
2025-09-16 09:38:11,107 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:38:22,384 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1232.77612 ± 249.127
2025-09-16 09:38:22,384 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1117.9001, 1363.5835, 1362.7668, 1088.6088, 1089.3673, 1250.8158, 1332.1274, 1823.7527, 871.7821, 1027.0557]
2025-09-16 09:38:22,384 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:38:22,384 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1232.78) for latency 18
2025-09-16 09:38:22,404 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 37 minutes, 55 seconds)
2025-09-16 09:40:01,365 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:40:12,520 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1125.33984 ± 252.251
2025-09-16 09:40:12,520 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1435.0983, 665.9465, 1420.3004, 1141.8556, 1391.5048, 939.1524, 1242.8354, 1066.4834, 1167.6207, 782.6001]
2025-09-16 09:40:12,520 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:40:12,528 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 36 minutes, 4 seconds)
2025-09-16 09:41:51,521 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:42:02,682 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1345.09509 ± 361.544
2025-09-16 09:42:02,682 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1298.8944, 1518.6248, 1342.9938, 890.2933, 857.4341, 2130.6309, 1525.1772, 1556.7683, 976.5095, 1353.6245]
2025-09-16 09:42:02,682 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:42:02,683 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1345.10) for latency 18
2025-09-16 09:42:02,687 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 34 minutes, 12 seconds)
2025-09-16 09:43:41,669 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:43:52,759 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1258.84900 ± 251.667
2025-09-16 09:43:52,759 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [936.9186, 1527.2743, 1441.8278, 960.67487, 1087.6891, 1473.1759, 1598.6376, 895.7492, 1304.315, 1362.2289]
2025-09-16 09:43:52,759 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:43:52,793 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 32 minutes, 22 seconds)
2025-09-16 09:45:31,886 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:45:43,006 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1086.54932 ± 156.547
2025-09-16 09:45:43,006 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1156.0244, 870.53186, 1035.2056, 1111.0028, 1216.1895, 1275.668, 792.795, 1181.9949, 969.9175, 1256.1635]
2025-09-16 09:45:43,006 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:45:43,011 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 30 minutes, 34 seconds)
2025-09-16 09:47:21,975 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:47:33,084 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1375.95251 ± 414.346
2025-09-16 09:47:33,085 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1615.9194, 1071.7883, 1034.0994, 1779.1624, 2061.9956, 1066.4502, 1186.6019, 1116.2688, 859.22406, 1968.0153]
2025-09-16 09:47:33,085 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:47:33,085 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1375.95) for latency 18
2025-09-16 09:47:33,092 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 28 minutes, 41 seconds)
2025-09-16 09:49:12,047 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:49:23,215 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1381.10522 ± 380.232
2025-09-16 09:49:23,215 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1463.0774, 1161.4904, 1632.9257, 991.5578, 2012.2753, 1367.3691, 1180.1138, 918.0585, 2027.0903, 1057.0941]
2025-09-16 09:49:23,215 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:49:23,215 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1381.11) for latency 18
2025-09-16 09:49:23,221 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 26 minutes, 51 seconds)
2025-09-16 09:51:02,140 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:51:13,326 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1177.76440 ± 250.681
2025-09-16 09:51:13,326 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [994.02167, 952.1954, 1190.7832, 1112.7655, 981.9995, 1106.8788, 1864.8744, 1295.2979, 1199.1884, 1079.6384]
2025-09-16 09:51:13,326 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:51:13,332 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 25 minutes)
2025-09-16 09:52:52,281 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:53:03,413 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1593.81763 ± 523.076
2025-09-16 09:53:03,413 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1006.1856, 1734.9911, 953.34436, 1860.7618, 2582.17, 2186.8806, 1216.1758, 1538.1168, 1854.7899, 1004.7585]
2025-09-16 09:53:03,413 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:53:03,414 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1593.82) for latency 18
2025-09-16 09:53:03,427 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 23 minutes, 9 seconds)
2025-09-16 09:54:42,407 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:54:53,545 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1578.62085 ± 323.739
2025-09-16 09:54:53,545 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1404.6057, 1455.9023, 2095.257, 1028.4779, 1179.1389, 1790.4128, 1551.2953, 1493.8302, 1788.0555, 1999.2327]
2025-09-16 09:54:53,545 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:54:53,554 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 21 minutes, 18 seconds)
2025-09-16 09:56:32,493 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:56:43,549 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1707.99731 ± 500.488
2025-09-16 09:56:43,549 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2322.5642, 2336.372, 2342.2102, 1808.9349, 1054.1965, 1778.4255, 1555.0344, 1130.5498, 1769.5452, 982.1391]
2025-09-16 09:56:43,549 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:56:43,549 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (1708.00) for latency 18
2025-09-16 09:56:43,554 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 19 minutes, 27 seconds)
2025-09-16 09:58:22,526 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 09:58:33,538 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2069.23022 ± 685.914
2025-09-16 09:58:33,538 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2757.1736, 1713.2866, 1462.9795, 1258.5327, 1616.3696, 1847.8785, 3153.601, 3211.2292, 2179.72, 1491.5317]
2025-09-16 09:58:33,538 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 09:58:33,538 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2069.23) for latency 18
2025-09-16 09:58:33,550 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 17 minutes, 34 seconds)
2025-09-16 10:00:12,519 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:00:23,745 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1551.05225 ± 491.483
2025-09-16 10:00:23,745 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1232.005, 1235.8652, 1083.3484, 2478.3298, 2202.1323, 1506.7391, 1225.0178, 2139.0823, 1296.1736, 1111.8287]
2025-09-16 10:00:23,745 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:00:23,748 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 15 minutes, 46 seconds)
2025-09-16 10:02:02,719 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:02:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2400.36353 ± 668.707
2025-09-16 10:02:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2776.112, 1395.744, 1367.613, 1940.3894, 3150.993, 2462.606, 3301.1382, 3121.373, 2462.3691, 2025.2975]
2025-09-16 10:02:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:02:13,823 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2400.36) for latency 18
2025-09-16 10:02:13,833 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 13 minutes, 55 seconds)
2025-09-16 10:03:52,796 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:04:03,951 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1546.86621 ± 641.835
2025-09-16 10:04:03,951 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1147.9329, 1081.3324, 1216.2238, 1464.9878, 1379.4609, 3333.1243, 1716.0393, 1129.418, 1182.5616, 1817.5818]
2025-09-16 10:04:03,951 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:04:03,959 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 12 minutes, 5 seconds)
2025-09-16 10:05:42,976 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:05:54,165 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 1909.19556 ± 712.349
2025-09-16 10:05:54,165 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1926.9918, 1156.9568, 1450.2805, 1910.3784, 3187.9028, 2969.9944, 1538.8922, 2382.6516, 840.04865, 1727.8568]
2025-09-16 10:05:54,165 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:05:54,169 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 10 minutes, 18 seconds)
2025-09-16 10:07:33,200 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:07:44,348 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2291.62061 ± 901.284
2025-09-16 10:07:44,348 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3160.1497, 3045.6077, 3204.354, 1142.836, 1367.009, 3375.2625, 1492.8054, 3133.876, 1562.1611, 1432.1439]
2025-09-16 10:07:44,348 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:07:44,355 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 8 minutes, 31 seconds)
2025-09-16 10:09:23,331 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:09:34,548 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2725.56519 ± 600.170
2025-09-16 10:09:34,548 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2749.77, 1545.5469, 2911.9258, 3352.544, 2063.6345, 3591.4304, 2454.935, 2537.748, 3415.5862, 2632.5315]
2025-09-16 10:09:34,548 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:09:34,548 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2725.57) for latency 18
2025-09-16 10:09:34,552 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 6 minutes, 41 seconds)
2025-09-16 10:11:13,674 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:11:24,817 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2880.32959 ± 1161.745
2025-09-16 10:11:24,817 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1295.7577, 1203.9205, 3953.5503, 4048.8032, 3016.466, 2926.139, 3579.7517, 1076.8618, 3926.397, 3775.6487]
2025-09-16 10:11:24,817 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:11:24,817 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2880.33) for latency 18
2025-09-16 10:11:24,822 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 4 minutes, 53 seconds)
2025-09-16 10:13:03,806 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:13:14,944 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 2984.19775 ± 1262.823
2025-09-16 10:13:14,944 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4181.061, 4078.0464, 1247.3069, 1295.1592, 4424.1177, 4275.97, 1961.3341, 1492.4631, 3525.572, 3360.9478]
2025-09-16 10:13:14,944 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:13:14,944 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (2984.20) for latency 18
2025-09-16 10:13:14,960 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 3 minutes, 3 seconds)
2025-09-16 10:14:53,878 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:15:04,993 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4075.60938 ± 875.777
2025-09-16 10:15:04,993 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4724.9233, 4422.107, 4919.067, 3062.5713, 4719.9023, 4655.077, 4782.0806, 3080.3289, 2300.8567, 4089.182]
2025-09-16 10:15:04,993 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:15:04,993 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4075.61) for latency 18
2025-09-16 10:15:05,012 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 1 minute, 11 seconds)
2025-09-16 10:16:44,024 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:16:55,243 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4450.21875 ± 1047.975
2025-09-16 10:16:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4693.0356, 4867.4966, 4686.556, 4693.3877, 4653.4346, 4876.705, 4866.3975, 1323.4468, 4825.0806, 5016.6455]
2025-09-16 10:16:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:16:55,244 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4450.22) for latency 18
2025-09-16 10:16:55,248 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 59 minutes, 21 seconds)
2025-09-16 10:18:34,221 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:18:45,338 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4636.30713 ± 106.805
2025-09-16 10:18:45,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4609.5596, 4613.879, 4729.423, 4583.6104, 4530.869, 4767.524, 4499.722, 4782.5806, 4750.7456, 4495.156]
2025-09-16 10:18:45,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:18:45,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4636.31) for latency 18
2025-09-16 10:18:45,349 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 57 minutes, 30 seconds)
2025-09-16 10:20:24,318 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:20:35,503 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4137.23486 ± 1482.040
2025-09-16 10:20:35,503 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4832.459, 4850.6953, 4903.625, 1189.9382, 4847.276, 4893.654, 4915.545, 1157.4257, 4869.9453, 4911.784]
2025-09-16 10:20:35,503 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:20:35,514 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 55 minutes, 38 seconds)
2025-09-16 10:22:14,507 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:22:25,567 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4723.24121 ± 42.027
2025-09-16 10:22:25,567 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4723.9116, 4702.927, 4728.787, 4740.031, 4609.545, 4735.855, 4773.705, 4723.6167, 4756.328, 4737.707]
2025-09-16 10:22:25,567 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:22:25,567 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4723.24) for latency 18
2025-09-16 10:22:25,572 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 53 minutes, 47 seconds)
2025-09-16 10:24:04,643 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:24:15,785 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4894.98535 ± 27.288
2025-09-16 10:24:15,785 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4874.048, 4905.7275, 4922.1562, 4872.8413, 4876.447, 4915.0244, 4855.929, 4929.5522, 4931.964, 4866.1597]
2025-09-16 10:24:15,785 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:24:15,785 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4894.99) for latency 18
2025-09-16 10:24:15,789 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 51 minutes, 59 seconds)
2025-09-16 10:25:54,860 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:26:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4926.61572 ± 80.002
2025-09-16 10:26:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4952.1406, 4786.8677, 4955.9478, 4883.453, 4790.5117, 4928.9087, 4960.4214, 5034.5864, 4950.904, 5022.418]
2025-09-16 10:26:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:26:06,070 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (4926.62) for latency 18
2025-09-16 10:26:06,075 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 50 minutes, 9 seconds)
2025-09-16 10:27:45,048 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:27:56,162 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4748.65479 ± 1200.572
2025-09-16 10:27:56,163 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5145.2417, 1149.0275, 5125.076, 5224.2554, 5143.72, 5140.577, 5054.2163, 5175.192, 5154.2744, 5174.969]
2025-09-16 10:27:56,163 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:27:56,169 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 48 minutes, 19 seconds)
2025-09-16 10:29:35,155 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:29:46,243 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4461.37793 ± 1664.274
2025-09-16 10:29:46,243 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1081.8273, 5366.2104, 5397.4033, 5296.998, 5312.5615, 5220.4917, 5154.108, 5254.7544, 5339.5044, 1189.9254]
2025-09-16 10:29:46,243 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:29:46,280 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 46 minutes, 28 seconds)
2025-09-16 10:31:25,264 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:31:36,308 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5313.61084 ± 47.196
2025-09-16 10:31:36,308 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5291.3296, 5294.6655, 5375.613, 5329.711, 5282.491, 5233.752, 5372.404, 5373.1914, 5322.9663, 5259.983]
2025-09-16 10:31:36,308 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:31:36,308 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5313.61) for latency 18
2025-09-16 10:31:36,324 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 44 minutes, 38 seconds)
2025-09-16 10:33:15,326 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:33:26,358 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5313.59619 ± 24.077
2025-09-16 10:33:26,359 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5326.1597, 5276.862, 5310.6665, 5330.7334, 5314.1455, 5362.7495, 5317.599, 5277.0063, 5300.259, 5319.7827]
2025-09-16 10:33:26,359 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:33:26,365 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 42 minutes, 46 seconds)
2025-09-16 10:35:05,521 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:35:16,702 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5361.57324 ± 76.142
2025-09-16 10:35:16,702 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5416.519, 5357.323, 5376.8433, 5356.75, 5405.3677, 5369.6777, 5391.79, 5424.744, 5373.442, 5143.274]
2025-09-16 10:35:16,702 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:35:16,702 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5361.57) for latency 18
2025-09-16 10:35:16,707 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 40 minutes, 56 seconds)
2025-09-16 10:36:55,763 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:37:06,968 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5062.08252 ± 792.834
2025-09-16 10:37:06,968 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5312.094, 2685.251, 5313.5195, 5386.5317, 5345.3315, 5289.1934, 5341.7397, 5275.0884, 5331.7827, 5340.294]
2025-09-16 10:37:06,968 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:37:06,973 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 39 minutes, 8 seconds)
2025-09-16 10:38:45,974 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:38:57,144 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5228.72949 ± 49.180
2025-09-16 10:38:57,144 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5228.9575, 5198.6426, 5217.966, 5269.1807, 5278.9043, 5236.145, 5201.8496, 5111.0845, 5289.612, 5254.9526]
2025-09-16 10:38:57,144 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:38:57,150 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 37 minutes, 19 seconds)
2025-09-16 10:40:36,137 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:40:47,367 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5451.58691 ± 71.714
2025-09-16 10:40:47,367 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5515.5728, 5379.196, 5284.0635, 5471.9097, 5498.692, 5444.463, 5429.743, 5457.392, 5484.494, 5550.3457]
2025-09-16 10:40:47,367 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:40:47,368 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5451.59) for latency 18
2025-09-16 10:40:47,378 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 35 minutes, 30 seconds)
2025-09-16 10:42:26,319 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:42:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5479.87451 ± 32.874
2025-09-16 10:42:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5542.963, 5449.429, 5450.3228, 5455.444, 5490.5244, 5490.679, 5467.4717, 5435.5503, 5523.647, 5492.7134]
2025-09-16 10:42:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:42:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5479.87) for latency 18
2025-09-16 10:42:37,463 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 33 minutes, 41 seconds)
2025-09-16 10:44:16,528 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:44:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5580.03662 ± 30.136
2025-09-16 10:44:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5583.3335, 5626.4995, 5572.954, 5520.35, 5563.6323, 5617.1523, 5581.128, 5610.1855, 5570.7046, 5554.4307]
2025-09-16 10:44:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:44:27,800 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5580.04) for latency 18
2025-09-16 10:44:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 31 minutes, 51 seconds)
2025-09-16 10:46:06,745 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:46:17,818 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5280.13770 ± 502.205
2025-09-16 10:46:17,819 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5499.3574, 5439.909, 3787.8286, 5479.6543, 5400.6426, 5258.127, 5478.8276, 5502.921, 5481.893, 5472.218]
2025-09-16 10:46:17,819 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:46:17,836 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 29 minutes, 58 seconds)
2025-09-16 10:47:56,718 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:48:07,769 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5421.63916 ± 614.726
2025-09-16 10:48:07,769 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5630.6777, 5649.7188, 3581.2573, 5644.399, 5561.1953, 5692.0815, 5593.814, 5654.9863, 5564.007, 5644.2515]
2025-09-16 10:48:07,769 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:48:07,798 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 28 minutes, 6 seconds)
2025-09-16 10:49:46,710 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:49:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5467.35938 ± 20.592
2025-09-16 10:49:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5457.9214, 5453.5303, 5466.782, 5474.665, 5475.772, 5458.381, 5519.208, 5435.0527, 5469.3086, 5462.973]
2025-09-16 10:49:57,816 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:49:57,825 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 26 minutes, 14 seconds)
2025-09-16 10:51:36,849 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:51:48,134 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5581.05762 ± 149.720
2025-09-16 10:51:48,134 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5671.045, 5637.93, 5616.564, 5670.199, 5716.803, 5445.442, 5574.4756, 5189.3784, 5695.9907, 5592.746]
2025-09-16 10:51:48,134 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:51:48,134 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5581.06) for latency 18
2025-09-16 10:51:48,141 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 24 minutes, 26 seconds)
2025-09-16 10:53:27,087 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:53:38,333 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5610.29639 ± 106.583
2025-09-16 10:53:38,333 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5490.9385, 5371.9736, 5707.88, 5751.0684, 5673.0522, 5543.518, 5634.5293, 5657.8374, 5618.1772, 5653.993]
2025-09-16 10:53:38,333 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:53:38,333 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5610.30) for latency 18
2025-09-16 10:53:38,339 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 22 minutes, 34 seconds)
2025-09-16 10:55:17,283 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:55:28,408 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4824.88525 ± 1178.733
2025-09-16 10:55:28,408 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2878.8245, 5634.0107, 5466.8594, 5458.6816, 4851.504, 2181.1472, 5410.4673, 5554.9473, 5186.404, 5626.007]
2025-09-16 10:55:28,408 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:55:28,414 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 20 minutes, 45 seconds)
2025-09-16 10:57:07,307 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:57:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5200.70410 ± 920.473
2025-09-16 10:57:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5386.919, 5528.325, 5180.8223, 5581.099, 5590.7305, 2463.8206, 5580.966, 5572.5845, 5578.0825, 5543.6953]
2025-09-16 10:57:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:57:18,396 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 18 minutes, 55 seconds)
2025-09-16 10:58:57,282 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 10:59:08,384 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5535.20996 ± 172.142
2025-09-16 10:59:08,384 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5634.5767, 5636.6895, 5581.221, 5043.124, 5542.9033, 5614.979, 5634.355, 5456.6157, 5608.641, 5598.995]
2025-09-16 10:59:08,384 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 10:59:08,395 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 17 minutes, 4 seconds)
2025-09-16 11:00:46,804 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:00:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5689.21777 ± 44.144
2025-09-16 11:00:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5734.107, 5699.8516, 5631.919, 5642.038, 5683.9507, 5726.305, 5742.5015, 5744.0884, 5628.638, 5658.7817]
2025-09-16 11:00:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:00:57,834 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5689.22) for latency 18
2025-09-16 11:00:57,862 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 15 minutes, 7 seconds)
2025-09-16 11:02:35,537 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:02:46,519 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5501.43945 ± 130.662
2025-09-16 11:02:46,519 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5579.7983, 5547.4287, 5599.4297, 5596.138, 5610.6616, 5345.3667, 5378.4575, 5592.5015, 5549.3535, 5215.2603]
2025-09-16 11:02:46,519 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:02:46,525 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 13 minutes, 5 seconds)
2025-09-16 11:04:24,161 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:04:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5516.57910 ± 51.668
2025-09-16 11:04:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5564.1084, 5502.7085, 5554.978, 5450.31, 5548.7285, 5405.1562, 5540.9785, 5552.878, 5561.2764, 5484.6646]
2025-09-16 11:04:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:04:35,274 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 11 minutes, 5 seconds)
2025-09-16 11:06:12,527 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:06:23,521 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5524.90771 ± 168.148
2025-09-16 11:06:23,521 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5612.8945, 5648.544, 5419.1387, 5270.404, 5735.0347, 5197.9146, 5624.8247, 5652.678, 5596.9736, 5490.672]
2025-09-16 11:06:23,521 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:06:23,532 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 9 minutes, 3 seconds)
2025-09-16 11:08:00,519 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:08:11,382 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5723.44482 ± 41.373
2025-09-16 11:08:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5707.092, 5758.109, 5729.165, 5664.673, 5730.4224, 5725.6133, 5763.327, 5707.4087, 5795.936, 5652.699]
2025-09-16 11:08:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:08:11,383 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5723.44) for latency 18
2025-09-16 11:08:11,390 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 6 minutes, 58 seconds)
2025-09-16 11:09:48,408 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:09:59,487 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5453.90234 ± 22.131
2025-09-16 11:09:59,487 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5455.627, 5430.9165, 5437.9717, 5411.3003, 5469.2246, 5455.7256, 5469.7695, 5453.409, 5496.1177, 5458.9614]
2025-09-16 11:09:59,487 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:09:59,492 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 4 minutes, 59 seconds)
2025-09-16 11:11:36,520 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:11:47,440 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5602.95361 ± 44.736
2025-09-16 11:11:47,440 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5570.013, 5519.9556, 5622.774, 5596.01, 5591.752, 5567.817, 5612.509, 5698.3154, 5630.8306, 5619.557]
2025-09-16 11:11:47,440 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:11:47,448 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 3 minutes, 6 seconds)
2025-09-16 11:13:24,534 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:13:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5228.95117 ± 737.590
2025-09-16 11:13:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5549.699, 5537.5415, 5534.907, 5594.2417, 5535.1094, 3080.6074, 5265.9106, 5563.057, 5015.3994, 5613.0435]
2025-09-16 11:13:35,542 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:13:35,550 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 1 minute, 13 seconds)
2025-09-16 11:15:12,539 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:15:23,256 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5668.51855 ± 27.724
2025-09-16 11:15:23,256 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5622.002, 5636.243, 5709.524, 5646.095, 5655.9697, 5687.6465, 5682.477, 5705.0435, 5681.032, 5659.1533]
2025-09-16 11:15:23,256 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:15:23,269 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 59 minutes, 22 seconds)
2025-09-16 11:17:00,371 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:17:11,065 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5686.62598 ± 33.634
2025-09-16 11:17:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5658.5786, 5689.472, 5673.824, 5683.4663, 5670.5854, 5702.0957, 5722.181, 5743.8164, 5616.336, 5705.8984]
2025-09-16 11:17:11,066 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:17:11,081 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 57 minutes, 34 seconds)
2025-09-16 11:18:48,131 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:18:58,974 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5085.73193 ± 882.056
2025-09-16 11:18:58,974 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5679.8525, 5560.696, 3861.8655, 5663.984, 5449.408, 5245.8257, 2923.6187, 5636.7783, 5497.741, 5337.5483]
2025-09-16 11:18:58,974 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:18:58,980 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 55 minutes, 44 seconds)
2025-09-16 11:20:35,980 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:20:46,813 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5283.38184 ± 1011.463
2025-09-16 11:20:46,813 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5670.6367, 5580.8677, 5609.0366, 2253.3894, 5657.974, 5483.1787, 5625.406, 5656.086, 5614.9385, 5682.3096]
2025-09-16 11:20:46,813 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:20:46,831 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 53 minutes, 56 seconds)
2025-09-16 11:22:23,433 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:22:34,157 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5596.91357 ± 141.296
2025-09-16 11:22:34,157 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5640.236, 5623.6626, 5642.745, 5612.5557, 5627.057, 5565.124, 5711.7935, 5693.479, 5189.59, 5662.896]
2025-09-16 11:22:34,157 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:22:34,175 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 52 minutes, 4 seconds)
2025-09-16 11:24:10,893 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:24:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5279.48828 ± 926.707
2025-09-16 11:24:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [2501.437, 5523.8125, 5648.876, 5648.7153, 5579.264, 5598.04, 5565.2456, 5570.7656, 5587.2236, 5571.5034]
2025-09-16 11:24:21,756 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:24:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 50 minutes, 15 seconds)
2025-09-16 11:25:58,410 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:26:09,170 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4981.08203 ± 1129.413
2025-09-16 11:26:09,171 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3952.0498, 5526.213, 5523.0254, 5531.535, 5551.9536, 1894.5201, 5286.3335, 5546.4043, 5508.0874, 5490.6978]
2025-09-16 11:26:09,171 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:26:09,184 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 48 minutes, 25 seconds)
2025-09-16 11:27:45,732 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:27:56,593 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5538.95215 ± 51.968
2025-09-16 11:27:56,593 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5499.565, 5527.6465, 5613.0737, 5622.323, 5573.979, 5499.016, 5497.688, 5581.261, 5512.68, 5462.2866]
2025-09-16 11:27:56,593 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:27:56,624 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 46 minutes, 35 seconds)
2025-09-16 11:29:33,161 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:29:43,986 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4801.78467 ± 1310.190
2025-09-16 11:29:43,986 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5244.3745, 5487.5596, 1193.7018, 5503.556, 3674.7297, 5329.061, 5437.379, 5255.5605, 5423.943, 5467.9854]
2025-09-16 11:29:43,986 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:29:43,995 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 44 minutes, 45 seconds)
2025-09-16 11:31:20,588 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:31:31,560 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5674.38965 ± 57.477
2025-09-16 11:31:31,561 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5663.9487, 5741.2764, 5606.1978, 5755.0776, 5600.7056, 5743.2607, 5600.323, 5691.9985, 5645.1143, 5695.9966]
2025-09-16 11:31:31,561 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:31:31,573 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 42 minutes, 59 seconds)
2025-09-16 11:33:08,112 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:33:18,969 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4818.57764 ± 1215.351
2025-09-16 11:33:18,969 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [1629.9398, 5560.7065, 5559.3545, 5509.8564, 5379.7505, 5476.531, 5620.0913, 4313.028, 3776.1553, 5360.3647]
2025-09-16 11:33:18,969 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:33:18,977 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 41 minutes, 11 seconds)
2025-09-16 11:34:55,573 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:35:06,545 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5619.00537 ± 43.117
2025-09-16 11:35:06,545 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5694.701, 5658.1562, 5603.842, 5668.76, 5597.689, 5617.192, 5584.4224, 5633.044, 5541.841, 5590.411]
2025-09-16 11:35:06,545 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:35:06,559 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 39 minutes, 24 seconds)
2025-09-16 11:36:43,115 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:36:53,911 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5724.73389 ± 38.850
2025-09-16 11:36:53,911 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5692.8086, 5710.7676, 5738.3115, 5770.949, 5741.1274, 5720.9517, 5646.4043, 5787.9834, 5742.1924, 5695.845]
2025-09-16 11:36:53,911 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:36:53,911 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5724.73) for latency 18
2025-09-16 11:36:53,948 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 37 minutes, 36 seconds)
2025-09-16 11:38:30,543 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:38:41,336 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5652.96826 ± 262.670
2025-09-16 11:38:41,336 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5749.3457, 5774.7046, 5763.7227, 5751.439, 5743.743, 5720.8086, 4868.1016, 5732.6665, 5740.59, 5684.562]
2025-09-16 11:38:41,336 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:38:41,342 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 35 minutes, 49 seconds)
2025-09-16 11:40:17,946 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:40:28,848 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5191.71240 ± 1507.107
2025-09-16 11:40:28,849 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5675.034, 5446.678, 5698.1016, 5747.2173, 677.80743, 5741.169, 5709.712, 5747.818, 5759.679, 5713.9077]
2025-09-16 11:40:28,849 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:40:28,855 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 34 minutes, 1 second)
2025-09-16 11:42:05,362 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:42:16,141 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5233.12988 ± 1057.375
2025-09-16 11:42:16,141 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5594.276, 5646.5654, 5616.52, 5598.04, 2080.0874, 5659.33, 5588.1763, 5578.0884, 5712.6426, 5257.5728]
2025-09-16 11:42:16,141 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:42:16,151 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 32 minutes, 13 seconds)
2025-09-16 11:43:52,702 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:44:03,453 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5723.95947 ± 44.331
2025-09-16 11:44:03,453 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5756.881, 5782.0195, 5714.8843, 5680.2114, 5752.6973, 5791.9106, 5669.3145, 5672.746, 5680.0635, 5738.8677]
2025-09-16 11:44:03,453 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:44:03,462 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 30 minutes, 25 seconds)
2025-09-16 11:45:39,977 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:45:50,745 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5070.36670 ± 951.566
2025-09-16 11:45:50,745 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5679.101, 5483.0474, 5667.1333, 4077.2747, 5541.6255, 5528.8438, 5672.652, 4768.655, 5653.335, 2631.9983]
2025-09-16 11:45:50,745 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:45:50,758 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 28 minutes, 37 seconds)
2025-09-16 11:47:27,287 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:47:38,070 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5026.39404 ± 1192.741
2025-09-16 11:47:38,070 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4140.344, 5374.9067, 5505.787, 1689.0074, 5629.087, 5553.5835, 5607.981, 5470.2153, 5590.2915, 5702.735]
2025-09-16 11:47:38,070 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:47:38,076 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 26 minutes, 50 seconds)
2025-09-16 11:49:15,418 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:49:26,244 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4935.54785 ± 1073.140
2025-09-16 11:49:26,244 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5613.198, 2921.8364, 5602.2563, 5618.6636, 5613.181, 2982.5757, 5583.114, 4219.174, 5592.916, 5608.5625]
2025-09-16 11:49:26,244 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:49:26,277 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 4 seconds)
2025-09-16 11:51:03,311 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:51:14,096 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4689.82520 ± 1555.876
2025-09-16 11:51:14,096 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4676.866, 5677.6733, 5546.828, 5466.2793, 5650.411, 5587.9966, 1915.3612, 5630.5728, 5387.4155, 1358.8469]
2025-09-16 11:51:14,096 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:51:14,105 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 23 minutes, 18 seconds)
2025-09-16 11:52:51,114 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:53:01,936 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5455.17432 ± 582.533
2025-09-16 11:53:01,936 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5679.7534, 5508.2725, 5700.022, 5694.041, 5686.254, 5577.1494, 5679.8584, 5651.413, 3716.0881, 5658.886]
2025-09-16 11:53:01,936 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:53:01,948 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 21 minutes, 32 seconds)
2025-09-16 11:54:38,955 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:54:49,847 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5283.75537 ± 1221.626
2025-09-16 11:54:49,847 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5707.281, 5722.824, 5673.3525, 5693.682, 5711.2944, 5581.079, 5748.6816, 5690.749, 1621.0184, 5687.591]
2025-09-16 11:54:49,847 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:54:49,856 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 19 minutes, 46 seconds)
2025-09-16 11:56:26,883 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:56:37,549 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5085.15576 ± 1355.115
2025-09-16 11:56:37,549 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4369.3447, 5738.92, 5712.62, 5444.2573, 5735.782, 5727.6616, 1197.1453, 5625.3774, 5556.129, 5744.3247]
2025-09-16 11:56:37,549 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:56:37,567 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 17 minutes, 58 seconds)
2025-09-16 11:58:15,637 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 11:58:26,385 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5016.37109 ± 929.283
2025-09-16 11:58:26,385 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5626.3003, 3803.552, 3895.3008, 5617.496, 5634.6245, 5583.6816, 5624.458, 3171.3887, 5664.4307, 5542.4814]
2025-09-16 11:58:26,385 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 11:58:26,393 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 12 seconds)
2025-09-16 12:00:03,925 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:00:14,711 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5591.47217 ± 350.548
2025-09-16 12:00:14,711 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [4551.6777, 5762.2915, 5719.801, 5686.1064, 5731.279, 5670.7827, 5573.708, 5748.414, 5756.446, 5714.2188]
2025-09-16 12:00:14,712 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:00:14,729 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 24 seconds)
2025-09-16 12:01:52,327 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:02:03,143 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4625.60938 ± 1649.252
2025-09-16 12:02:03,143 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3529.0977, 1147.5924, 5582.393, 5512.963, 5586.044, 5613.694, 5734.3276, 5735.9385, 2062.4377, 5751.6094]
2025-09-16 12:02:03,143 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:02:03,152 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 37 seconds)
2025-09-16 12:03:40,701 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:03:51,511 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5730.15088 ± 101.623
2025-09-16 12:03:51,511 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5521.2007, 5811.2646, 5813.4233, 5786.556, 5832.5977, 5706.1597, 5566.5493, 5715.491, 5751.8027, 5796.463]
2025-09-16 12:03:51,511 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:03:51,512 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1226 [INFO]: New best (5730.15) for latency 18
2025-09-16 12:03:51,518 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 10 minutes, 49 seconds)
2025-09-16 12:05:28,987 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:05:39,863 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5617.51025 ± 51.050
2025-09-16 12:05:39,863 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5601.449, 5627.4146, 5615.767, 5587.48, 5554.8364, 5605.597, 5659.2515, 5624.3027, 5557.5933, 5741.412]
2025-09-16 12:05:39,863 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:05:39,874 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 2 seconds)
2025-09-16 12:07:17,410 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:07:28,251 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5704.77588 ± 70.929
2025-09-16 12:07:28,251 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5727.8804, 5661.6904, 5741.3794, 5754.514, 5516.0327, 5738.9717, 5786.2417, 5686.0444, 5716.8457, 5718.157]
2025-09-16 12:07:28,251 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:07:28,258 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 13 seconds)
2025-09-16 12:09:05,793 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:09:16,591 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 4681.58691 ± 1230.999
2025-09-16 12:09:16,591 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [3561.1855, 5486.494, 5580.222, 5668.0806, 5682.6533, 2493.1404, 5578.334, 2822.3704, 5761.98, 4181.4097]
2025-09-16 12:09:16,591 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:09:16,599 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 25 seconds)
2025-09-16 12:10:54,192 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:11:05,010 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5247.56543 ± 1166.743
2025-09-16 12:11:05,010 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5691.431, 5757.29, 1876.0612, 4717.28, 5833.531, 5669.885, 5788.3174, 5549.2275, 5781.451, 5811.179]
2025-09-16 12:11:05,010 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:11:05,046 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 36 seconds)
2025-09-16 12:12:42,623 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:12:53,466 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5561.40039 ± 51.507
2025-09-16 12:12:53,466 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5569.907, 5642.9307, 5456.373, 5576.1016, 5571.3296, 5555.14, 5597.0757, 5482.384, 5588.836, 5573.9263]
2025-09-16 12:12:53,466 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:12:53,477 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 48 seconds)
2025-09-16 12:14:30,880 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1214 [DEBUG]: Evaluating for latency 18...
2025-09-16 12:14:41,804 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1221 [DEBUG]: Total Reward: 5484.47656 ± 708.997
2025-09-16 12:14:41,804 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1222 [DEBUG]: All rewards: [5760.4233, 5517.4146, 5833.988, 5710.1157, 5823.7114, 3374.2512, 5640.4917, 5740.8276, 5774.9927, 5668.551]
2025-09-16 12:14:41,804 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-16 12:14:41,812 latency_env.delayed_mdp:training_loop(baseline-bpql-halfcheetah):1251 [DEBUG]: Training session finished
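A note on reading the evaluation lines above: each `Total Reward: m ± s` figure is the mean and the *population* standard deviation (ddof = 0) of the ten per-trajectory returns on the following `All rewards` line. A minimal stdlib-only sketch (the variable names are illustrative, not part of the training code) that recomputes the 12:12:53 evaluation:

```python
from statistics import fmean, pstdev

# Per-trajectory returns copied from the "All rewards" line of the
# 12:12:53 evaluation (latency 18, 10 rollouts of 1000 steps each).
rewards = [5569.907, 5642.9307, 5456.373, 5576.1016, 5571.3296,
           5555.14, 5597.0757, 5482.384, 5588.836, 5573.9263]

m = fmean(rewards)
s = pstdev(rewards)  # population std (ddof=0) matches the logged "±" value

print(f"Total Reward: {m:.5f} \u00b1 {s:.3f}")
# → Total Reward: 5561.40039 ± 51.507
```

The same recomputation applies to every evaluation block in the log; the large ± values (e.g. ± 1649.252 at 12:02:03) come from a few collapsed rollouts in an otherwise tight cluster, not from uniformly noisy returns, which is why the per-trajectory list is worth inspecting alongside the summary.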
