2025-05-11 10:42:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2
2025-05-11 10:42:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2
2025-05-11 10:42:56,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x74e1c49cde80>}
2025-05-11 10:42:56,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1111 [DEBUG]: using device: cpu
2025-05-11 10:42:56,179 baseline-bpql-noisy-halfcheetah:77 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-11 10:42:56,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-11 10:42:56,188 baseline-bpql-noisy-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=29, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-11 10:42:56,188 baseline-bpql-noisy-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-11 10:42:56,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-11 10:42:56,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-11 10:45:34,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:45:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -280.37766 ± 49.418
2025-05-11 10:45:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-270.10718, -263.49893, -242.8077, -226.90224, -296.98798, -326.21555, -401.4633, -258.02728, -284.35403, -233.41234]
2025-05-11 10:45:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:45:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (-280.38) for latency ExtremeClogL1U23
2025-05-11 10:45:46,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:45:46,258 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:45:46,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 4 hours, 40 minutes, 13 seconds)
2025-05-11 10:48:34,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:48:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 157.26231 ± 143.869
2025-05-11 10:48:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [174.909, 190.17525, 299.24225, 359.76285, 344.886, -78.95174, -58.01441, 119.46862, 135.99136, 85.154]
2025-05-11 10:48:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:48:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (157.26) for latency ExtremeClogL1U23
2025-05-11 10:48:45,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:48:45,282 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:48:45,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 4 hours, 44 minutes, 53 seconds)
2025-05-11 10:51:36,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:51:48,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 281.62222 ± 254.930
2025-05-11 10:51:48,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-33.334385, 96.14095, 391.24445, 543.4261, 449.54877, 382.61368, -27.730783, 54.72222, 772.7165, 186.87444]
2025-05-11 10:51:48,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:51:48,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (281.62) for latency ExtremeClogL1U23
2025-05-11 10:51:48,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:51:48,638 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:51:48,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 4 hours, 46 minutes, 48 seconds)
2025-05-11 10:54:40,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:54:52,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1147.64062 ± 451.481
2025-05-11 10:54:52,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1628.6377, 89.28337, 1475.497, 1559.6674, 1503.6351, 1334.6049, 764.1702, 1049.0289, 854.00836, 1217.8723]
2025-05-11 10:54:52,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:54:52,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1147.64) for latency ExtremeClogL1U23
2025-05-11 10:54:52,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:54:52,071 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:54:52,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 4 hours, 46 minutes, 15 seconds)
2025-05-11 10:57:42,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 10:57:54,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1469.82996 ± 419.906
2025-05-11 10:57:54,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1906.2786, 1442.5024, 1514.3529, 1705.8467, 1296.4711, 1809.5682, 341.11484, 1419.1077, 1772.8685, 1490.1898]
2025-05-11 10:57:54,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 10:57:54,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (1469.83) for latency ExtremeClogL1U23
2025-05-11 10:57:54,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 10:57:54,542 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 10:57:54,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 4 hours, 44 minutes, 24 seconds)
2025-05-11 11:00:44,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:00:57,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1217.21313 ± 1001.055
2025-05-11 11:00:57,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [506.6993, 2545.3933, 2126.767, -15.072681, 2270.8945, 1839.0334, 1862.2625, -161.39809, 1242.1716, -44.61869]
2025-05-11 11:00:57,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:00:57,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 4 hours, 45 minutes, 25 seconds)
2025-05-11 11:03:46,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:03:58,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1384.08679 ± 704.305
2025-05-11 11:03:58,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2096.006, 1376.8285, 1735.2994, 1624.1508, 624.3525, 717.469, -102.25166, 1586.2006, 2104.7307, 2078.083]
2025-05-11 11:03:58,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:03:58,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 4 hours, 43 minutes, 4 seconds)
2025-05-11 11:06:54,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:07:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2234.26343 ± 222.426
2025-05-11 11:07:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2293.3252, 2320.6482, 2368.7852, 2069.8943, 1673.821, 2247.2532, 2135.0042, 2388.7402, 2323.179, 2521.9827]
2025-05-11 11:07:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:07:07,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2234.26) for latency ExtremeClogL1U23
2025-05-11 11:07:07,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:07:07,214 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:07:07,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 4 hours, 41 minutes, 41 seconds)
2025-05-11 11:09:57,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:10:10,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2410.77271 ± 335.225
2025-05-11 11:10:10,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2601.711, 2311.5972, 1785.7537, 2206.8635, 3062.901, 2534.9207, 2559.1387, 2579.3904, 2019.7592, 2445.6892]
2025-05-11 11:10:10,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:10:10,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2410.77) for latency ExtremeClogL1U23
2025-05-11 11:10:10,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:10:10,099 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:10:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 4 hours, 38 minutes, 28 seconds)
2025-05-11 11:12:59,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:13:12,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2216.42822 ± 513.011
2025-05-11 11:13:12,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2211.546, 2747.3198, 2120.0417, 787.5931, 2494.4775, 2055.0396, 2459.4788, 2432.304, 2440.7332, 2415.75]
2025-05-11 11:13:12,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:13:12,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 4 hours, 35 minutes, 25 seconds)
2025-05-11 11:16:06,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:16:20,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2423.44702 ± 630.806
2025-05-11 11:16:20,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2487.6467, 2720.6245, 2780.6785, 2175.5103, 2682.1216, 2453.7068, 2755.9546, 2313.2568, 3162.3254, 702.6448]
2025-05-11 11:16:20,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:16:20,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (2423.45) for latency ExtremeClogL1U23
2025-05-11 11:16:20,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:16:20,206 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:16:20,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 4 hours, 33 minutes, 50 seconds)
2025-05-11 11:19:14,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:19:28,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3028.76978 ± 358.943
2025-05-11 11:19:28,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2444.6284, 3359.729, 2607.5098, 3058.33, 3442.8037, 3532.172, 2763.6755, 3166.8274, 3215.3853, 2696.6375]
2025-05-11 11:19:28,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:19:28,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3028.77) for latency ExtremeClogL1U23
2025-05-11 11:19:28,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:19:28,108 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:19:28,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 4 hours, 32 minutes, 42 seconds)
2025-05-11 11:22:21,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:22:35,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2752.73657 ± 710.216
2025-05-11 11:22:35,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2681.024, 3105.7068, 2609.4502, 2695.6636, 3071.792, 967.55035, 3837.5068, 3130.1008, 2340.2698, 3088.3022]
2025-05-11 11:22:35,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:22:35,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 4 hours, 29 minutes, 3 seconds)
2025-05-11 11:25:29,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:25:43,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3173.81372 ± 375.346
2025-05-11 11:25:43,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3526.5808, 2413.143, 3306.1953, 3646.841, 3395.055, 2739.3315, 3296.657, 2767.6719, 3291.4836, 3355.175]
2025-05-11 11:25:43,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:25:43,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3173.81) for latency ExtremeClogL1U23
2025-05-11 11:25:43,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:25:43,143 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:25:43,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 4 hours, 27 minutes, 28 seconds)
2025-05-11 11:28:37,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:28:50,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3215.13843 ± 474.636
2025-05-11 11:28:50,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3829.4531, 3324.9197, 3432.4368, 3677.7734, 2187.8967, 3436.2673, 2858.714, 3576.6348, 3084.6238, 2742.6663]
2025-05-11 11:28:50,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:28:50,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3215.14) for latency ExtremeClogL1U23
2025-05-11 11:28:50,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:28:50,735 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:28:50,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 4 hours, 25 minutes, 48 seconds)
2025-05-11 11:31:45,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:31:58,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2934.43628 ± 931.281
2025-05-11 11:31:58,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2237.3677, 3449.8718, 4256.4604, 3643.167, 3628.7373, 744.4438, 2857.1003, 3319.5054, 2704.4998, 2503.2095]
2025-05-11 11:31:58,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:31:58,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 4 hours, 22 minutes, 40 seconds)
2025-05-11 11:34:53,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:35:06,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3074.33057 ± 1030.297
2025-05-11 11:35:06,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3267.3464, 3666.5146, 3697.6936, 472.22232, 3542.2698, 2125.4307, 3299.5479, 3130.412, 4450.5947, 3091.2744]
2025-05-11 11:35:06,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:35:06,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 4 hours, 19 minutes, 37 seconds)
2025-05-11 11:38:01,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:38:14,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3639.66553 ± 268.777
2025-05-11 11:38:14,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3311.6062, 3279.7063, 3905.2102, 3608.9302, 3812.8015, 3981.455, 3958.605, 3259.0366, 3538.752, 3740.5537]
2025-05-11 11:38:14,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:38:14,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (3639.67) for latency ExtremeClogL1U23
2025-05-11 11:38:14,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:38:14,265 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:38:14,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 4 hours, 16 minutes, 43 seconds)
2025-05-11 11:41:06,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:41:19,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3536.23975 ± 347.187
2025-05-11 11:41:19,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3711.062, 3776.3547, 3475.2546, 3092.433, 3049.9534, 3088.0168, 3453.669, 4033.0828, 3981.517, 3701.0527]
2025-05-11 11:41:19,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:41:19,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 4 hours, 12 minutes, 48 seconds)
2025-05-11 11:44:09,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:44:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3637.37231 ± 640.525
2025-05-11 11:44:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3815.321, 2112.7925, 3571.3867, 4019.3943, 3612.485, 3789.0012, 3578.1152, 4765.9434, 3179.188, 3930.096]
2025-05-11 11:44:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:44:21,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 4 hours, 8 minutes, 17 seconds)
2025-05-11 11:47:09,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:47:21,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3605.49683 ± 489.421
2025-05-11 11:47:21,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2997.3665, 3745.5144, 3202.6787, 4272.839, 3907.7773, 4218.184, 3454.9502, 4060.473, 2792.6863, 3402.4985]
2025-05-11 11:47:21,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:47:21,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 4 hours, 3 minutes, 5 seconds)
2025-05-11 11:50:05,328 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:50:17,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4072.18433 ± 280.247
2025-05-11 11:50:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4203.118, 4226.669, 3967.153, 4514.6797, 3976.5876, 4077.1501, 4073.0796, 3795.4695, 4401.639, 3486.297]
2025-05-11 11:50:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:50:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4072.18) for latency ExtremeClogL1U23
2025-05-11 11:50:17,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:50:17,407 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:50:17,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 3 hours, 56 minutes, 50 seconds)
2025-05-11 11:53:03,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:53:15,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3798.53979 ± 290.844
2025-05-11 11:53:15,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3836.4006, 4345.3213, 3505.6023, 3686.8176, 3350.0493, 4205.1694, 3941.3796, 3840.0217, 3579.8464, 3694.7905]
2025-05-11 11:53:15,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:53:15,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 3 hours, 51 minutes, 18 seconds)
2025-05-11 11:56:01,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:56:13,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3798.88867 ± 588.338
2025-05-11 11:56:13,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3752.7532, 4123.2363, 3459.018, 2684.827, 4593.823, 3909.598, 3456.1042, 4384.2104, 3144.531, 4480.784]
2025-05-11 11:56:13,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:56:13,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 3 hours, 46 minutes, 34 seconds)
2025-05-11 11:59:01,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 11:59:14,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4120.80957 ± 429.329
2025-05-11 11:59:14,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3831.1267, 5029.649, 4055.9204, 4246.8555, 3701.9749, 4495.2695, 4449.266, 3493.2712, 3812.4514, 4092.3123]
2025-05-11 11:59:14,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 11:59:14,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4120.81) for latency ExtremeClogL1U23
2025-05-11 11:59:14,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 11:59:14,433 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 11:59:14,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 3 hours, 43 minutes, 9 seconds)
2025-05-11 12:02:03,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:02:16,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3661.32495 ± 1148.420
2025-05-11 12:02:16,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3829.7546, 3704.8945, 4214.327, 4164.69, 4681.715, 556.29663, 3362.6633, 4931.0557, 3265.0017, 3902.8513]
2025-05-11 12:02:16,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:02:16,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 3 hours, 40 minutes, 48 seconds)
2025-05-11 12:05:05,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:05:17,813 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4206.60596 ± 582.709
2025-05-11 12:05:17,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3193.2026, 4464.1606, 5089.3613, 4151.0195, 3895.581, 4295.35, 3221.136, 4547.647, 4528.6636, 4679.933]
2025-05-11 12:05:17,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:05:17,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4206.61) for latency ExtremeClogL1U23
2025-05-11 12:05:17,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:05:17,818 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 12:05:17,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 3 hours, 39 minutes, 5 seconds)
2025-05-11 12:08:06,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:08:19,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3980.90479 ± 672.571
2025-05-11 12:08:19,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4371.9556, 3718.1558, 2046.4049, 4054.1624, 4234.3174, 4367.906, 4349.098, 4085.9058, 4318.295, 4262.8457]
2025-05-11 12:08:19,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:08:19,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 3 hours, 37 minutes)
2025-05-11 12:11:09,341 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:11:22,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3741.57666 ± 342.898
2025-05-11 12:11:22,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4418.2344, 3588.8052, 3813.669, 3061.9983, 3544.8953, 3800.214, 3725.8826, 3488.2153, 3969.4702, 4004.3823]
2025-05-11 12:11:22,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:11:22,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 3 hours, 34 minutes, 55 seconds)
2025-05-11 12:14:12,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:14:24,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4046.35498 ± 1240.802
2025-05-11 12:14:24,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4527.9316, 4520.699, 3775.7896, 4057.1921, 5013.422, 4834.5005, 3817.8386, 4760.428, 4631.4414, 524.30927]
2025-05-11 12:14:24,904 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:14:24,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 3 hours, 32 minutes, 26 seconds)
2025-05-11 12:17:14,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:17:27,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3684.90869 ± 781.577
2025-05-11 12:17:27,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4727.838, 2543.268, 3550.9333, 3613.68, 2146.1467, 3871.789, 3893.0996, 3614.6719, 4229.7695, 4657.891]
2025-05-11 12:17:27,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:17:27,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 3 hours, 29 minutes, 30 seconds)
2025-05-11 12:20:17,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:20:29,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4880.68604 ± 314.172
2025-05-11 12:20:29,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4486.803, 5520.816, 4684.304, 4880.4546, 4740.6997, 4477.2324, 4782.3784, 4874.9663, 5099.6147, 5259.591]
2025-05-11 12:20:29,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:20:29,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4880.69) for latency ExtremeClogL1U23
2025-05-11 12:20:29,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 12:20:29,738 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 12:20:29,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 3 hours, 26 minutes, 42 seconds)
2025-05-11 12:23:17,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:23:30,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4678.47363 ± 430.397
2025-05-11 12:23:30,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4583.2485, 3966.1035, 4872.298, 5072.375, 4640.952, 4494.575, 3916.2651, 4909.3877, 5183.2373, 5146.2915]
2025-05-11 12:23:30,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:23:30,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 3 hours, 23 minutes, 25 seconds)
2025-05-11 12:26:19,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:26:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4661.15430 ± 457.180
2025-05-11 12:26:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4278.1704, 4286.442, 5067.0205, 5175.4272, 3647.3655, 4596.497, 4834.986, 5163.9883, 4903.699, 4657.949]
2025-05-11 12:26:32,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:26:32,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 3 hours, 20 minutes, 11 seconds)
2025-05-11 12:29:19,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:29:32,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4299.39111 ± 648.498
2025-05-11 12:29:32,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4355.7417, 4659.9087, 5017.8633, 3891.1895, 2730.3914, 5090.49, 4337.462, 4415.146, 3872.5205, 4623.1978]
2025-05-11 12:29:32,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:29:32,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 3 hours, 16 minutes, 39 seconds)
2025-05-11 12:32:20,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:32:32,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4439.92676 ± 601.115
2025-05-11 12:32:32,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3937.1282, 4767.394, 4804.8696, 4083.1497, 4118.653, 5268.1943, 3363.5242, 5120.8125, 5006.2114, 3929.3325]
2025-05-11 12:32:32,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:32:32,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 3 hours, 13 minutes, 8 seconds)
2025-05-11 12:35:21,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:35:33,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4831.91650 ± 591.844
2025-05-11 12:35:33,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3660.9463, 5093.0005, 3937.9946, 4731.378, 5088.3193, 5394.6753, 5453.3877, 5193.8384, 5300.7974, 4464.828]
2025-05-11 12:35:33,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:35:33,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 3 hours, 9 minutes, 52 seconds)
2025-05-11 12:38:22,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:38:34,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4828.59082 ± 401.967
2025-05-11 12:38:34,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4971.0205, 5191.7847, 4570.8975, 5000.211, 4951.148, 5204.124, 4303.189, 4283.135, 4345.574, 5464.827]
2025-05-11 12:38:34,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:38:34,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 6 minutes, 53 seconds)
2025-05-11 12:41:22,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:41:34,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4802.15967 ± 366.666
2025-05-11 12:41:34,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4452.247, 4847.668, 4858.258, 4386.5405, 5288.561, 4559.483, 5162.2583, 4447.027, 4558.912, 5460.6445]
2025-05-11 12:41:34,566 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:41:34,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 3 minutes, 31 seconds)
2025-05-11 12:44:22,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:44:35,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4843.82324 ± 508.360
2025-05-11 12:44:35,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5100.3433, 4664.948, 5250.962, 4758.8228, 4437.9175, 3619.5598, 4715.308, 5188.4893, 5338.9414, 5362.937]
2025-05-11 12:44:35,088 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:44:35,094 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 29 seconds)
2025-05-11 12:47:22,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:47:35,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4471.95117 ± 641.492
2025-05-11 12:47:35,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4867.8975, 5441.3716, 4695.4897, 3338.6, 3249.1997, 4737.5176, 4450.925, 4716.805, 4709.5586, 4512.1475]
2025-05-11 12:47:35,118 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:47:35,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 57 minutes, 25 seconds)
2025-05-11 12:50:22,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:50:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4226.43848 ± 1173.014
2025-05-11 12:50:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5037.2197, 4833.3145, 5109.5005, 4717.347, 4904.284, 2920.7327, 5635.8184, 3652.7668, 3896.085, 1557.313]
2025-05-11 12:50:34,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:50:34,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 54 minutes, 6 seconds)
2025-05-11 12:53:21,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:53:34,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4486.54199 ± 520.655
2025-05-11 12:53:34,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4885.627, 5645.809, 4388.543, 4824.8345, 4230.4834, 3652.9575, 4570.5786, 4253.2646, 4435.8213, 3977.5059]
2025-05-11 12:53:34,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:53:34,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 2 hours, 50 minutes, 53 seconds)
2025-05-11 12:56:22,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:56:34,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4784.85205 ± 465.642
2025-05-11 12:56:34,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5221.6367, 5047.91, 5437.9023, 4648.9995, 4422.786, 4693.543, 4969.1724, 3702.191, 5085.304, 4619.075]
2025-05-11 12:56:34,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:56:34,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 2 hours, 48 minutes)
2025-05-11 12:59:22,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 12:59:34,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4612.86426 ± 925.361
2025-05-11 12:59:34,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5300.2085, 5431.099, 2114.5476, 4883.386, 5411.2417, 4846.541, 4737.8228, 4680.2476, 3995.752, 4727.8]
2025-05-11 12:59:34,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 12:59:34,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 2 hours, 44 minutes, 55 seconds)
2025-05-11 13:02:22,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:02:35,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4787.04688 ± 399.295
2025-05-11 13:02:35,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4277.3594, 4092.084, 4778.2026, 4910.5234, 4927.75, 5425.352, 4529.176, 4688.134, 5355.25, 4886.64]
2025-05-11 13:02:35,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:02:35,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 2 hours, 42 minutes, 5 seconds)
2025-05-11 13:05:24,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:05:36,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4904.00000 ± 566.897
2025-05-11 13:05:36,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3963.3823, 4826.818, 5314.521, 4923.579, 4189.959, 5153.573, 5330.0337, 4192.6616, 5576.2207, 5569.252]
2025-05-11 13:05:36,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:05:36,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (4904.00) for latency ExtremeClogL1U23
2025-05-11 13:05:36,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:05:36,974 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:05:36,985 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 2 hours, 39 minutes, 26 seconds)
2025-05-11 13:08:26,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:08:38,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4606.16504 ± 1477.146
2025-05-11 13:08:38,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5068.954, 5353.6875, 6249.5337, 5287.2334, 3539.1592, 704.3305, 4155.3794, 4959.758, 5468.4067, 5275.2065]
2025-05-11 13:08:38,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:08:38,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 2 hours, 36 minutes, 46 seconds)
2025-05-11 13:11:27,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:11:39,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5011.21143 ± 399.162
2025-05-11 13:11:39,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5203.8154, 4869.2573, 4268.7007, 4528.0913, 5143.194, 5593.7524, 4859.2104, 5167.0195, 5588.0415, 4891.0303]
2025-05-11 13:11:39,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:11:39,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5011.21) for latency ExtremeClogL1U23
2025-05-11 13:11:39,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:11:39,734 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:11:39,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 2 hours, 33 minutes, 52 seconds)
2025-05-11 13:14:27,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:14:40,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5135.22998 ± 288.321
2025-05-11 13:14:40,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5033.539, 4575.5864, 5235.6597, 5307.9727, 5374.039, 4992.8164, 4876.4985, 5665.8667, 5000.5977, 5289.728]
2025-05-11 13:14:40,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:14:40,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5135.23) for latency ExtremeClogL1U23
2025-05-11 13:14:40,349 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:14:40,354 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:14:40,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 2 hours, 30 minutes, 56 seconds)
2025-05-11 13:17:29,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:17:41,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4357.84033 ± 1451.207
2025-05-11 13:17:41,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [707.50073, 5590.4307, 4684.616, 4338.6978, 3545.3892, 5698.058, 5154.1597, 5077.4385, 3273.427, 5508.6875]
2025-05-11 13:17:41,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:17:41,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 2 hours, 27 minutes, 59 seconds)
2025-05-11 13:20:31,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:20:44,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4888.85059 ± 488.363
2025-05-11 13:20:44,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5510.3154, 4725.66, 5065.9307, 4276.453, 5706.812, 5016.3003, 4622.1094, 3999.6636, 5005.498, 4959.766]
2025-05-11 13:20:44,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:20:44,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 2 hours, 25 minutes, 8 seconds)
2025-05-11 13:23:33,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:23:45,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4453.64062 ± 1438.711
2025-05-11 13:23:45,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5067.504, 5507.9795, 3835.315, 5101.0586, 5078.822, 5593.7007, 435.4624, 5157.7153, 4197.9434, 4560.9062]
2025-05-11 13:23:45,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:23:45,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 22 minutes, 6 seconds)
2025-05-11 13:26:35,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:26:47,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4729.20410 ± 548.522
2025-05-11 13:26:47,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4783.0605, 4665.0737, 5322.8433, 3886.735, 4457.95, 4893.175, 5491.5034, 3710.4229, 4926.336, 5154.9355]
2025-05-11 13:26:47,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:26:47,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 19 minutes, 9 seconds)
2025-05-11 13:29:36,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:29:48,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4886.09619 ± 355.128
2025-05-11 13:29:48,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4354.0327, 4930.1743, 4769.7476, 5416.7036, 4961.3687, 4424.504, 4489.7207, 5246.816, 4972.3555, 5295.5337]
2025-05-11 13:29:48,033 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:29:48,041 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 16 minutes, 9 seconds)
2025-05-11 13:32:37,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:32:49,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5052.27881 ± 962.362
2025-05-11 13:32:49,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5831.363, 5109.7197, 4572.8394, 5480.129, 2430.5698, 5872.079, 5758.233, 5401.459, 4843.777, 5222.623]
2025-05-11 13:32:49,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:32:49,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 13 minutes, 5 seconds)
2025-05-11 13:35:39,870 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:35:51,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4948.83350 ± 469.792
2025-05-11 13:35:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4966.217, 4368.687, 4910.7974, 4755.8574, 4247.033, 4689.256, 5542.2876, 5704.9673, 4787.327, 5515.9067]
2025-05-11 13:35:51,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:35:51,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 10 minutes, 2 seconds)
2025-05-11 13:38:42,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:38:54,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5185.71729 ± 345.495
2025-05-11 13:38:54,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4963.7065, 4915.384, 5937.5513, 5647.336, 4893.5874, 5175.9404, 4737.8853, 5225.827, 5128.566, 5231.3857]
2025-05-11 13:38:54,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:38:54,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5185.72) for latency ExtremeClogL1U23
2025-05-11 13:38:54,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:38:54,116 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:38:54,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 7 minutes, 9 seconds)
2025-05-11 13:41:43,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:41:55,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5057.86914 ± 515.210
2025-05-11 13:41:55,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5038.459, 4807.0044, 4505.7896, 5925.261, 5728.0127, 4423.184, 5135.7046, 5667.484, 4605.104, 4742.689]
2025-05-11 13:41:55,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:41:55,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 4 minutes, 9 seconds)
2025-05-11 13:44:51,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:45:03,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4765.82861 ± 1153.989
2025-05-11 13:45:03,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5360.5654, 1407.3413, 5564.1504, 4923.7524, 5140.1587, 5374.012, 5469.2646, 4862.5137, 4719.067, 4837.4624]
2025-05-11 13:45:03,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:45:03,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 2 minutes, 3 seconds)
2025-05-11 13:47:50,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:48:03,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5290.97314 ± 408.548
2025-05-11 13:48:03,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5289.9336, 5521.1123, 4648.31, 5455.4775, 5290.798, 4787.923, 5167.6245, 5132.493, 5390.9067, 6225.1533]
2025-05-11 13:48:03,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:48:03,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5290.97) for latency ExtremeClogL1U23
2025-05-11 13:48:03,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:48:03,343 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:48:03,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 58 minutes, 50 seconds)
2025-05-11 13:50:51,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:51:03,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4660.75000 ± 1639.976
2025-05-11 13:51:03,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5007.9126, 3418.6208, 5848.967, 496.43375, 3991.274, 5782.9395, 5337.476, 6357.639, 5849.3735, 4516.8613]
2025-05-11 13:51:03,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:51:03,689 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 55 minutes, 34 seconds)
2025-05-11 13:53:50,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:54:03,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5358.33936 ± 456.955
2025-05-11 13:54:03,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5298.9263, 5698.968, 4970.809, 5071.326, 5471.071, 5809.6167, 5181.621, 4427.9336, 5523.0264, 6130.093]
2025-05-11 13:54:03,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:54:03,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5358.34) for latency ExtremeClogL1U23
2025-05-11 13:54:03,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 13:54:03,132 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 13:54:03,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 52 minutes, 6 seconds)
2025-05-11 13:56:49,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 13:57:01,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4955.88086 ± 907.166
2025-05-11 13:57:01,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2871.181, 5258.3794, 3853.0935, 5285.612, 5890.5977, 6024.836, 4736.8853, 5071.727, 5579.1143, 4987.382]
2025-05-11 13:57:01,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 13:57:01,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 48 minutes, 43 seconds)
2025-05-11 13:59:48,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:00:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5336.08594 ± 570.918
2025-05-11 14:00:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6408.2695, 5700.0225, 4927.154, 4782.24, 4578.608, 4796.21, 5415.727, 5069.4087, 5999.1914, 5684.028]
2025-05-11 14:00:01,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:00:01,180 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 44 minutes, 43 seconds)
2025-05-11 14:02:48,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:03:00,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5121.96045 ± 634.552
2025-05-11 14:03:00,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5357.755, 4607.8564, 5426.355, 4434.5317, 4414.3853, 5722.2295, 6332.4106, 5654.8794, 4466.9297, 4802.268]
2025-05-11 14:03:00,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:03:00,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 41 minutes, 42 seconds)
2025-05-11 14:05:46,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:05:59,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4695.45654 ± 556.701
2025-05-11 14:05:59,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4231.623, 4771.353, 5471.7466, 3821.318, 5053.469, 5682.103, 4639.335, 4071.4688, 4632.3135, 4579.8394]
2025-05-11 14:05:59,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:05:59,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 38 minutes, 31 seconds)
2025-05-11 14:08:45,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:08:58,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5419.75488 ± 308.842
2025-05-11 14:08:58,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5526.9897, 5720.053, 5147.075, 5364.0586, 5752.4097, 5524.6875, 5513.0557, 5020.668, 4833.233, 5795.317]
2025-05-11 14:08:58,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:08:58,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5419.75) for latency ExtremeClogL1U23
2025-05-11 14:08:58,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 14:08:58,055 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 14:08:58,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 35 minutes, 27 seconds)
2025-05-11 14:11:43,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:11:56,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4897.88232 ± 510.432
2025-05-11 14:11:56,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4519.8726, 5259.483, 5594.9604, 5600.336, 4391.0645, 4952.96, 4215.1904, 4763.0645, 4292.416, 5389.48]
2025-05-11 14:11:56,161 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:11:56,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 32 minutes, 24 seconds)
2025-05-11 14:14:43,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:14:55,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4839.70947 ± 1081.588
2025-05-11 14:14:55,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5622.998, 5195.3564, 5144.1763, 4368.6406, 5295.5464, 5414.0986, 4995.5576, 4750.684, 5796.5737, 1813.4625]
2025-05-11 14:14:55,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:14:55,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 29 minutes, 27 seconds)
2025-05-11 14:17:42,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:17:55,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4965.24365 ± 548.786
2025-05-11 14:17:55,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4876.582, 5311.5317, 5719.912, 5571.487, 4726.521, 4772.604, 4335.5425, 3831.691, 5304.923, 5201.6436]
2025-05-11 14:17:55,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:17:55,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 26 minutes, 28 seconds)
2025-05-11 14:20:43,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:20:55,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4465.21387 ± 1730.189
2025-05-11 14:20:55,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4885.8745, 4687.322, 5250.226, 5107.218, 5410.3555, 5537.3525, 5647.778, 360.1493, 5846.3413, 1919.5234]
2025-05-11 14:20:55,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:20:55,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 23 minutes, 38 seconds)
2025-05-11 14:23:42,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:23:55,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5162.99463 ± 407.104
2025-05-11 14:23:55,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4656.5337, 5368.692, 4883.2354, 5284.229, 5437.2534, 6000.0117, 4540.075, 5092.292, 5402.733, 4964.888]
2025-05-11 14:23:55,279 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:23:55,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 20 minutes, 44 seconds)
2025-05-11 14:26:42,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:26:55,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5223.55762 ± 499.186
2025-05-11 14:26:55,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5609.9443, 5483.785, 5351.6523, 4281.5854, 5749.6357, 5373.7266, 5235.103, 5858.013, 4678.9604, 4613.1675]
2025-05-11 14:26:55,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:26:55,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 17 minutes, 54 seconds)
2025-05-11 14:29:43,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:29:55,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4680.63184 ± 1585.532
2025-05-11 14:29:55,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4395.682, 5008.278, 5208.79, 5127.481, 5176.314, 91.29392, 6173.4116, 5370.139, 4952.707, 5302.224]
2025-05-11 14:29:55,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:29:55,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 14 minutes, 59 seconds)
2025-05-11 14:32:43,412 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:32:55,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4837.03760 ± 1635.423
2025-05-11 14:32:55,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5919.944, 4982.19, 4058.6902, 4559.4, 333.12704, 5268.447, 5163.2095, 5882.3247, 6126.3276, 6076.7207]
2025-05-11 14:32:55,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:32:55,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 12 minutes, 2 seconds)
2025-05-11 14:35:43,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:35:56,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5113.96191 ± 486.478
2025-05-11 14:35:56,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5812.835, 5642.6816, 4913.7563, 5476.576, 4866.1895, 4694.317, 4519.5093, 4329.3647, 5391.5146, 5492.875]
2025-05-11 14:35:56,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:35:56,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 9 minutes, 2 seconds)
2025-05-11 14:38:43,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:38:56,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5596.05859 ± 643.530
2025-05-11 14:38:56,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6133.446, 4943.146, 6346.9224, 4212.93, 5391.3774, 5792.9917, 5809.046, 5542.1367, 5327.4854, 6461.107]
2025-05-11 14:38:56,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:38:56,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5596.06) for latency ExtremeClogL1U23
2025-05-11 14:38:56,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 14:38:56,268 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 14:38:56,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 6 minutes, 4 seconds)
2025-05-11 14:41:45,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:41:57,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4616.95215 ± 329.046
2025-05-11 14:41:57,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4485.239, 4647.649, 4471.1665, 4538.1714, 4376.8345, 4724.334, 5248.1865, 4319.167, 5165.6177, 4193.1533]
2025-05-11 14:41:57,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:41:57,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 3 minutes, 10 seconds)
2025-05-11 14:44:45,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:44:57,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5146.42529 ± 524.554
2025-05-11 14:44:57,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5370.0454, 5175.1484, 4454.996, 5008.5996, 4769.286, 5716.8604, 5798.306, 5408.1807, 5623.747, 4139.086]
2025-05-11 14:44:57,895 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:44:57,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 9 seconds)
2025-05-11 14:47:45,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:47:57,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5329.32471 ± 631.886
2025-05-11 14:47:57,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4908.8564, 5574.844, 5028.0493, 5007.1074, 6056.2773, 5278.7925, 3955.534, 5624.3374, 6338.89, 5520.5576]
2025-05-11 14:47:57,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:47:57,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 57 minutes, 7 seconds)
2025-05-11 14:50:45,642 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:50:58,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5433.46826 ± 546.238
2025-05-11 14:50:58,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4966.952, 5265.6416, 4409.423, 5085.375, 6507.4473, 5498.6963, 5475.001, 5547.291, 5522.6475, 6056.207]
2025-05-11 14:50:58,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:50:58,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 54 minutes, 7 seconds)
2025-05-11 14:53:46,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:53:59,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4987.24170 ± 600.588
2025-05-11 14:53:59,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6085.542, 4883.7153, 5594.894, 3657.9348, 4938.6626, 4919.9546, 5278.0137, 5072.3423, 4801.2607, 4640.0977]
2025-05-11 14:53:59,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:53:59,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 51 minutes, 9 seconds)
2025-05-11 14:56:46,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:56:59,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4833.21777 ± 436.798
2025-05-11 14:56:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5448.0425, 5135.7397, 5510.535, 4673.294, 4562.4287, 4120.987, 5061.665, 4495.0835, 4389.7617, 4934.639]
2025-05-11 14:56:59,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:56:59,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 48 minutes, 5 seconds)
2025-05-11 14:59:46,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 14:59:59,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4737.23047 ± 1078.299
2025-05-11 14:59:59,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5455.452, 5213.073, 4502.462, 4731.547, 5864.7285, 1917.7954, 3925.6025, 5517.631, 5272.9087, 4971.101]
2025-05-11 14:59:59,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 14:59:59,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 45 minutes, 4 seconds)
2025-05-11 15:02:48,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:03:01,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5408.98340 ± 567.984
2025-05-11 15:03:01,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6162.6304, 4726.9287, 5874.6953, 5064.987, 6112.3955, 5587.392, 5971.8477, 5063.711, 4590.212, 4935.0366]
2025-05-11 15:03:01,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:03:01,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 42 minutes, 11 seconds)
2025-05-11 15:05:55,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:06:08,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5604.21094 ± 761.774
2025-05-11 15:06:08,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6145.2173, 6006.0317, 6240.381, 5536.103, 4001.242, 5918.703, 6263.2163, 5537.4287, 4330.719, 6063.0664]
2025-05-11 15:06:08,252 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:06:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5604.21) for latency ExtremeClogL1U23
2025-05-11 15:06:08,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 15:06:08,257 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 15:06:08,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 39 minutes, 26 seconds)
2025-05-11 15:09:01,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:09:14,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5184.86816 ± 388.782
2025-05-11 15:09:14,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5410.4355, 5069.7607, 5118.3013, 5710.674, 4911.982, 5479.049, 4700.4565, 5546.4116, 5465.843, 4435.7656]
2025-05-11 15:09:14,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:09:14,449 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 36 minutes, 36 seconds)
2025-05-11 15:12:07,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:12:20,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5077.06299 ± 1592.501
2025-05-11 15:12:20,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5915.9062, 6161.446, 631.086, 4394.825, 6309.592, 4678.9717, 5367.4023, 5871.1304, 5598.985, 5841.2837]
2025-05-11 15:12:20,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:12:20,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 33 minutes, 47 seconds)
2025-05-11 15:15:13,699 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:15:26,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5288.47705 ± 529.078
2025-05-11 15:15:26,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4905.6167, 4888.568, 5832.902, 5218.603, 5987.727, 4669.204, 5568.9927, 5547.6045, 4382.883, 5882.6675]
2025-05-11 15:15:26,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:15:26,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 30 minutes, 54 seconds)
2025-05-11 15:18:19,246 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:18:32,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5492.10059 ± 561.646
2025-05-11 15:18:32,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5838.0195, 5313.7983, 4527.0654, 5526.7427, 6444.5063, 4828.002, 5738.008, 5412.774, 5094.5713, 6197.523]
2025-05-11 15:18:32,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:18:32,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 27 minutes, 54 seconds)
2025-05-11 15:21:25,244 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:21:38,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5201.68701 ± 466.787
2025-05-11 15:21:38,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4327.837, 5417.2485, 4927.853, 4858.032, 5295.5454, 5539.0923, 5388.2573, 4918.572, 5174.5615, 6169.8726]
2025-05-11 15:21:38,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:21:38,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 24 minutes, 48 seconds)
2025-05-11 15:24:30,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:24:43,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5711.72168 ± 329.989
2025-05-11 15:24:43,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5620.96, 6067.0645, 5881.9014, 6019.277, 5426.256, 5108.4707, 6169.9277, 5408.579, 5911.8203, 5502.9575]
2025-05-11 15:24:43,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:24:43,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5711.72) for latency ExtremeClogL1U23
2025-05-11 15:24:43,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 15:24:43,917 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 15:24:43,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 21 minutes, 41 seconds)
2025-05-11 15:27:36,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:27:49,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5632.33643 ± 615.981
2025-05-11 15:27:49,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5692.833, 6430.5166, 6088.8545, 5911.0127, 5544.1157, 5797.83, 4721.6377, 4543.5757, 5178.568, 6414.421]
2025-05-11 15:27:49,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:27:49,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 18 minutes, 34 seconds)
2025-05-11 15:30:42,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:30:55,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5456.17383 ± 638.108
2025-05-11 15:30:55,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6001.5674, 5652.6694, 6262.7036, 4821.711, 5045.0093, 5974.7593, 4463.624, 6270.398, 5381.962, 4687.341]
2025-05-11 15:30:55,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:30:55,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 15 minutes, 28 seconds)
2025-05-11 15:33:48,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:34:01,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5944.67480 ± 395.557
2025-05-11 15:34:01,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6219.145, 5209.3604, 6197.938, 5615.3794, 5444.2256, 6094.9033, 6230.3506, 5793.0166, 6560.8594, 6081.569]
2025-05-11 15:34:01,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:34:01,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1226 [INFO]: New best (5944.67) for latency ExtremeClogL1U23
2025-05-11 15:34:01,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1229 [INFO]: saving network
2025-05-11 15:34:01,460 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of BPQL to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mem2/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-11 15:34:01,478 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 12 minutes, 23 seconds)
2025-05-11 15:36:54,066 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:37:07,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5350.93799 ± 528.705
2025-05-11 15:37:07,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4972.5225, 5494.0835, 5794.501, 4068.1401, 5877.5015, 5502.8496, 5002.314, 5457.732, 5393.68, 5946.053]
2025-05-11 15:37:07,156 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:37:07,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 9 minutes, 17 seconds)
2025-05-11 15:40:00,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:40:13,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5141.72461 ± 931.863
2025-05-11 15:40:13,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4500.062, 5503.2954, 4305.787, 6672.6904, 5280.5796, 5905.311, 6213.595, 5266.2583, 4178.3247, 3591.344]
2025-05-11 15:40:13,141 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:40:13,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 6 minutes, 11 seconds)
2025-05-11 15:43:05,408 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:43:18,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5460.90430 ± 769.638
2025-05-11 15:43:18,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6102.029, 5688.926, 6251.1196, 3402.1704, 5023.627, 5311.518, 5633.209, 5922.9385, 5437.956, 5835.546]
2025-05-11 15:43:18,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:43:18,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 5 seconds)
2025-05-11 15:46:09,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-11 15:46:22,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5388.54395 ± 637.849
2025-05-11 15:46:22,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4246.3496, 6273.3193, 5667.45, 5434.708, 4974.214, 5925.6245, 5448.0254, 6026.388, 5508.8745, 4380.4897]
2025-05-11 15:46:22,463 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-11 15:46:22,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noisy-halfcheetah):1251 [DEBUG]: Training session finished
