2025-08-07 04:31:48,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc10-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:31:48,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc10-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 04:31:48,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14a44db0a650>}
2025-08-07 04:31:48,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 04:31:48,545 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 04:31:48,563 baseline-bpql-noiseperc10-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 04:31:48,563 baseline-bpql-noiseperc10-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 04:31:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 04:31:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 04:33:26,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:33:40,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -422.33038 ± 38.394
2025-08-07 04:33:40,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-409.79587, -353.15247, -425.41354, -418.86246, -451.35962, -363.2383, -420.01376, -435.3348, -480.79742, -465.33533]
2025-08-07 04:33:40,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:33:40,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-422.33) for latency ExtremeClogL1U23
2025-08-07 04:33:40,194 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 2 minutes, 29 seconds)
2025-08-07 04:35:23,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:35:36,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -220.05119 ± 42.947
2025-08-07 04:35:36,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-195.29437, -250.0631, -275.45908, -192.00249, -145.59445, -267.2662, -179.92035, -278.43213, -207.067, -209.41278]
2025-08-07 04:35:36,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:35:36,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-220.05) for latency ExtremeClogL1U23
2025-08-07 04:35:36,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 5 minutes, 13 seconds)
2025-08-07 04:37:19,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:37:32,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -166.09445 ± 71.273
2025-08-07 04:37:32,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-273.41156, -98.93797, -28.530497, -201.68837, -259.1809, -160.15817, -185.77458, -191.45299, -165.34789, -96.46158]
2025-08-07 04:37:32,638 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:37:32,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-166.09) for latency ExtremeClogL1U23
2025-08-07 04:37:32,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 4 minutes, 52 seconds)
2025-08-07 04:39:15,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:39:28,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -139.12285 ± 62.314
2025-08-07 04:39:28,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-99.013176, -120.69857, -249.61237, -111.76561, -41.900963, -177.54762, -143.00685, -214.36787, -172.83162, -60.483707]
2025-08-07 04:39:28,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:39:28,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-139.12) for latency ExtremeClogL1U23
2025-08-07 04:39:28,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 3 minutes, 40 seconds)
2025-08-07 04:41:11,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:41:24,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -44.17692 ± 87.416
2025-08-07 04:41:24,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-60.291344, -28.060839, 22.027077, 175.9483, -67.74324, -114.42915, -58.06429, -45.258274, -161.33922, -104.55821]
2025-08-07 04:41:24,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:41:24,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-44.18) for latency ExtremeClogL1U23
2025-08-07 04:41:25,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 2 minutes, 12 seconds)
2025-08-07 04:43:08,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:43:21,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 61.74596 ± 72.029
2025-08-07 04:43:21,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [178.51128, -27.42873, 118.092384, 165.52809, 88.6223, 65.90797, 2.7263486, -30.633434, 2.328829, 53.80459]
2025-08-07 04:43:21,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:43:21,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (61.75) for latency ExtremeClogL1U23
2025-08-07 04:43:21,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 2 minutes, 3 seconds)
2025-08-07 04:45:04,306 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:45:17,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 214.09436 ± 64.270
2025-08-07 04:45:17,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [195.11046, 196.5245, 75.69325, 294.93103, 256.64627, 247.51837, 312.94766, 191.83253, 180.17192, 189.56761]
2025-08-07 04:45:17,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:45:17,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (214.09) for latency ExtremeClogL1U23
2025-08-07 04:45:17,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 9 seconds)
2025-08-07 04:47:00,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:47:13,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 445.70905 ± 119.566
2025-08-07 04:47:13,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [488.4856, 445.8422, 226.5957, 523.5559, 264.0638, 380.21326, 508.27716, 442.332, 638.17596, 539.54877]
2025-08-07 04:47:13,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:47:13,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (445.71) for latency ExtremeClogL1U23
2025-08-07 04:47:13,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 58 minutes, 10 seconds)
2025-08-07 04:48:56,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:49:09,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 418.28232 ± 216.003
2025-08-07 04:49:09,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [315.30298, 207.91302, 854.4383, 304.4652, 703.0922, 444.57327, 453.64117, 531.99866, 218.8883, 148.51001]
2025-08-07 04:49:09,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:49:09,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 56 minutes, 18 seconds)
2025-08-07 04:50:53,036 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:51:06,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 728.74768 ± 79.807
2025-08-07 04:51:06,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [766.9997, 773.9035, 777.70325, 710.04956, 765.1861, 514.94183, 739.2468, 687.73224, 821.5267, 730.18744]
2025-08-07 04:51:06,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:51:06,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (728.75) for latency ExtremeClogL1U23
2025-08-07 04:51:06,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 54 minutes, 22 seconds)
2025-08-07 04:52:49,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:53:02,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 744.08026 ± 175.784
2025-08-07 04:53:02,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [897.06714, 758.90137, 764.6851, 685.17694, 857.44464, 667.6283, 299.2103, 786.19086, 994.8676, 729.6302]
2025-08-07 04:53:02,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:53:02,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (744.08) for latency ExtremeClogL1U23
2025-08-07 04:53:02,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 52 minutes, 25 seconds)
2025-08-07 04:54:45,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:54:58,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 734.58502 ± 153.016
2025-08-07 04:54:58,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [820.5409, 883.0528, 774.7157, 386.7115, 896.601, 835.81, 533.2133, 708.8267, 796.0866, 710.2917]
2025-08-07 04:54:58,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:54:58,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 50 minutes, 27 seconds)
2025-08-07 04:56:41,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:56:54,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 896.14032 ± 122.532
2025-08-07 04:56:54,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1027.154, 993.44354, 806.4548, 977.69806, 905.47253, 917.04803, 586.51447, 857.21906, 890.0061, 1000.39264]
2025-08-07 04:56:54,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:56:54,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (896.14) for latency ExtremeClogL1U23
2025-08-07 04:56:54,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 48 minutes, 32 seconds)
2025-08-07 04:58:37,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 04:58:51,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 932.12488 ± 142.711
2025-08-07 04:58:51,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1143.1995, 721.38336, 841.4772, 900.24536, 913.6589, 971.4159, 685.6878, 1056.3386, 1078.7328, 1009.11]
2025-08-07 04:58:51,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:58:51,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (932.12) for latency ExtremeClogL1U23
2025-08-07 04:58:51,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 46 minutes, 34 seconds)
2025-08-07 05:00:34,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:00:47,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 718.08612 ± 317.541
2025-08-07 05:00:47,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [500.31638, 745.64734, 531.4089, 944.59845, 950.0861, 812.09094, 946.65497, 929.3863, -102.017296, 922.6889]
2025-08-07 05:00:47,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:00:47,169 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 44 minutes, 35 seconds)
2025-08-07 05:02:30,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:02:43,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 909.67395 ± 128.960
2025-08-07 05:02:43,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [835.39014, 633.7334, 888.8589, 813.0235, 1005.0938, 864.5567, 897.73016, 1063.7969, 1015.9421, 1078.6135]
2025-08-07 05:02:43,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:02:43,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 42 minutes, 38 seconds)
2025-08-07 05:04:26,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:04:39,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 964.45929 ± 132.425
2025-08-07 05:04:39,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1080.2913, 834.7611, 832.0991, 975.69415, 1105.5233, 1231.464, 827.8066, 832.761, 973.49225, 950.70026]
2025-08-07 05:04:39,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:04:39,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (964.46) for latency ExtremeClogL1U23
2025-08-07 05:04:39,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 40 minutes, 39 seconds)
2025-08-07 05:06:22,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:06:35,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1042.62671 ± 83.344
2025-08-07 05:06:35,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1104.3687, 1078.6564, 1045.8286, 991.26434, 1007.8168, 1032.9861, 1101.1724, 1202.0953, 988.6462, 873.4318]
2025-08-07 05:06:35,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:06:35,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1042.63) for latency ExtremeClogL1U23
2025-08-07 05:06:35,594 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 38 minutes, 44 seconds)
2025-08-07 05:08:18,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:08:31,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1039.73376 ± 61.236
2025-08-07 05:08:31,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1069.2113, 1099.7211, 952.1489, 1091.1635, 1104.4619, 964.1029, 1098.0856, 1055.8954, 947.32935, 1015.2176]
2025-08-07 05:08:31,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:08:31,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 36 minutes, 47 seconds)
2025-08-07 05:10:14,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:10:27,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 981.03357 ± 114.753
2025-08-07 05:10:27,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1051.3613, 1057.361, 1081.5159, 871.19055, 1038.114, 1033.5725, 1006.0114, 1050.45, 930.8839, 689.8755]
2025-08-07 05:10:27,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:10:27,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 34 minutes, 50 seconds)
2025-08-07 05:12:11,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:12:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1217.85474 ± 172.140
2025-08-07 05:12:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1251.2069, 1082.229, 1500.1559, 1180.6255, 1505.3353, 1330.5475, 1002.1844, 1105.5935, 1205.6017, 1015.06757]
2025-08-07 05:12:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:12:24,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1217.85) for latency ExtremeClogL1U23
2025-08-07 05:12:24,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 32 minutes, 59 seconds)
2025-08-07 05:14:07,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:14:20,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1149.29956 ± 144.800
2025-08-07 05:14:20,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1089.2396, 1132.943, 1040.7783, 1129.8529, 1031.0125, 1172.1372, 1193.1111, 1241.5493, 952.53955, 1509.8308]
2025-08-07 05:14:20,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:14:20,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 31 minutes, 5 seconds)
2025-08-07 05:16:03,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:16:16,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1059.31006 ± 90.044
2025-08-07 05:16:16,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1056.1898, 1018.3376, 1079.651, 1105.6035, 875.773, 1137.3601, 959.1702, 1017.8739, 1178.8379, 1164.3036]
2025-08-07 05:16:16,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:16:16,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 29 minutes, 8 seconds)
2025-08-07 05:17:59,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:18:12,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1158.53625 ± 129.304
2025-08-07 05:18:12,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1335.5101, 1096.7614, 1126.911, 1112.0195, 985.721, 1229.1986, 1016.947, 1287.7388, 1362.0415, 1032.5123]
2025-08-07 05:18:12,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:18:12,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 27 minutes, 12 seconds)
2025-08-07 05:19:55,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:20:08,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1154.08179 ± 132.242
2025-08-07 05:20:08,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1340.315, 1022.8498, 1161.0482, 1110.0455, 1118.7048, 1230.9714, 1422.1647, 981.69867, 1066.4304, 1086.5879]
2025-08-07 05:20:08,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:20:08,956 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 25 minutes, 16 seconds)
2025-08-07 05:21:51,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:22:05,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1240.48657 ± 105.367
2025-08-07 05:22:05,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1493.6863, 1344.7622, 1193.4387, 1302.4421, 1142.2084, 1144.9872, 1181.1215, 1224.6681, 1221.06, 1156.4911]
2025-08-07 05:22:05,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:22:05,084 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1240.49) for latency ExtremeClogL1U23
2025-08-07 05:22:05,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 23 minutes, 16 seconds)
2025-08-07 05:23:48,062 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:24:01,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1258.56238 ± 120.742
2025-08-07 05:24:01,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1174.6831, 1243.6764, 1375.3478, 1420.0157, 1185.327, 1177.5581, 1459.0299, 1200.3385, 1295.851, 1053.7965]
2025-08-07 05:24:01,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:24:01,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1258.56) for latency ExtremeClogL1U23
2025-08-07 05:24:01,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 21 minutes, 20 seconds)
2025-08-07 05:25:44,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:25:57,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1233.32739 ± 156.966
2025-08-07 05:25:57,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1140.9802, 1003.32605, 1550.28, 1158.8685, 1387.433, 1213.1106, 1400.205, 1189.1508, 1213.4321, 1076.489]
2025-08-07 05:25:57,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:25:57,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 19 minutes, 23 seconds)
2025-08-07 05:27:40,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:27:53,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1270.08032 ± 192.689
2025-08-07 05:27:53,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1116.2733, 1449.2731, 1405.8353, 1708.4924, 1138.2329, 1095.9141, 1244.4268, 1164.5778, 1062.0018, 1315.7753]
2025-08-07 05:27:53,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:27:53,674 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1270.08) for latency ExtremeClogL1U23
2025-08-07 05:27:53,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 17 minutes, 28 seconds)
2025-08-07 05:29:36,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:29:49,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1230.72302 ± 222.093
2025-08-07 05:29:49,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1117.5592, 1197.8959, 1209.4781, 995.83716, 1119.3922, 1086.3026, 1265.434, 1286.1178, 1850.0468, 1179.1661]
2025-08-07 05:29:49,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:29:49,888 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 15 minutes, 33 seconds)
2025-08-07 05:31:32,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:31:46,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1328.52185 ± 234.237
2025-08-07 05:31:46,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1165.7954, 1116.6534, 1551.0724, 1650.0916, 1214.2885, 1285.6292, 1806.2358, 1154.5996, 1147.7457, 1193.1073]
2025-08-07 05:31:46,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:31:46,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1328.52) for latency ExtremeClogL1U23
2025-08-07 05:31:46,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 13 minutes, 37 seconds)
2025-08-07 05:33:29,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:33:42,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1379.18689 ± 182.520
2025-08-07 05:33:42,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1376.7206, 1196.6038, 1285.1417, 1093.8346, 1720.2992, 1346.6632, 1375.2899, 1594.2295, 1551.2937, 1251.7925]
2025-08-07 05:33:42,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:33:42,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1379.19) for latency ExtremeClogL1U23
2025-08-07 05:33:42,385 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 11 minutes, 43 seconds)
2025-08-07 05:35:25,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:35:38,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1368.63635 ± 151.535
2025-08-07 05:35:38,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1322.2867, 1375.893, 1129.5234, 1319.1859, 1229.3513, 1504.1416, 1614.6473, 1312.9865, 1271.1328, 1607.2152]
2025-08-07 05:35:38,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:35:38,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 9 minutes, 45 seconds)
2025-08-07 05:37:21,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:37:34,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1349.97986 ± 194.673
2025-08-07 05:37:34,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1156.8285, 1530.184, 1365.1437, 1773.3538, 1264.1792, 1425.5703, 1026.37, 1392.5073, 1306.2163, 1259.4454]
2025-08-07 05:37:34,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:37:34,676 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 7 minutes, 49 seconds)
2025-08-07 05:39:17,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:39:30,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1587.47595 ± 364.312
2025-08-07 05:39:30,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2261.587, 1543.1664, 1146.7404, 1141.5114, 2053.7136, 1968.6859, 1369.8759, 1581.998, 1411.4094, 1396.0723]
2025-08-07 05:39:30,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:39:30,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1587.48) for latency ExtremeClogL1U23
2025-08-07 05:39:31,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 5 minutes, 54 seconds)
2025-08-07 05:41:14,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:41:27,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1548.56763 ± 349.490
2025-08-07 05:41:27,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2285.2036, 1367.9288, 1095.5775, 1204.0491, 1689.7596, 1628.23, 1972.9536, 1427.1089, 1232.361, 1582.5038]
2025-08-07 05:41:27,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:41:27,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 3 minutes, 59 seconds)
2025-08-07 05:43:10,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:43:23,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1555.24707 ± 412.929
2025-08-07 05:43:23,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1607.5562, 1992.9138, 1580.0197, 901.9631, 1436.973, 1176.5969, 1496.2242, 1264.4241, 2467.771, 1628.0298]
2025-08-07 05:43:23,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:43:23,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 2 minutes, 1 second)
2025-08-07 05:45:06,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:45:19,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1457.51306 ± 703.574
2025-08-07 05:45:19,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2437.941, 1398.6555, 1346.909, -125.96455, 1287.3999, 1259.1567, 1162.4303, 1303.4701, 2358.687, 2146.4456]
2025-08-07 05:45:19,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:45:19,768 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 7 seconds)
2025-08-07 05:47:02,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:47:15,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1451.52808 ± 311.268
2025-08-07 05:47:15,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1257.5846, 1430.8105, 1388.1573, 1569.2747, 1137.4497, 1270.5248, 1135.7081, 1609.2638, 1454.4746, 2262.0322]
2025-08-07 05:47:15,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:47:15,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 58 minutes, 9 seconds)
2025-08-07 05:48:58,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:49:11,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1476.16333 ± 776.350
2025-08-07 05:49:11,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1207.7421, 1275.7375, 2004.2544, -256.61655, 1664.4996, 2507.1467, 1158.3342, 993.5454, 2558.5469, 1648.4423]
2025-08-07 05:49:11,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:49:11,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 56 minutes, 9 seconds)
2025-08-07 05:50:54,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:51:08,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2182.96167 ± 548.185
2025-08-07 05:51:08,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2949.1816, 2372.5784, 2064.3896, 2655.9746, 2416.3625, 1458.5778, 1533.1586, 2118.0317, 2883.3706, 1377.9929]
2025-08-07 05:51:08,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:51:08,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2182.96) for latency ExtremeClogL1U23
2025-08-07 05:51:08,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 54 minutes, 12 seconds)
2025-08-07 05:52:51,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:53:04,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1652.34888 ± 457.121
2025-08-07 05:53:04,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2144.3003, 1248.9928, 1225.6194, 1424.9288, 1113.6221, 1258.8812, 2189.6467, 1570.3549, 2462.7651, 1884.3766]
2025-08-07 05:53:04,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:53:04,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 52 minutes, 15 seconds)
2025-08-07 05:54:46,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:55:00,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1621.05640 ± 554.906
2025-08-07 05:55:00,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1793.4413, 1280.7057, 1643.5891, 3011.1826, 1256.1925, 1296.9562, 1269.2814, 1193.5038, 2197.066, 1268.6469]
2025-08-07 05:55:00,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:55:00,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 50 minutes, 16 seconds)
2025-08-07 05:56:42,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:56:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1934.16504 ± 646.263
2025-08-07 05:56:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [922.1161, 1049.947, 2540.0405, 1392.7976, 1390.971, 2278.8083, 2134.7014, 2452.9932, 2842.4133, 2336.8618]
2025-08-07 05:56:56,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:56:56,195 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 48 minutes, 20 seconds)
2025-08-07 05:58:39,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:58:52,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1648.54260 ± 377.496
2025-08-07 05:58:52,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1845.0109, 1864.9724, 1791.5498, 1191.934, 1343.3386, 1289.0591, 1990.9207, 1569.4321, 2399.249, 1199.9597]
2025-08-07 05:58:52,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:58:52,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 46 minutes, 26 seconds)
2025-08-07 06:00:35,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:00:48,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1780.76929 ± 520.364
2025-08-07 06:00:48,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2838.3472, 1679.0605, 2002.4265, 2261.295, 1146.5791, 2219.83, 1233.3682, 1422.6862, 1256.3634, 1747.738]
2025-08-07 06:00:48,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:00:48,618 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 44 minutes, 30 seconds)
2025-08-07 06:02:31,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:02:44,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1913.40430 ± 910.598
2025-08-07 06:02:44,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1192.1628, 1689.5966, 1406.4733, 1458.9187, 1494.8593, 3751.2368, 3635.2544, 1620.2007, 1767.7439, 1117.5977]
2025-08-07 06:02:44,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:02:44,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 42 minutes, 33 seconds)
2025-08-07 06:04:27,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:04:40,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1875.91724 ± 388.612
2025-08-07 06:04:40,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1253.0111, 1424.3523, 2306.2505, 1470.6589, 2425.9348, 2358.7292, 1867.0336, 1878.5062, 2015.559, 1759.1361]
2025-08-07 06:04:40,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:04:40,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 40 minutes, 39 seconds)
2025-08-07 06:06:23,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:06:36,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1475.44971 ± 463.232
2025-08-07 06:06:36,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1461.7845, 2029.8124, 1301.54, 1261.1838, 1277.0365, 360.38535, 1515.2356, 1978.4825, 1654.5698, 1914.4656]
2025-08-07 06:06:36,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:06:36,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 38 minutes, 42 seconds)
2025-08-07 06:08:19,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:08:32,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2468.34961 ± 873.997
2025-08-07 06:08:32,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2236.0657, 3630.8699, 1150.8336, 2402.6082, 3173.4336, 1225.9534, 1437.3323, 3113.9412, 3041.5679, 3270.8906]
2025-08-07 06:08:32,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:08:32,988 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2468.35) for latency ExtremeClogL1U23
2025-08-07 06:08:32,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 36 minutes, 46 seconds)
2025-08-07 06:10:15,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:10:29,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1871.88513 ± 856.485
2025-08-07 06:10:29,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1372.172, 1481.0273, 1126.1464, 1221.672, 1172.083, 3728.1963, 2101.056, 1383.1632, 1949.4757, 3183.8596]
2025-08-07 06:10:29,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:10:29,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 34 minutes, 48 seconds)
2025-08-07 06:12:12,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:12:25,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2594.82983 ± 817.989
2025-08-07 06:12:25,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3217.552, 3481.987, 3242.2112, 3545.4316, 2678.431, 1336.2704, 1648.2034, 1308.5424, 2686.209, 2803.4604]
2025-08-07 06:12:25,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:12:25,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2594.83) for latency ExtremeClogL1U23
2025-08-07 06:12:25,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 32 minutes, 53 seconds)
2025-08-07 06:14:08,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:14:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3205.20435 ± 987.116
2025-08-07 06:14:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1424.5889, 3874.1707, 3952.683, 3889.75, 3792.4434, 3905.5432, 3815.5234, 1983.9811, 1731.4108, 3681.9487]
2025-08-07 06:14:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:14:21,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (3205.20) for latency ExtremeClogL1U23
2025-08-07 06:14:21,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 30 minutes, 54 seconds)
2025-08-07 06:16:04,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:16:17,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3908.33081 ± 77.245
2025-08-07 06:16:17,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3937.2446, 3903.8276, 3852.003, 3989.2563, 3939.0498, 3979.1694, 3894.8757, 3957.6199, 3707.5564, 3922.7087]
2025-08-07 06:16:17,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:16:17,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (3908.33) for latency ExtremeClogL1U23
2025-08-07 06:16:17,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 28 minutes, 59 seconds)
2025-08-07 06:18:00,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:18:13,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3749.73755 ± 443.344
2025-08-07 06:18:13,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4124.4263, 3869.5872, 3767.2466, 3906.8013, 3272.3225, 4126.2227, 4081.8064, 3820.3494, 3908.354, 2620.2563]
2025-08-07 06:18:13,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:18:13,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 27 minutes, 1 second)
2025-08-07 06:19:56,196 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:20:09,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3829.37622 ± 646.514
2025-08-07 06:20:09,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3913.2742, 4040.9673, 4111.6753, 4105.7515, 4067.364, 3935.9355, 4148.0537, 1904.3391, 4115.4062, 3950.9976]
2025-08-07 06:20:09,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:20:09,160 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 25 minutes, 4 seconds)
2025-08-07 06:21:52,259 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:05,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3701.10596 ± 515.100
2025-08-07 06:22:05,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3891.6677, 2593.324, 2888.6384, 3568.445, 4254.176, 4131.434, 3976.5886, 3778.704, 3984.4836, 3943.5986]
2025-08-07 06:22:05,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:22:05,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 23 minutes, 10 seconds)
2025-08-07 06:23:48,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:01,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3228.17773 ± 899.640
2025-08-07 06:24:01,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2094.09, 4023.1433, 4136.275, 3302.2852, 3969.0054, 1571.7538, 2488.604, 4097.573, 2691.3691, 3907.6787]
2025-08-07 06:24:01,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:24:01,600 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 21 minutes, 15 seconds)
2025-08-07 06:25:44,641 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:25:57,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3588.37256 ± 829.541
2025-08-07 06:25:57,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1635.7504, 4006.9739, 4106.7114, 3819.4717, 4012.9644, 4006.0613, 3890.7512, 4066.0764, 4048.6936, 2290.2715]
2025-08-07 06:25:57,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:25:57,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 19 minutes, 21 seconds)
2025-08-07 06:27:40,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:27:53,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3486.59106 ± 655.614
2025-08-07 06:27:53,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3920.1692, 2128.7644, 3275.4968, 2407.2458, 3742.5916, 3945.2708, 3456.7896, 4004.2747, 3987.3813, 3997.926]
2025-08-07 06:27:53,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:27:53,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 17 minutes, 26 seconds)
2025-08-07 06:29:37,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:29:50,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3605.68823 ± 825.680
2025-08-07 06:29:50,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3828.716, 3988.2925, 3949.9758, 3976.3948, 4037.6401, 3949.352, 1199.4259, 3900.0117, 3310.336, 3916.7363]
2025-08-07 06:29:50,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:29:50,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 15 minutes, 31 seconds)
2025-08-07 06:31:33,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:31:46,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3722.70190 ± 551.754
2025-08-07 06:31:46,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4023.1362, 4084.7908, 3915.347, 3316.509, 3985.9448, 2267.714, 4035.6433, 4121.5215, 4035.7715, 3440.6406]
2025-08-07 06:31:46,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:31:46,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 13 minutes, 34 seconds)
2025-08-07 06:33:29,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:33:42,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3690.03394 ± 875.311
2025-08-07 06:33:42,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1093.5037, 4106.567, 4084.792, 3998.546, 3930.6084, 4121.166, 3838.1506, 4052.7031, 3671.772, 4002.532]
2025-08-07 06:33:42,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:33:42,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 11 minutes, 38 seconds)
2025-08-07 06:35:25,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:35:38,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3268.50073 ± 1072.031
2025-08-07 06:35:38,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4105.1646, 3927.1272, 3397.8542, 3906.2747, 1716.6168, 1066.7623, 2387.2432, 4096.6943, 3930.3635, 4150.907]
2025-08-07 06:35:38,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:35:38,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 9 minutes, 41 seconds)
2025-08-07 06:37:21,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:37:34,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3715.35815 ± 840.893
2025-08-07 06:37:34,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4045.1252, 4128.0645, 3354.3716, 4080.5547, 3970.8635, 1279.6036, 4082.2114, 4093.4507, 4147.538, 3971.7996]
2025-08-07 06:37:34,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:37:34,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 7 minutes, 45 seconds)
2025-08-07 06:39:17,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:39:30,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3406.25903 ± 730.184
2025-08-07 06:39:30,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3809.8193, 3998.4902, 4055.1248, 3891.701, 2158.121, 2340.312, 3681.5486, 2432.462, 4011.593, 3683.4177]
2025-08-07 06:39:30,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:39:30,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 5 minutes, 47 seconds)
2025-08-07 06:41:13,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:41:26,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3329.43359 ± 837.695
2025-08-07 06:41:26,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4022.914, 3862.6936, 3965.6658, 4004.5269, 4045.2976, 1943.3563, 3260.936, 3532.8665, 3001.1575, 1654.9202]
2025-08-07 06:41:26,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:41:26,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 3 minutes, 50 seconds)
2025-08-07 06:43:09,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:43:22,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3528.61646 ± 919.107
2025-08-07 06:43:22,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3880.5142, 3918.206, 4054.3647, 3922.0286, 3784.758, 4148.2407, 2610.2412, 3933.3176, 3972.9778, 1061.5149]
2025-08-07 06:43:22,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:43:23,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 1 minute, 55 seconds)
2025-08-07 06:45:06,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:45:19,300 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3202.14355 ± 759.317
2025-08-07 06:45:19,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3976.0996, 1660.7366, 3870.007, 3288.1736, 3088.8572, 3064.142, 3939.3157, 4105.187, 2532.1262, 2496.789]
2025-08-07 06:45:19,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:45:19,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour)
2025-08-07 06:47:02,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:47:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2963.11572 ± 1107.091
2025-08-07 06:47:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2961.875, 4087.345, 3836.0059, 3995.009, 4121.5728, 2169.0151, 2050.3645, 3811.315, 1070.9397, 1527.7169]
2025-08-07 06:47:15,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:47:15,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 5 seconds)
2025-08-07 06:48:58,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:49:11,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3294.92578 ± 1004.770
2025-08-07 06:49:11,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1617.3473, 4080.126, 3928.3577, 3811.665, 2006.4142, 3813.7275, 1692.2129, 3945.9834, 3984.5864, 4068.8396]
2025-08-07 06:49:11,761 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:49:11,770 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 10 seconds)
2025-08-07 06:50:54,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:07,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3187.57397 ± 1051.876
2025-08-07 06:51:07,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3957.1, 1116.2848, 4072.0073, 2194.174, 3230.5493, 3477.564, 3930.1643, 3994.4087, 4179.2104, 1724.2762]
2025-08-07 06:51:07,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:51:07,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 14 seconds)
2025-08-07 06:52:50,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:03,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3543.07935 ± 796.098
2025-08-07 06:53:03,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3609.7874, 3923.1172, 3462.6953, 3178.6943, 4046.698, 1303.191, 3952.3684, 3996.8433, 3864.8542, 4092.5432]
2025-08-07 06:53:03,917 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:53:03,925 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 16 seconds)
2025-08-07 06:54:47,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:00,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3151.42041 ± 992.253
2025-08-07 06:55:00,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4163.4014, 1714.459, 2523.5999, 1485.8428, 2472.0525, 4119.614, 3028.174, 4019.886, 3965.9524, 4021.223]
2025-08-07 06:55:00,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:55:00,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 21 seconds)
2025-08-07 06:56:43,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:56:56,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3361.91528 ± 742.910
2025-08-07 06:56:56,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3906.7563, 4034.89, 2214.6965, 2768.2642, 4023.2837, 2250.508, 4012.7961, 3757.6748, 3975.244, 2675.0388]
2025-08-07 06:56:56,595 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:56:56,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 48 minutes, 25 seconds)
2025-08-07 06:58:39,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:58:52,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3159.77417 ± 1101.693
2025-08-07 06:58:52,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3962.3354, 2655.9614, 2183.922, 1078.9796, 3834.5696, 4211.5757, 1657.7856, 3989.9883, 3999.8281, 4022.796]
2025-08-07 06:58:52,871 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:58:52,879 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 29 seconds)
2025-08-07 07:00:35,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:00:48,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3391.02979 ± 1018.425
2025-08-07 07:00:48,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4078.5574, 4070.78, 4054.0466, 2431.9038, 3832.4075, 4103.23, 1872.2834, 1348.5664, 4105.4473, 4013.0767]
2025-08-07 07:00:48,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:00:48,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 44 minutes, 32 seconds)
2025-08-07 07:02:31,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:45,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3337.96167 ± 908.848
2025-08-07 07:02:45,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1147.4657, 3959.1362, 4031.9036, 3421.7078, 3853.639, 2249.71, 2984.0476, 3793.9055, 3908.0303, 4030.07]
2025-08-07 07:02:45,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:02:45,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 42 minutes, 37 seconds)
2025-08-07 07:04:28,120 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:41,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3719.67041 ± 801.893
2025-08-07 07:04:41,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3923.6348, 4117.5317, 4002.4958, 4146.2705, 3744.9817, 3974.217, 4025.83, 4024.1423, 1335.6881, 3901.9111]
2025-08-07 07:04:41,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:04:41,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 40 minutes, 39 seconds)
2025-08-07 07:06:24,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:37,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3400.15381 ± 845.940
2025-08-07 07:06:37,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4113.094, 3944.949, 3178.7195, 4106.6865, 4075.1692, 3204.3867, 1992.2928, 1722.9962, 3565.5312, 4097.714]
2025-08-07 07:06:37,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:06:37,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 38 minutes, 43 seconds)
2025-08-07 07:08:20,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:33,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3549.56323 ± 681.090
2025-08-07 07:08:33,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2951.471, 4129.656, 3983.7358, 3866.2385, 2530.3713, 3952.9006, 3974.1816, 3933.7793, 2165.2688, 4008.0298]
2025-08-07 07:08:33,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:08:33,604 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 36 minutes, 46 seconds)
2025-08-07 07:10:16,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:29,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3015.39404 ± 968.202
2025-08-07 07:10:29,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4034.7527, 2394.514, 4066.9512, 3933.9678, 3189.8289, 2820.58, 2571.6003, 1716.6188, 4102.609, 1322.5161]
2025-08-07 07:10:29,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:10:29,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 34 minutes, 51 seconds)
2025-08-07 07:12:12,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:25,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3428.26685 ± 871.927
2025-08-07 07:12:25,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2450.3955, 3847.3655, 3712.5737, 3902.8748, 1374.0593, 4080.3499, 4026.7366, 3853.9875, 2822.7522, 4211.575]
2025-08-07 07:12:25,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:12:25,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 32 minutes, 54 seconds)
2025-08-07 07:14:08,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:14:22,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3455.85547 ± 809.648
2025-08-07 07:14:22,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4008.687, 1698.9159, 3983.5762, 4077.9106, 2358.461, 3941.5747, 4082.0493, 2868.8926, 3520.106, 4018.381]
2025-08-07 07:14:22,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:14:22,072 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 30 minutes, 58 seconds)
2025-08-07 07:16:05,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:16:18,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3122.54639 ± 1102.293
2025-08-07 07:16:18,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1620.6564, 1782.706, 3640.8875, 4144.837, 1689.5394, 3982.5364, 4048.932, 4143.656, 4107.9253, 2063.7878]
2025-08-07 07:16:18,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:16:18,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 1 second)
2025-08-07 07:18:01,116 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:18:14,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3637.30347 ± 720.294
2025-08-07 07:18:14,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3906.4453, 3807.18, 3694.1692, 4158.128, 4083.3655, 4080.4998, 2364.519, 4069.2476, 4115.492, 2093.9868]
2025-08-07 07:18:14,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:18:14,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 5 seconds)
2025-08-07 07:19:57,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:20:10,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3090.90479 ± 1034.811
2025-08-07 07:20:10,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3923.9778, 3838.103, 3994.4614, 1579.317, 1790.187, 2744.2424, 3617.9072, 1456.9401, 4041.7166, 3922.195]
2025-08-07 07:20:10,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:20:10,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 9 seconds)
2025-08-07 07:21:53,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:22:06,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3177.72119 ± 1095.723
2025-08-07 07:22:06,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3643.6445, 4182.994, 1155.4907, 3783.859, 4230.7754, 1548.7968, 2151.9124, 3015.8914, 3928.2278, 4135.618]
2025-08-07 07:22:06,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:22:06,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 12 seconds)
2025-08-07 07:23:49,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:02,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3488.81714 ± 893.078
2025-08-07 07:24:02,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3636.787, 1545.9248, 2953.2751, 3954.658, 4112.411, 4092.0884, 4086.5762, 4009.8423, 4291.869, 2204.7393]
2025-08-07 07:24:02,346 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:24:02,358 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 16 seconds)
2025-08-07 07:25:45,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:25:58,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3721.53955 ± 545.078
2025-08-07 07:25:58,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3098.5466, 4175.457, 4053.4878, 3918.3794, 4031.6804, 4382.351, 4171.733, 2809.4321, 3663.4414, 2910.8877]
2025-08-07 07:25:58,499 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:25:58,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 20 seconds)
2025-08-07 07:27:41,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:27:54,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2859.29565 ± 1128.765
2025-08-07 07:27:54,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3830.5103, 2286.908, 3948.8882, 2644.393, 1131.5874, 4040.174, 1660.4696, 3716.5974, 4010.4128, 1323.0184]
2025-08-07 07:27:54,753 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:27:54,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 24 seconds)
2025-08-07 07:29:37,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:29:51,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3880.62646 ± 489.872
2025-08-07 07:29:51,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2468.6301, 4102.571, 3705.0266, 4051.6104, 4063.6428, 4075.8198, 3915.1016, 4121.4756, 4249.882, 4052.5024]
2025-08-07 07:29:51,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:29:51,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 29 seconds)
2025-08-07 07:31:34,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:31:47,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3180.92505 ± 1078.045
2025-08-07 07:31:47,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [576.02686, 2097.4666, 3952.9878, 3684.039, 3959.7222, 3814.3872, 3728.9053, 3459.436, 2427.7922, 4108.485]
2025-08-07 07:31:47,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:31:47,083 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 33 seconds)
2025-08-07 07:33:30,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:43,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3318.65186 ± 1013.592
2025-08-07 07:33:43,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4103.9355, 2102.3696, 1529.816, 3303.5493, 1833.9147, 4035.1035, 4016.0024, 4060.6375, 4106.2007, 4094.9888]
2025-08-07 07:33:43,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:33:43,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 37 seconds)
2025-08-07 07:35:26,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:39,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3678.51099 ± 829.006
2025-08-07 07:35:39,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4212.0854, 4155.8184, 2206.4097, 3034.9985, 2161.1, 4004.876, 4343.508, 4368.857, 4082.6199, 4214.8354]
2025-08-07 07:35:39,414 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:35:39,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 40 seconds)
2025-08-07 07:37:22,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:35,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3261.01782 ± 990.808
2025-08-07 07:37:35,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2029.3096, 4207.428, 3990.1438, 1742.5269, 2134.29, 3897.3557, 4333.78, 3843.825, 2379.7712, 4051.7485]
2025-08-07 07:37:35,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:37:35,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 44 seconds)
2025-08-07 07:39:18,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:31,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2775.66284 ± 1152.532
2025-08-07 07:39:31,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [747.03046, 4153.8296, 3961.5823, 4067.343, 3988.2314, 2378.143, 2498.4539, 1953.524, 2562.1428, 1446.3503]
2025-08-07 07:39:31,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:39:31,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 48 seconds)
2025-08-07 07:41:14,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3331.10107 ± 936.111
2025-08-07 07:41:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3714.9033, 3867.7317, 3981.823, 2039.5366, 4093.9978, 3945.4749, 2328.8281, 1466.4297, 3758.6465, 4113.642]
2025-08-07 07:41:27,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:41:27,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 52 seconds)
2025-08-07 07:43:10,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3289.07227 ± 925.719
2025-08-07 07:43:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [3933.0178, 4076.4958, 2020.9613, 4131.001, 3814.1438, 1784.5829, 3270.6958, 1951.09, 3992.7192, 3916.0159]
2025-08-07 07:43:23,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:43:23,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 56 seconds)
2025-08-07 07:45:05,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:18,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 3515.80127 ± 895.084
2025-08-07 07:45:18,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [4014.406, 1250.2395, 2356.788, 3841.68, 3836.3103, 4132.083, 4002.212, 3908.5757, 3913.358, 3902.3596]
2025-08-07 07:45:18,405 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:45:18,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1251 [DEBUG]: Training session finished
