2025-08-07 05:16:47,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc15-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 05:16:47,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc15-halfcheetah/ExtremeClogL1U23-bpql-mem24
2025-08-07 05:16:47,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14663af23bd0>}
2025-08-07 05:16:47,602 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 05:16:47,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 05:16:47,624 baseline-bpql-noiseperc15-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=161, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 05:16:47,624 baseline-bpql-noiseperc15-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 05:16:49,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 05:16:49,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 05:18:22,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:18:35,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -409.18094 ± 33.927
2025-08-07 05:18:35,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-425.4909, -348.47696, -464.54956, -447.01578, -377.04382, -434.80313, -374.6029, -415.27448, -397.5233, -407.02866]
2025-08-07 05:18:35,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:18:35,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-409.18) for latency ExtremeClogL1U23
2025-08-07 05:18:35,677 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 2 hours, 55 minutes, 36 seconds)
2025-08-07 05:20:15,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:20:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -223.93254 ± 49.216
2025-08-07 05:20:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-230.22615, -163.5231, -282.73663, -291.62292, -245.6017, -245.33746, -159.47746, -267.96313, -154.53876, -198.29817]
2025-08-07 05:20:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:20:27,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-223.93) for latency ExtremeClogL1U23
2025-08-07 05:20:27,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 2 hours, 58 minutes, 29 seconds)
2025-08-07 05:22:06,859 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:22:19,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -157.87137 ± 57.190
2025-08-07 05:22:19,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-100.34523, -187.29414, -71.1641, -68.31108, -246.77408, -173.39133, -210.60832, -176.04208, -198.0329, -146.7504]
2025-08-07 05:22:19,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:22:19,507 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-157.87) for latency ExtremeClogL1U23
2025-08-07 05:22:19,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 2 hours, 57 minutes, 58 seconds)
2025-08-07 05:23:57,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:24:10,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -176.50008 ± 80.107
2025-08-07 05:24:10,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-273.73984, -193.81804, -250.76991, -112.63811, -245.92705, -151.89607, -152.00166, -160.78975, 11.077947, -234.49837]
2025-08-07 05:24:10,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:24:10,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 2 hours, 56 minutes, 31 seconds)
2025-08-07 05:25:48,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:26:01,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -99.95927 ± 72.524
2025-08-07 05:26:01,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-208.72604, -31.023705, -151.14856, -156.84349, -154.1937, 32.334816, -108.526855, -58.092255, -21.594126, -141.77878]
2025-08-07 05:26:01,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:26:01,423 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-99.96) for latency ExtremeClogL1U23
2025-08-07 05:26:01,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 2 hours, 54 minutes, 51 seconds)
2025-08-07 05:27:39,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:27:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -55.52736 ± 27.241
2025-08-07 05:27:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-43.04256, -47.53619, -38.664883, -48.315063, -78.19686, -48.02046, -76.288994, 3.9850354, -89.9961, -89.197525]
2025-08-07 05:27:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:27:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-55.53) for latency ExtremeClogL1U23
2025-08-07 05:27:51,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 2 hours, 54 minutes, 16 seconds)
2025-08-07 05:29:29,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:29:42,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 62.73618 ± 88.634
2025-08-07 05:29:42,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [256.3035, -102.872696, 75.44307, 43.667675, 30.28575, 21.46467, 76.85765, 103.69511, -3.9407496, 126.45779]
2025-08-07 05:29:42,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:29:42,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (62.74) for latency ExtremeClogL1U23
2025-08-07 05:29:42,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 2 hours, 51 minutes, 49 seconds)
2025-08-07 05:31:19,518 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:31:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 207.35124 ± 77.744
2025-08-07 05:31:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [167.15472, 193.10933, 243.21829, 178.38931, 191.55573, 32.910355, 217.86546, 357.30798, 235.18945, 256.81177]
2025-08-07 05:31:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:31:32,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (207.35) for latency ExtremeClogL1U23
2025-08-07 05:31:32,077 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 2 hours, 49 minutes, 27 seconds)
2025-08-07 05:33:09,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:33:22,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 266.07578 ± 229.195
2025-08-07 05:33:22,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [491.06253, 336.9435, 389.91266, 335.34558, 4.740674, -242.93274, 246.79503, 369.23068, 578.8704, 150.78949]
2025-08-07 05:33:22,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:33:22,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (266.08) for latency ExtremeClogL1U23
2025-08-07 05:33:22,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 47 minutes, 19 seconds)
2025-08-07 05:34:59,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:35:12,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 493.13019 ± 195.290
2025-08-07 05:35:12,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [749.906, 565.9019, 281.0171, 612.91693, 196.74074, 559.0471, 585.3698, 147.25142, 615.21906, 617.9317]
2025-08-07 05:35:12,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:35:12,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (493.13) for latency ExtremeClogL1U23
2025-08-07 05:35:12,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 45 minutes, 11 seconds)
2025-08-07 05:36:49,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:37:01,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 447.35489 ± 184.503
2025-08-07 05:37:01,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [222.01694, 549.9612, 145.26614, 314.22803, 481.11005, 264.6972, 569.78986, 587.9392, 703.7556, 634.7844]
2025-08-07 05:37:01,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:37:01,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 43 minutes, 9 seconds)
2025-08-07 05:38:39,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:38:51,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 536.80310 ± 79.651
2025-08-07 05:38:51,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [500.7994, 541.75616, 415.14276, 592.44507, 653.41016, 596.92426, 618.60876, 536.5529, 395.67444, 516.71735]
2025-08-07 05:38:51,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:38:51,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (536.80) for latency ExtremeClogL1U23
2025-08-07 05:38:51,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 41 minutes, 13 seconds)
2025-08-07 05:40:29,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:40:41,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 581.20764 ± 116.658
2025-08-07 05:40:41,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [472.435, 630.0552, 550.56647, 719.46954, 733.408, 615.9336, 343.79218, 589.6605, 475.92505, 680.831]
2025-08-07 05:40:41,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:40:41,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (581.21) for latency ExtremeClogL1U23
2025-08-07 05:40:41,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 39 minutes, 20 seconds)
2025-08-07 05:42:19,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:42:31,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 580.58478 ± 323.602
2025-08-07 05:42:31,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [850.58636, -312.6086, 459.31693, 713.0214, 823.4296, 698.75085, 500.955, 665.3607, 586.1462, 820.8892]
2025-08-07 05:42:31,695 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:42:31,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 37 minutes, 31 seconds)
2025-08-07 05:44:09,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:44:21,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 594.34790 ± 197.553
2025-08-07 05:44:21,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [841.135, 447.6839, 740.2582, 888.4707, 589.5979, 415.49448, 441.33322, 345.56357, 819.10675, 414.83517]
2025-08-07 05:44:21,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:44:21,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (594.35) for latency ExtremeClogL1U23
2025-08-07 05:44:21,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 35 minutes, 46 seconds)
2025-08-07 05:45:58,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:46:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 689.03632 ± 194.937
2025-08-07 05:46:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [640.31854, 492.36353, 814.5492, 985.8987, 563.98413, 796.19434, 985.8466, 509.99362, 393.33435, 707.88055]
2025-08-07 05:46:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:46:11,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (689.04) for latency ExtremeClogL1U23
2025-08-07 05:46:11,401 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 33 minutes, 52 seconds)
2025-08-07 05:47:48,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:48:00,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 729.88519 ± 282.678
2025-08-07 05:48:00,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [948.20795, 589.60364, 622.9032, 1056.2069, 905.808, 932.84546, 62.34176, 718.696, 512.1317, 950.10754]
2025-08-07 05:48:00,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:48:00,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (729.89) for latency ExtremeClogL1U23
2025-08-07 05:48:00,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 31 minutes, 54 seconds)
2025-08-07 05:49:37,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:49:50,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 705.90656 ± 189.445
2025-08-07 05:49:50,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [921.39594, 734.0706, 705.457, 622.17865, 923.49475, 994.0267, 675.89825, 587.40045, 333.82065, 561.3223]
2025-08-07 05:49:50,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:49:50,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 29 minutes, 58 seconds)
2025-08-07 05:51:27,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:51:39,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 786.97168 ± 189.367
2025-08-07 05:51:39,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1049.7366, 795.0054, 588.565, 869.79126, 878.29944, 487.7505, 953.69684, 474.24072, 881.09247, 891.5391]
2025-08-07 05:51:39,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:51:39,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (786.97) for latency ExtremeClogL1U23
2025-08-07 05:51:39,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 27 minutes, 59 seconds)
2025-08-07 05:53:16,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:53:29,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 517.79242 ± 377.327
2025-08-07 05:53:29,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [855.89185, 720.27216, -305.92996, 913.8913, 410.53323, 803.99506, 13.838429, 474.60947, 805.4305, 485.39194]
2025-08-07 05:53:29,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:53:29,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 26 minutes, 1 second)
2025-08-07 05:55:06,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:55:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 783.93079 ± 133.568
2025-08-07 05:55:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [918.164, 833.3227, 865.48334, 671.2805, 763.4509, 873.233, 895.4332, 789.55664, 440.8205, 788.56287]
2025-08-07 05:55:19,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:55:19,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 24 minutes, 13 seconds)
2025-08-07 05:56:56,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:57:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 861.38220 ± 137.861
2025-08-07 05:57:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [973.60144, 886.15967, 1052.3837, 584.7461, 798.446, 720.685, 915.36694, 1034.3834, 776.866, 871.1839]
2025-08-07 05:57:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:57:08,470 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (861.38) for latency ExtremeClogL1U23
2025-08-07 05:57:08,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 22 minutes, 23 seconds)
2025-08-07 05:58:45,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 05:58:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 901.64032 ± 87.521
2025-08-07 05:58:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [965.029, 830.4364, 971.889, 838.0492, 884.0548, 926.99664, 896.67505, 1007.55334, 990.2559, 705.46344]
2025-08-07 05:58:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 05:58:57,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (901.64) for latency ExtremeClogL1U23
2025-08-07 05:58:57,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 20 minutes, 34 seconds)
2025-08-07 06:00:34,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:00:47,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 894.97919 ± 126.416
2025-08-07 06:00:47,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [853.9775, 1097.9883, 750.7626, 1002.4511, 784.8638, 894.35657, 1006.4332, 676.06067, 879.1022, 1003.7968]
2025-08-07 06:00:47,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:00:47,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 18 minutes, 42 seconds)
2025-08-07 06:02:24,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:02:36,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 909.63361 ± 138.216
2025-08-07 06:02:36,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [902.50055, 894.30255, 965.812, 904.95715, 929.7161, 786.1527, 1017.76044, 977.4214, 584.55743, 1133.1558]
2025-08-07 06:02:36,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:02:36,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (909.63) for latency ExtremeClogL1U23
2025-08-07 06:02:36,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 16 minutes, 50 seconds)
2025-08-07 06:04:13,842 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:04:26,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1043.08472 ± 132.190
2025-08-07 06:04:26,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [985.21564, 1133.1409, 938.84863, 1048.1589, 869.69714, 1004.1921, 1388.4481, 1014.6318, 1027.487, 1021.02673]
2025-08-07 06:04:26,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:04:26,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1043.08) for latency ExtremeClogL1U23
2025-08-07 06:04:26,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 15 minutes)
2025-08-07 06:06:03,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:06:15,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 919.98700 ± 142.513
2025-08-07 06:06:15,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1034.8881, 1074.6895, 967.7541, 893.54425, 958.2352, 573.6073, 750.58716, 998.7469, 953.5791, 994.2386]
2025-08-07 06:06:15,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:06:15,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 13 minutes, 10 seconds)
2025-08-07 06:07:52,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:08:05,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 855.00146 ± 131.872
2025-08-07 06:08:05,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [754.1022, 849.41235, 710.2708, 775.83435, 1066.5746, 1021.54443, 915.3495, 912.09515, 625.0477, 919.7833]
2025-08-07 06:08:05,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:08:05,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 11 minutes, 24 seconds)
2025-08-07 06:09:42,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:09:54,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 994.27557 ± 88.229
2025-08-07 06:09:54,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1007.51746, 872.133, 1004.61914, 940.59766, 892.96204, 1077.6658, 1112.8596, 1099.365, 1056.9724, 878.0631]
2025-08-07 06:09:54,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:09:54,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 9 minutes, 34 seconds)
2025-08-07 06:11:31,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:11:44,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1041.96899 ± 73.534
2025-08-07 06:11:44,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [952.35913, 1234.0367, 1068.5123, 1006.32544, 1039.8517, 1075.0514, 1047.3549, 1012.8311, 1000.15454, 983.2136]
2025-08-07 06:11:44,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:11:44,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 7 minutes, 47 seconds)
2025-08-07 06:13:21,659 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:13:34,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 971.20575 ± 80.954
2025-08-07 06:13:34,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1140.2856, 897.68713, 880.2353, 920.84796, 1044.2217, 981.9174, 908.63605, 1064.598, 924.9194, 948.70905]
2025-08-07 06:13:34,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:13:34,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 5 minutes, 58 seconds)
2025-08-07 06:15:11,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:15:23,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1012.16406 ± 100.289
2025-08-07 06:15:23,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [897.0114, 990.8934, 1082.469, 1040.9819, 912.90485, 893.55206, 1031.23, 1242.4093, 1058.5634, 971.6251]
2025-08-07 06:15:23,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:15:23,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 4 minutes, 13 seconds)
2025-08-07 06:17:00,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:17:13,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 962.93634 ± 76.322
2025-08-07 06:17:13,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [906.39185, 1046.2843, 1065.2029, 1018.0542, 927.1704, 1062.7297, 906.2355, 846.4537, 881.9853, 968.8554]
2025-08-07 06:17:13,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:17:13,397 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 2 minutes, 23 seconds)
2025-08-07 06:18:50,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:19:03,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1069.19336 ± 143.310
2025-08-07 06:19:03,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1040.4628, 1063.3431, 1170.6128, 1165.7207, 1147.5518, 743.0762, 1000.9885, 1071.8217, 1312.2826, 976.0728]
2025-08-07 06:19:03,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:19:03,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1069.19) for latency ExtremeClogL1U23
2025-08-07 06:19:03,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 36 seconds)
2025-08-07 06:20:40,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:20:52,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1199.15222 ± 212.584
2025-08-07 06:20:52,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [997.5082, 924.0134, 1306.5154, 1672.9904, 1004.7892, 1354.2783, 1217.1421, 1206.5936, 1282.6578, 1025.0339]
2025-08-07 06:20:52,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:20:52,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1199.15) for latency ExtremeClogL1U23
2025-08-07 06:20:52,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 1 hour, 58 minutes, 48 seconds)
2025-08-07 06:22:29,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:22:42,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1197.20740 ± 161.717
2025-08-07 06:22:42,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1326.2457, 1115.8672, 1513.6472, 1385.9446, 1170.2089, 925.5094, 1197.0931, 1047.1965, 1141.033, 1149.3276]
2025-08-07 06:22:42,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:22:42,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 1 hour, 56 minutes, 58 seconds)
2025-08-07 06:24:19,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:24:31,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1134.89771 ± 164.064
2025-08-07 06:24:31,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1403.3885, 960.2837, 1001.23267, 1117.5977, 983.90814, 994.5859, 1015.4585, 1398.3215, 1263.4905, 1210.7089]
2025-08-07 06:24:31,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:24:32,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 1 hour, 55 minutes, 7 seconds)
2025-08-07 06:26:09,162 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:26:21,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1127.95374 ± 167.892
2025-08-07 06:26:21,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1103.8214, 1273.4526, 1279.3506, 1254.7402, 1295.8383, 1069.2759, 1254.0891, 1006.5006, 987.4881, 754.98]
2025-08-07 06:26:21,609 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:26:21,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 1 hour, 53 minutes, 17 seconds)
2025-08-07 06:27:58,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:28:11,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1133.34692 ± 130.371
2025-08-07 06:28:11,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1063.3549, 1192.2368, 1060.1996, 1432.3405, 1126.9021, 984.2374, 992.0433, 1246.1517, 1193.5435, 1042.4601]
2025-08-07 06:28:11,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:28:11,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 51 minutes, 28 seconds)
2025-08-07 06:29:48,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:30:00,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1169.73853 ± 168.703
2025-08-07 06:30:00,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1055.5939, 1055.445, 1015.03394, 949.2422, 1261.3549, 1337.0704, 1496.2213, 1258.337, 1262.3143, 1006.77216]
2025-08-07 06:30:00,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:30:00,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 49 minutes, 37 seconds)
2025-08-07 06:31:37,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:31:50,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 938.24249 ± 263.454
2025-08-07 06:31:50,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [880.1434, 1072.719, 1162.1677, 996.99286, 802.4953, 1030.2816, 246.12106, 1265.5911, 1011.5135, 914.3994]
2025-08-07 06:31:50,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:31:50,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 47 minutes, 46 seconds)
2025-08-07 06:33:27,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:33:40,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1224.80859 ± 152.276
2025-08-07 06:33:40,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1393.9825, 1074.6799, 1371.7968, 1410.4629, 1091.0023, 1016.4091, 1232.8267, 1037.7665, 1392.9215, 1226.2365]
2025-08-07 06:33:40,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:33:40,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1224.81) for latency ExtremeClogL1U23
2025-08-07 06:33:40,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 45 minutes, 57 seconds)
2025-08-07 06:35:17,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:35:29,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1256.75903 ± 434.844
2025-08-07 06:35:29,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [402.97086, 1309.3578, 1092.2551, 1685.2882, 1411.7115, 948.1558, 1131.0321, 2125.3938, 1061.1058, 1400.3193]
2025-08-07 06:35:29,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:35:29,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1256.76) for latency ExtremeClogL1U23
2025-08-07 06:35:29,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 44 minutes, 10 seconds)
2025-08-07 06:37:06,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:37:19,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1338.53625 ± 188.198
2025-08-07 06:37:19,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1193.7999, 1301.0894, 1053.6776, 1262.2369, 1159.4838, 1463.1631, 1307.5645, 1753.4929, 1470.6794, 1420.1744]
2025-08-07 06:37:19,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:37:19,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1338.54) for latency ExtremeClogL1U23
2025-08-07 06:37:19,493 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 42 minutes, 19 seconds)
2025-08-07 06:38:56,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:39:09,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1190.66956 ± 235.148
2025-08-07 06:39:09,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1001.495, 1018.91315, 1374.2874, 1046.9608, 1224.7378, 1118.0461, 1810.6754, 997.3425, 1119.5627, 1194.675]
2025-08-07 06:39:09,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:39:09,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 40 minutes, 30 seconds)
2025-08-07 06:40:46,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:40:58,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1249.86743 ± 294.464
2025-08-07 06:40:58,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1253.0137, 1302.0815, 2035.3074, 1094.5172, 968.5752, 1225.4584, 1224.6423, 963.34595, 1386.9486, 1044.7844]
2025-08-07 06:40:58,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:40:58,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 38 minutes, 42 seconds)
2025-08-07 06:42:36,277 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:42:48,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1294.75024 ± 212.279
2025-08-07 06:42:48,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1033.6733, 1351.3643, 1482.8678, 1721.2834, 1363.7606, 1006.8891, 1201.1489, 1188.9955, 1133.9358, 1463.5836]
2025-08-07 06:42:48,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:42:48,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 36 minutes, 56 seconds)
2025-08-07 06:44:25,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:44:38,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1423.57532 ± 329.375
2025-08-07 06:44:38,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1274.5555, 2150.6597, 1670.8406, 1059.4498, 1507.798, 1574.1838, 1337.2803, 996.96545, 1560.3811, 1103.6387]
2025-08-07 06:44:38,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:44:38,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1423.58) for latency ExtremeClogL1U23
2025-08-07 06:44:38,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 35 minutes, 5 seconds)
2025-08-07 06:46:15,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:46:28,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1293.02930 ± 204.215
2025-08-07 06:46:28,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1126.5773, 1678.2239, 1155.841, 995.45166, 1179.8739, 1126.0387, 1406.2228, 1332.334, 1366.1763, 1563.5549]
2025-08-07 06:46:28,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:46:28,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 33 minutes, 15 seconds)
2025-08-07 06:48:05,173 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:48:17,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1169.48572 ± 204.195
2025-08-07 06:48:17,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1078.7721, 1090.8265, 1691.7223, 1183.0753, 1127.5924, 1133.4326, 932.721, 1196.5704, 947.49054, 1312.6548]
2025-08-07 06:48:17,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:48:17,648 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 31 minutes, 24 seconds)
2025-08-07 06:49:54,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:50:07,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1367.90991 ± 333.806
2025-08-07 06:50:07,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1204.0139, 1415.163, 1696.6444, 1079.9948, 1259.3663, 1034.9276, 1276.6174, 1002.6189, 1576.0349, 2133.718]
2025-08-07 06:50:07,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:50:07,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 29 minutes, 33 seconds)
2025-08-07 06:51:44,238 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:51:56,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1240.41064 ± 284.771
2025-08-07 06:51:56,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1555.1005, 1636.6996, 1028.6372, 1072.0381, 858.05914, 1120.2172, 1030.9, 1040.5382, 1715.8275, 1346.0886]
2025-08-07 06:51:56,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:51:56,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 27 minutes, 41 seconds)
2025-08-07 06:53:33,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:53:46,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1308.07349 ± 260.159
2025-08-07 06:53:46,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [827.8326, 970.6475, 1597.8143, 1161.0779, 1574.1115, 1334.0549, 1583.9576, 1224.9077, 1238.5183, 1567.8121]
2025-08-07 06:53:46,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:53:46,227 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 25 minutes, 48 seconds)
2025-08-07 06:55:23,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:55:35,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1229.17871 ± 260.814
2025-08-07 06:55:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [988.7731, 1304.5217, 1168.5793, 937.71643, 1404.2905, 1845.5991, 1132.9501, 1112.008, 983.4353, 1413.9128]
2025-08-07 06:55:35,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:55:35,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 23 minutes, 59 seconds)
2025-08-07 06:57:13,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:57:25,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1363.38538 ± 378.841
2025-08-07 06:57:25,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1055.7441, 998.0904, 1989.229, 1141.3053, 1124.72, 1942.1216, 1243.6381, 1026.348, 1849.4476, 1263.21]
2025-08-07 06:57:25,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:57:25,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 22 minutes, 11 seconds)
2025-08-07 06:59:02,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 06:59:15,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1396.10205 ± 240.968
2025-08-07 06:59:15,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1179.8455, 1160.3279, 1431.9946, 1755.7087, 1589.4615, 1716.48, 1344.5925, 1572.032, 1055.7062, 1154.8707]
2025-08-07 06:59:15,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 06:59:15,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 20 minutes, 23 seconds)
2025-08-07 07:00:52,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:01:05,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1389.76562 ± 345.682
2025-08-07 07:01:05,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1036.5294, 1082.4103, 1022.27484, 1143.8348, 1023.40674, 1632.1655, 1825.8291, 1913.6967, 1493.3381, 1724.171]
2025-08-07 07:01:05,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:01:05,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 18 minutes, 34 seconds)
2025-08-07 07:02:42,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:02:54,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1172.06726 ± 201.943
2025-08-07 07:02:54,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1459.027, 1230.8148, 1093.504, 1094.7484, 1211.1013, 1012.4411, 734.35486, 1214.8479, 1200.0126, 1469.82]
2025-08-07 07:02:54,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:02:54,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 16 minutes, 47 seconds)
2025-08-07 07:04:31,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:04:44,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1360.83398 ± 235.697
2025-08-07 07:04:44,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1113.9851, 1944.2867, 1390.4482, 1229.6863, 1062.6206, 1194.4271, 1391.145, 1395.9264, 1486.8328, 1398.9814]
2025-08-07 07:04:44,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:04:44,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 14 minutes, 58 seconds)
2025-08-07 07:06:21,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:06:33,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1358.82556 ± 248.834
2025-08-07 07:06:33,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1315.6556, 1872.0165, 1455.867, 1357.165, 1116.8817, 1196.3441, 1278.6255, 1481.1874, 927.4389, 1587.0742]
2025-08-07 07:06:33,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:06:33,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 13 minutes, 7 seconds)
2025-08-07 07:08:11,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:08:23,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1387.40088 ± 298.101
2025-08-07 07:08:23,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1123.4801, 1639.8937, 1081.1971, 1521.706, 1509.7256, 1231.1702, 1029.7319, 1117.1401, 1961.5486, 1658.4153]
2025-08-07 07:08:23,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:08:23,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 11 minutes, 17 seconds)
2025-08-07 07:10:00,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:10:13,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1538.83789 ± 454.748
2025-08-07 07:10:13,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1476.3407, 1997.4917, 1026.079, 2500.8594, 1210.1293, 1425.9454, 1123.0231, 1091.6163, 1586.601, 1950.2943]
2025-08-07 07:10:13,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:10:13,490 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1538.84) for latency ExtremeClogL1U23
2025-08-07 07:10:13,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 9 minutes, 28 seconds)
2025-08-07 07:11:50,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:12:03,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1443.92065 ± 276.610
2025-08-07 07:12:03,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1342.6094, 1582.1588, 1167.6494, 1964.8069, 1087.4193, 1773.3373, 1486.9761, 1613.178, 1129.717, 1291.3551]
2025-08-07 07:12:03,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:12:03,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 7 minutes, 38 seconds)
2025-08-07 07:13:40,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:13:52,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1527.91504 ± 462.312
2025-08-07 07:13:52,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1152.3347, 1095.3527, 1765.875, 1278.8121, 1787.5321, 2693.1052, 1152.7974, 1291.028, 1346.009, 1716.3046]
2025-08-07 07:13:52,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:13:52,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 5 minutes, 48 seconds)
2025-08-07 07:15:29,986 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:15:42,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1501.50342 ± 423.562
2025-08-07 07:15:42,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1850.6931, 1297.6274, 2180.1335, 1907.8378, 997.9857, 1135.4989, 2016.0253, 1161.3075, 1450.5144, 1017.41016]
2025-08-07 07:15:42,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:15:42,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 3 minutes, 59 seconds)
2025-08-07 07:17:19,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:17:32,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1452.24561 ± 304.977
2025-08-07 07:17:32,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1604.8131, 1488.1466, 1736.9288, 1441.7351, 1712.6478, 1280.4303, 1115.1936, 1071.654, 2012.2952, 1058.6116]
2025-08-07 07:17:32,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:17:32,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 2 minutes, 8 seconds)
2025-08-07 07:19:09,207 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:19:21,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1544.03210 ± 248.154
2025-08-07 07:19:21,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1555.4111, 1991.0361, 1278.8949, 1611.5685, 1394.6344, 1189.039, 1809.7965, 1414.4563, 1374.7277, 1820.757]
2025-08-07 07:19:21,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:19:21,773 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1544.03) for latency ExtremeClogL1U23
2025-08-07 07:19:21,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 18 seconds)
2025-08-07 07:20:59,024 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:21:11,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1290.26953 ± 308.492
2025-08-07 07:21:11,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1460.5409, 1061.2665, 1333.8435, 1386.4498, 1275.116, 2101.5996, 1121.8053, 1053.7445, 1103.0769, 1005.25146]
2025-08-07 07:21:11,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:21:11,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 58 minutes, 29 seconds)
2025-08-07 07:22:48,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:23:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1235.62378 ± 116.725
2025-08-07 07:23:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1043.739, 1379.204, 1190.1813, 1281.2266, 1249.691, 1273.4235, 1448.7258, 1080.248, 1233.1785, 1176.6189]
2025-08-07 07:23:00,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:23:00,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 56 minutes, 38 seconds)
2025-08-07 07:24:38,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:24:50,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1393.75562 ± 389.532
2025-08-07 07:24:50,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1023.9349, 1460.8403, 1752.6311, 1363.0137, 1109.439, 986.154, 2032.0375, 1089.9453, 2031.2338, 1088.3265]
2025-08-07 07:24:50,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:24:50,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 54 minutes, 49 seconds)
2025-08-07 07:26:27,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:26:40,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1328.64136 ± 301.953
2025-08-07 07:26:40,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1547.106, 1781.0952, 1095.2487, 1039.8601, 1630.8424, 1109.5747, 1062.2964, 1025.7181, 1212.2778, 1782.3945]
2025-08-07 07:26:40,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:26:40,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 53 minutes)
2025-08-07 07:28:17,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:28:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1338.04138 ± 216.546
2025-08-07 07:28:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1157.8818, 1576.9681, 976.75385, 1391.3352, 1236.1216, 1231.4508, 1810.1237, 1348.4918, 1303.5063, 1347.781]
2025-08-07 07:28:30,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:28:30,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 51 minutes, 10 seconds)
2025-08-07 07:30:07,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:30:19,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1531.86646 ± 574.492
2025-08-07 07:30:19,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2382.4211, 1114.3385, 2795.1423, 1443.499, 1415.3291, 1173.3679, 1079.3186, 1204.2286, 1731.3492, 979.6687]
2025-08-07 07:30:19,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:30:19,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 49 minutes, 20 seconds)
2025-08-07 07:31:56,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:32:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1367.50415 ± 299.997
2025-08-07 07:32:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1567.7145, 1113.4552, 1701.6968, 1113.4468, 1039.6709, 1397.2057, 1140.1735, 1780.0477, 1033.2275, 1788.4037]
2025-08-07 07:32:09,427 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:32:09,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 47 minutes, 32 seconds)
2025-08-07 07:33:46,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:33:58,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1536.70679 ± 294.382
2025-08-07 07:33:58,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1469.3579, 1128.8834, 1338.9727, 1804.2006, 1284.3315, 1344.5675, 1541.033, 2119.3684, 1433.2202, 1903.133]
2025-08-07 07:33:58,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:33:58,960 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 45 minutes, 41 seconds)
2025-08-07 07:35:35,974 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:35:48,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1673.99182 ± 457.955
2025-08-07 07:35:48,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2678.4385, 1669.5898, 1173.0444, 1601.27, 1853.6501, 1195.8802, 1448.3994, 1237.7549, 1634.7866, 2247.1038]
2025-08-07 07:35:48,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:35:48,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1673.99) for latency ExtremeClogL1U23
2025-08-07 07:35:48,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 43 minutes, 50 seconds)
2025-08-07 07:37:26,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:37:39,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1553.97424 ± 313.599
2025-08-07 07:37:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1639.0527, 1211.4088, 1183.8848, 1385.6443, 1419.7107, 1772.6923, 1667.0114, 1526.3644, 2324.2595, 1409.7135]
2025-08-07 07:37:39,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:37:39,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 42 minutes, 6 seconds)
2025-08-07 07:39:17,627 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:39:30,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1326.55835 ± 192.714
2025-08-07 07:39:30,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1030.5552, 1242.4666, 1403.4736, 1334.814, 1208.6266, 1801.5773, 1307.6034, 1238.6378, 1453.9707, 1243.8574]
2025-08-07 07:39:30,221 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:39:30,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 40 minutes, 22 seconds)
2025-08-07 07:41:08,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:41:21,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1368.39893 ± 287.526
2025-08-07 07:41:21,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1129.4849, 1040.0131, 1839.7506, 1932.5791, 1171.4131, 1115.1411, 1419.2047, 1314.908, 1280.6887, 1440.8064]
2025-08-07 07:41:21,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:41:21,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 38 minutes, 36 seconds)
2025-08-07 07:42:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:43:11,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1810.59436 ± 635.079
2025-08-07 07:43:11,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1083.8193, 1068.1022, 1619.5347, 1291.698, 2555.236, 1802.4199, 1871.2478, 2785.6118, 2740.8552, 1287.4185]
2025-08-07 07:43:11,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:43:11,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1810.59) for latency ExtremeClogL1U23
2025-08-07 07:43:11,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 36 minutes, 52 seconds)
2025-08-07 07:44:50,217 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:45:02,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1492.73438 ± 330.473
2025-08-07 07:45:02,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2184.9949, 1090.9459, 1738.8273, 1727.7368, 1165.9368, 1320.5201, 1760.3665, 1270.9613, 1451.2559, 1215.7985]
2025-08-07 07:45:02,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:45:02,865 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 35 minutes, 6 seconds)
2025-08-07 07:46:41,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:46:53,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1513.43579 ± 448.747
2025-08-07 07:46:53,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1635.3177, 1614.3727, 1091.0869, 2579.2988, 1838.4072, 1024.7256, 1632.807, 1465.5697, 1209.0144, 1043.7578]
2025-08-07 07:46:53,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:46:53,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 33 minutes, 16 seconds)
2025-08-07 07:48:32,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:48:44,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1470.57324 ± 338.952
2025-08-07 07:48:44,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [985.41284, 1527.7212, 1526.0221, 1364.609, 1295.4926, 1819.9867, 2263.2556, 1383.325, 1385.5763, 1154.3325]
2025-08-07 07:48:44,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:48:44,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 31 minutes, 25 seconds)
2025-08-07 07:50:23,165 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:50:35,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1411.51282 ± 318.944
2025-08-07 07:50:35,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1358.5789, 1199.5497, 1595.6166, 1220.407, 2165.116, 1077.8044, 1163.3295, 1388.9109, 1198.8793, 1746.9369]
2025-08-07 07:50:35,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:50:35,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 29 minutes, 35 seconds)
2025-08-07 07:52:14,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:52:26,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1511.17151 ± 437.884
2025-08-07 07:52:26,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1423.0994, 1272.6971, 1203.3638, 1097.6506, 2361.1052, 2112.0627, 975.8967, 1326.5546, 1927.2937, 1411.9916]
2025-08-07 07:52:26,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:52:26,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 27 minutes, 44 seconds)
2025-08-07 07:54:05,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:54:17,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1485.80347 ± 229.342
2025-08-07 07:54:17,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1200.0416, 1120.8168, 1517.396, 1623.4102, 1988.8699, 1430.711, 1614.5848, 1539.7292, 1364.0822, 1458.394]
2025-08-07 07:54:17,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:54:17,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 25 minutes, 53 seconds)
2025-08-07 07:55:56,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:56:08,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1361.66614 ± 297.971
2025-08-07 07:56:08,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2071.367, 1158.1545, 1215.5903, 1169.9622, 1345.4142, 1303.831, 1160.92, 1635.0985, 1553.906, 1002.4174]
2025-08-07 07:56:08,943 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:56:08,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 24 minutes, 3 seconds)
2025-08-07 07:57:47,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:57:59,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1663.92615 ± 618.966
2025-08-07 07:57:59,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1057.8226, 1177.829, 1337.8984, 1934.5814, 1163.1815, 2825.412, 1318.8765, 2756.329, 1307.999, 1759.3309]
2025-08-07 07:57:59,965 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:57:59,982 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 22 minutes, 12 seconds)
2025-08-07 07:59:38,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 07:59:50,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1534.18091 ± 290.715
2025-08-07 07:59:50,952 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1777.2854, 1322.001, 1565.016, 2039.118, 1288.696, 1464.7793, 1497.2356, 979.1693, 1541.7886, 1866.7196]
2025-08-07 07:59:50,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 07:59:50,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 20 minutes, 21 seconds)
2025-08-07 08:01:29,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:01:41,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1575.86157 ± 513.111
2025-08-07 08:01:41,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1523.7683, 1533.9995, 2568.0693, 1327.8287, 1110.3806, 953.80475, 1992.3503, 1476.7296, 1008.0477, 2263.6353]
2025-08-07 08:01:41,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:01:41,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 18 minutes, 30 seconds)
2025-08-07 08:03:20,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:03:32,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1698.20642 ± 398.709
2025-08-07 08:03:32,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2050.4412, 1278.6534, 1260.5779, 1754.7808, 1969.8083, 2031.3915, 1522.9764, 1271.5823, 1374.039, 2467.8137]
2025-08-07 08:03:32,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:03:32,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 16 minutes, 38 seconds)
2025-08-07 08:05:11,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:05:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1575.71375 ± 498.139
2025-08-07 08:05:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1555.8495, 1146.1329, 1083.6859, 1154.3938, 1295.6885, 1118.8695, 2128.5671, 2646.3794, 1703.2339, 1924.3368]
2025-08-07 08:05:23,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:05:23,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 14 minutes, 47 seconds)
2025-08-07 08:07:02,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:07:14,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1377.54456 ± 263.833
2025-08-07 08:07:14,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1122.394, 1346.9896, 1409.459, 1235.3358, 1471.3436, 1103.6438, 1128.403, 1367.7559, 2033.9879, 1556.1326]
2025-08-07 08:07:14,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:07:14,762 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 12 minutes, 56 seconds)
2025-08-07 08:08:53,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:09:05,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1351.98474 ± 268.125
2025-08-07 08:09:05,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1301.3811, 1383.3203, 1434.0952, 1542.3284, 1267.5781, 1014.07947, 1023.13666, 1480.565, 1959.8766, 1113.4865]
2025-08-07 08:09:05,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:09:05,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 5 seconds)
2025-08-07 08:10:44,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:10:56,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1400.89978 ± 352.909
2025-08-07 08:10:56,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1580.1354, 1314.3723, 1357.837, 1135.746, 1296.8053, 937.6935, 2101.4678, 1115.2191, 1218.6752, 1951.0466]
2025-08-07 08:10:56,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:10:56,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 14 seconds)
2025-08-07 08:12:35,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:12:47,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1546.83911 ± 354.555
2025-08-07 08:12:47,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1377.8236, 1351.6335, 1246.9924, 1456.5076, 2389.0398, 1694.4948, 1016.25977, 1725.81, 1461.8063, 1748.022]
2025-08-07 08:12:47,744 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:12:47,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 23 seconds)
2025-08-07 08:14:26,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:14:38,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1270.96118 ± 170.237
2025-08-07 08:14:38,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1329.5168, 1155.7317, 1127.5874, 1465.7883, 1325.539, 1132.5073, 1117.7731, 1660.1206, 1255.5764, 1139.471]
2025-08-07 08:14:38,716 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:14:38,727 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 33 seconds)
2025-08-07 08:16:17,001 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:16:29,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1575.54614 ± 423.075
2025-08-07 08:16:29,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2278.454, 1744.4784, 1250.8866, 1213.5608, 2236.6196, 1370.5917, 1627.8663, 957.02295, 1251.8856, 1824.0953]
2025-08-07 08:16:29,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:16:29,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 41 seconds)
2025-08-07 08:18:07,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:18:20,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1243.85657 ± 176.587
2025-08-07 08:18:20,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1203.6659, 1089.3986, 1137.4749, 1096.2804, 1442.9274, 1283.5061, 1129.0211, 1145.1942, 1678.6069, 1232.4895]
2025-08-07 08:18:20,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:18:20,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 50 seconds)
2025-08-07 08:19:58,830 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-08-07 08:20:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1429.68872 ± 291.191
2025-08-07 08:20:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1506.6113, 1301.6162, 1945.8282, 1146.6434, 1174.6633, 1159.2189, 1205.5944, 1932.7993, 1595.1393, 1328.7738]
2025-08-07 08:20:11,468 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 08:20:11,497 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1251 [DEBUG]: Training session finished
