2025-08-07 00:48:19,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc20-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:19,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc20-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:19,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14642ef9ab50>}
2025-08-07 00:48:19,585 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:19,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 00:48:19,618 baseline-bpql-noiseperc20-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:48:19,619 baseline-bpql-noiseperc20-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:27,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:27,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 00:50:08,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:50:24,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -411.41031 ± 46.471
2025-08-07 00:50:24,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-508.91425, -351.97427, -386.3652, -387.31693, -443.4286, -390.18716, -455.62024, -441.34296, -386.59784, -362.35562]
2025-08-07 00:50:24,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:50:24,301 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-411.41) for latency ExtremeSparseL4U32
2025-08-07 00:50:24,309 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 13 minutes, 3 seconds)
2025-08-07 00:52:10,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:52:26,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -253.26736 ± 90.946
2025-08-07 00:52:26,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-298.9541, -251.51866, -409.50104, -334.2585, -155.90298, -202.54282, -99.94236, -354.84186, -197.33073, -227.88036]
2025-08-07 00:52:26,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:52:26,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-253.27) for latency ExtremeSparseL4U32
2025-08-07 00:52:26,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 15 minutes, 20 seconds)
2025-08-07 00:54:12,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:54:28,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -169.71750 ± 55.084
2025-08-07 00:54:28,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-234.68466, -145.34663, -104.213936, -238.58139, -186.94885, -263.58838, -160.349, -109.65176, -123.395874, -130.41447]
2025-08-07 00:54:28,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:54:28,688 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-169.72) for latency ExtremeSparseL4U32
2025-08-07 00:54:28,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 14 minutes, 44 seconds)
2025-08-07 00:56:15,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:30,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -160.97943 ± 78.893
2025-08-07 00:56:30,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-189.68318, -38.32198, -21.483393, -170.79106, -219.12936, -226.3713, -146.1042, -152.59769, -147.62407, -297.6881]
2025-08-07 00:56:30,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:56:30,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-160.98) for latency ExtremeSparseL4U32
2025-08-07 00:56:30,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 13 minutes, 25 seconds)
2025-08-07 00:58:17,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:32,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -159.32463 ± 97.169
2025-08-07 00:58:32,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-8.085495, -268.74142, -205.45122, -163.60216, -293.75705, -143.05586, -151.56665, -21.119621, -69.57188, -268.2951]
2025-08-07 00:58:32,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:58:32,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-159.32) for latency ExtremeSparseL4U32
2025-08-07 00:58:32,840 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 11 minutes, 45 seconds)
2025-08-07 01:00:19,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -111.24329 ± 66.574
2025-08-07 01:00:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-48.67712, -123.88081, -122.0233, -226.3472, -103.97724, -12.041269, -164.2986, -198.06448, -38.46414, -74.658646]
2025-08-07 01:00:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:00:34,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-111.24) for latency ExtremeSparseL4U32
2025-08-07 01:00:34,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 11 minutes, 20 seconds)
2025-08-07 01:02:21,142 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:02:37,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -59.86722 ± 155.773
2025-08-07 01:02:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-20.729452, -182.94243, 6.357047, -457.18408, -11.165212, -69.16157, 159.3284, -5.0600724, -59.3356, 41.220768]
2025-08-07 01:02:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:02:37,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-59.87) for latency ExtremeSparseL4U32
2025-08-07 01:02:37,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 9 minutes, 15 seconds)
2025-08-07 01:04:23,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:39,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -26.31122 ± 75.060
2025-08-07 01:04:39,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-117.99092, -1.1896446, -93.881256, 60.060802, -5.4074244, -104.61807, 77.05006, 41.10721, 16.135143, -134.37805]
2025-08-07 01:04:39,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:04:39,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (-26.31) for latency ExtremeSparseL4U32
2025-08-07 01:04:39,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 7 minutes, 13 seconds)
2025-08-07 01:06:25,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:41,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 20.27538 ± 125.506
2025-08-07 01:06:41,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [139.26132, -291.01633, 34.527336, 80.95272, 15.020005, -27.854042, -21.32698, 38.392513, 215.64275, 19.154524]
2025-08-07 01:06:41,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:06:41,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (20.28) for latency ExtremeSparseL4U32
2025-08-07 01:06:41,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 5 minutes, 11 seconds)
2025-08-07 01:08:27,581 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:43,240 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 103.56763 ± 117.958
2025-08-07 01:08:43,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [250.72346, 13.878136, 136.53943, -185.234, 183.77103, 62.984158, 64.38168, 135.83163, 203.26448, 169.53627]
2025-08-07 01:08:43,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:08:43,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (103.57) for latency ExtremeSparseL4U32
2025-08-07 01:08:43,245 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 3 minutes, 7 seconds)
2025-08-07 01:10:29,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:45,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 149.69649 ± 163.477
2025-08-07 01:10:45,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [44.257565, 191.3026, 305.75327, -26.702274, 62.151028, -35.134277, 461.6581, -11.5620775, 173.27461, 331.96625]
2025-08-07 01:10:45,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:10:45,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (149.70) for latency ExtremeSparseL4U32
2025-08-07 01:10:45,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 1 minute)
2025-08-07 01:12:31,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:12:47,135 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 279.95743 ± 152.988
2025-08-07 01:12:47,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [143.29593, 463.67102, 250.56702, 275.1543, 273.51804, 430.71457, 192.17165, 562.4206, 169.51479, 38.546116]
2025-08-07 01:12:47,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:12:47,136 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (279.96) for latency ExtremeSparseL4U32
2025-08-07 01:12:47,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 58 minutes, 58 seconds)
2025-08-07 01:14:33,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:49,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 284.43280 ± 137.285
2025-08-07 01:14:49,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [354.63525, 143.1601, 394.38196, 274.1805, 266.54028, 51.196213, 202.86311, 307.49686, 583.0543, 266.819]
2025-08-07 01:14:49,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:14:49,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (284.43) for latency ExtremeSparseL4U32
2025-08-07 01:14:49,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 56 minutes, 51 seconds)
2025-08-07 01:16:35,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:16:51,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 251.56796 ± 186.321
2025-08-07 01:16:51,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [537.058, 193.55753, 351.86102, 325.6719, 215.4534, 309.7341, 133.05746, -194.51666, 224.12697, 419.67593]
2025-08-07 01:16:51,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:16:51,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 54 minutes, 47 seconds)
2025-08-07 01:18:37,315 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:53,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 276.29984 ± 226.999
2025-08-07 01:18:53,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [256.12485, 176.42319, 153.40146, 695.112, 74.68929, 419.95758, -29.260715, 537.05725, 450.79013, 28.703295]
2025-08-07 01:18:53,095 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:18:53,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 52 minutes, 47 seconds)
2025-08-07 01:20:39,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:20:55,247 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 268.45651 ± 205.800
2025-08-07 01:20:55,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [345.27368, 296.12134, 281.93698, 322.41788, -258.3826, 597.10114, 320.17398, 280.4931, 138.6102, 360.81952]
2025-08-07 01:20:55,248 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:20:55,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 50 minutes, 50 seconds)
2025-08-07 01:22:41,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:22:57,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 501.91772 ± 171.928
2025-08-07 01:22:57,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [417.68875, 253.55768, 239.02563, 716.93976, 472.7675, 519.25244, 678.51636, 597.6144, 739.07605, 384.73895]
2025-08-07 01:22:57,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:22:57,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (501.92) for latency ExtremeSparseL4U32
2025-08-07 01:22:57,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 48 minutes, 48 seconds)
2025-08-07 01:24:43,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:24:59,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 458.59531 ± 214.144
2025-08-07 01:24:59,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [573.9697, 72.456825, 582.00903, 737.7238, 146.04604, 709.5682, 536.14343, 407.47186, 530.8027, 289.76172]
2025-08-07 01:24:59,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:24:59,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 46 minutes, 49 seconds)
2025-08-07 01:26:45,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:01,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 319.12003 ± 247.575
2025-08-07 01:27:01,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [314.50406, 654.146, 193.54094, -82.74423, 475.0424, 403.33444, 121.41691, 25.25954, 361.68735, 725.0129]
2025-08-07 01:27:01,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:27:01,685 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 44 minutes, 51 seconds)
2025-08-07 01:28:47,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:03,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 424.68198 ± 112.064
2025-08-07 01:29:03,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [402.14984, 267.80396, 346.4911, 373.69214, 596.9728, 619.6923, 382.1192, 404.55627, 531.1087, 322.23373]
2025-08-07 01:29:03,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:29:03,743 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 42 minutes, 50 seconds)
2025-08-07 01:30:50,130 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:05,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 447.87079 ± 77.534
2025-08-07 01:31:05,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [455.19955, 463.60684, 441.17493, 414.2545, 429.71176, 519.46594, 597.4111, 483.95587, 386.55173, 287.37613]
2025-08-07 01:31:05,987 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:31:05,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 40 minutes, 49 seconds)
2025-08-07 01:32:52,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:08,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 407.59122 ± 120.695
2025-08-07 01:33:08,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [325.4697, 574.2591, 581.85486, 315.5716, 370.95596, 503.56808, 503.88947, 368.67862, 339.2564, 192.40825]
2025-08-07 01:33:08,125 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:33:08,133 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 38 minutes, 49 seconds)
2025-08-07 01:34:54,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:35:10,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 482.24292 ± 152.692
2025-08-07 01:35:10,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [293.65256, 438.90958, 522.26996, 395.48785, 405.7475, 708.5242, 760.06946, 401.2013, 300.8732, 595.6937]
2025-08-07 01:35:10,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:35:10,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 36 minutes, 45 seconds)
2025-08-07 01:36:56,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:37:12,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 449.89468 ± 178.891
2025-08-07 01:37:12,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [656.80457, 387.71207, 142.50247, 256.9224, 468.93597, 321.30573, 646.1412, 473.2186, 739.27985, 406.12384]
2025-08-07 01:37:12,359 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:37:12,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 34 minutes, 42 seconds)
2025-08-07 01:38:58,553 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:39:14,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 627.93951 ± 149.996
2025-08-07 01:39:14,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [841.1138, 684.58826, 698.29126, 413.6627, 604.01117, 781.1349, 434.35526, 411.99457, 653.11, 757.1334]
2025-08-07 01:39:14,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:39:14,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (627.94) for latency ExtremeSparseL4U32
2025-08-07 01:39:14,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 32 minutes, 39 seconds)
2025-08-07 01:41:00,623 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:41:16,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 397.54169 ± 116.948
2025-08-07 01:41:16,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [376.23044, 547.1058, 278.3363, 522.41266, 316.07474, 425.09393, 530.16705, 223.6413, 496.84802, 259.50687]
2025-08-07 01:41:16,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:41:16,350 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 30 minutes, 33 seconds)
2025-08-07 01:43:02,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:18,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 557.28082 ± 158.471
2025-08-07 01:43:18,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [673.7176, 490.75067, 526.0558, 157.19467, 711.95764, 610.2343, 712.6096, 589.3504, 650.784, 450.15332]
2025-08-07 01:43:18,337 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:43:18,345 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 28 minutes, 29 seconds)
2025-08-07 01:45:04,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:20,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 622.71399 ± 123.179
2025-08-07 01:45:20,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [604.77167, 590.18146, 637.9318, 594.20624, 760.73206, 712.71436, 317.88358, 669.69934, 775.9271, 563.09216]
2025-08-07 01:45:20,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:45:20,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 26 minutes, 25 seconds)
2025-08-07 01:47:06,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:22,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 679.52600 ± 180.438
2025-08-07 01:47:22,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [772.3565, 1045.8339, 692.42255, 769.50226, 398.56467, 568.80664, 415.18005, 749.16864, 756.1034, 627.32117]
2025-08-07 01:47:22,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:47:22,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (679.53) for latency ExtremeSparseL4U32
2025-08-07 01:47:22,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 24 minutes, 20 seconds)
2025-08-07 01:49:08,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:24,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 635.87891 ± 166.736
2025-08-07 01:49:24,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [645.8778, 629.3576, 221.75116, 528.8002, 696.20526, 784.4443, 697.3154, 558.30725, 740.947, 855.7831]
2025-08-07 01:49:24,303 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:49:24,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 22 minutes, 18 seconds)
2025-08-07 01:51:10,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:26,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 456.02271 ± 347.235
2025-08-07 01:51:26,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [665.9369, 620.88885, 743.14124, 561.47156, -212.45737, 450.5865, 783.6012, 709.58606, 430.90656, -193.43452]
2025-08-07 01:51:26,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:51:26,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 20 minutes, 17 seconds)
2025-08-07 01:53:12,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:53:28,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 610.97156 ± 166.113
2025-08-07 01:53:28,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [834.8221, 282.5772, 596.9784, 809.43567, 628.4547, 811.95264, 582.5959, 447.80753, 605.3233, 509.76828]
2025-08-07 01:53:28,557 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:53:28,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 18 minutes, 18 seconds)
2025-08-07 01:55:14,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:30,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 709.00378 ± 203.367
2025-08-07 01:55:30,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [700.6466, 760.3371, 725.2108, 119.38157, 819.2149, 768.68146, 893.69244, 752.7171, 810.4183, 739.7379]
2025-08-07 01:55:30,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:55:30,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (709.00) for latency ExtremeSparseL4U32
2025-08-07 01:55:30,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 16 minutes, 21 seconds)
2025-08-07 01:57:17,271 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:33,170 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 766.90887 ± 140.159
2025-08-07 01:57:33,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [640.80554, 607.0125, 718.4044, 870.8449, 775.4605, 1072.8733, 572.7006, 804.176, 747.0007, 859.81036]
2025-08-07 01:57:33,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:57:33,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (766.91) for latency ExtremeSparseL4U32
2025-08-07 01:57:33,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 14 minutes, 24 seconds)
2025-08-07 01:59:19,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:35,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 781.32959 ± 115.116
2025-08-07 01:59:35,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [761.5846, 712.7379, 641.3861, 1007.72815, 927.32806, 791.1634, 760.6763, 602.4144, 775.06226, 833.21497]
2025-08-07 01:59:35,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:59:35,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (781.33) for latency ExtremeSparseL4U32
2025-08-07 01:59:35,159 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 12 minutes, 21 seconds)
2025-08-07 02:01:21,495 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 871.03320 ± 113.168
2025-08-07 02:01:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [770.53143, 1028.7961, 1013.4357, 797.0315, 762.5878, 978.18536, 934.5494, 803.8931, 691.80634, 929.51526]
2025-08-07 02:01:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:01:37,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (871.03) for latency ExtremeSparseL4U32
2025-08-07 02:01:37,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 10 minutes, 22 seconds)
2025-08-07 02:03:23,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:39,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 764.95703 ± 118.924
2025-08-07 02:03:39,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [791.52673, 885.0946, 928.959, 561.02924, 901.9102, 733.19073, 841.2047, 689.29553, 702.91595, 614.4431]
2025-08-07 02:03:39,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:03:39,514 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 8 minutes, 17 seconds)
2025-08-07 02:05:25,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:05:41,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 694.24841 ± 108.980
2025-08-07 02:05:41,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [595.0198, 736.5593, 662.6518, 605.62, 744.36633, 801.90375, 579.2092, 946.39, 620.6389, 650.12537]
2025-08-07 02:05:41,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:05:41,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 6 minutes, 11 seconds)
2025-08-07 02:07:27,729 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:07:43,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 757.03613 ± 107.591
2025-08-07 02:07:43,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [669.0976, 825.038, 741.59546, 758.20044, 629.7203, 938.3666, 638.3464, 655.41077, 926.4023, 788.18353]
2025-08-07 02:07:43,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:07:43,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 4 minutes, 6 seconds)
2025-08-07 02:09:29,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:09:45,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 838.46826 ± 135.910
2025-08-07 02:09:45,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [699.80927, 1046.2653, 886.974, 703.47986, 773.5789, 1016.1368, 783.151, 816.09106, 1006.00726, 653.18866]
2025-08-07 02:09:45,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:09:45,678 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 2 minutes, 6 seconds)
2025-08-07 02:11:31,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:47,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1027.23608 ± 149.947
2025-08-07 02:11:47,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1267.1957, 1170.8479, 846.2442, 919.2511, 973.61926, 903.90173, 1129.2654, 1174.8109, 1081.3749, 805.84924]
2025-08-07 02:11:47,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:11:47,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1027.24) for latency ExtremeSparseL4U32
2025-08-07 02:11:47,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 1 second)
2025-08-07 02:13:34,015 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:49,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 840.03711 ± 117.806
2025-08-07 02:13:49,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [729.0592, 815.19794, 932.4292, 773.3976, 987.69574, 733.33453, 851.62024, 1079.6843, 696.3683, 801.58374]
2025-08-07 02:13:49,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:13:49,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 57 minutes, 59 seconds)
2025-08-07 02:15:36,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 752.30792 ± 146.563
2025-08-07 02:15:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [673.77167, 899.26526, 765.65106, 746.2909, 845.3663, 1020.85754, 764.2658, 759.33734, 494.70792, 553.5653]
2025-08-07 02:15:51,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:15:51,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 55 minutes, 58 seconds)
2025-08-07 02:17:38,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:54,012 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 917.02087 ± 149.086
2025-08-07 02:17:54,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [904.4529, 957.2927, 1060.8134, 928.49414, 556.2687, 734.6362, 1034.4597, 1003.1616, 957.8914, 1032.7379]
2025-08-07 02:17:54,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:17:54,018 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 53 minutes, 57 seconds)
2025-08-07 02:19:40,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:56,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 909.97540 ± 164.377
2025-08-07 02:19:56,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1115.4097, 836.404, 818.56354, 1011.2003, 724.1379, 553.53955, 1041.3585, 1004.1852, 1019.2488, 975.70654]
2025-08-07 02:19:56,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:19:56,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 51 minutes, 56 seconds)
2025-08-07 02:21:42,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:58,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1013.79767 ± 123.368
2025-08-07 02:21:58,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1140.9065, 1022.6222, 1096.3447, 990.6654, 1191.3317, 846.3978, 1049.7566, 765.11304, 963.0656, 1071.7721]
2025-08-07 02:21:58,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:21:58,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 49 minutes, 55 seconds)
2025-08-07 02:23:44,752 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1030.09985 ± 127.146
2025-08-07 02:24:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [853.44324, 1097.7845, 1161.4635, 901.7042, 824.0874, 1007.9588, 1183.4958, 1168.4166, 1104.406, 998.23804]
2025-08-07 02:24:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:24:00,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1030.10) for latency ExtremeSparseL4U32
2025-08-07 02:24:00,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 47 minutes, 54 seconds)
2025-08-07 02:25:46,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:02,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1054.30859 ± 141.298
2025-08-07 02:26:02,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1153.8462, 998.1097, 846.55695, 1089.8442, 837.26166, 1310.7384, 1132.4806, 1000.9439, 1182.1628, 991.142]
2025-08-07 02:26:02,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:26:02,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1054.31) for latency ExtremeSparseL4U32
2025-08-07 02:26:02,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 45 minutes, 53 seconds)
2025-08-07 02:27:49,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:28:04,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1075.81250 ± 159.493
2025-08-07 02:28:04,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1187.272, 1045.0652, 1398.0211, 816.9796, 1138.2772, 906.92224, 1118.0768, 1026.9685, 1188.9523, 931.5902]
2025-08-07 02:28:04,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:28:04,901 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1075.81) for latency ExtremeSparseL4U32
2025-08-07 02:28:04,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 43 minutes, 51 seconds)
2025-08-07 02:29:51,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:06,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 821.93274 ± 152.940
2025-08-07 02:30:06,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [700.3168, 876.81433, 971.4275, 888.60583, 447.50455, 1001.1992, 895.0878, 871.719, 838.8471, 727.80505]
2025-08-07 02:30:06,885 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:30:06,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 41 minutes, 46 seconds)
2025-08-07 02:31:52,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:07,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1090.31445 ± 74.822
2025-08-07 02:32:07,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1116.3324, 1144.2557, 983.1315, 1048.1576, 1054.4781, 972.0388, 1223.7213, 1073.59, 1149.92, 1137.5198]
2025-08-07 02:32:07,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:32:07,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1090.31) for latency ExtremeSparseL4U32
2025-08-07 02:32:07,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 39 minutes, 32 seconds)
2025-08-07 02:33:52,945 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:08,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1005.56281 ± 120.352
2025-08-07 02:34:08,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1046.4042, 923.10156, 1188.9535, 1098.9429, 906.72723, 1081.4824, 729.6795, 1052.822, 1010.9751, 1016.54047]
2025-08-07 02:34:08,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:34:08,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 37 minutes, 18 seconds)
2025-08-07 02:35:53,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:08,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1000.97595 ± 154.963
2025-08-07 02:36:08,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [911.60144, 1123.5251, 958.1571, 675.5196, 1106.949, 1007.6193, 1106.4296, 1089.8406, 816.70764, 1213.41]
2025-08-07 02:36:08,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:36:08,999 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 58 seconds)
2025-08-07 02:37:53,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:08,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1166.40674 ± 178.982
2025-08-07 02:38:08,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1455.065, 1094.097, 1203.8733, 1343.4568, 887.07855, 880.9331, 1250.4783, 1127.8655, 1326.9346, 1094.2854]
2025-08-07 02:38:08,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:38:08,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1166.41) for latency ExtremeSparseL4U32
2025-08-07 02:38:08,691 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 34 seconds)
2025-08-07 02:39:52,930 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:08,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1078.42908 ± 159.362
2025-08-07 02:40:08,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1160.2704, 985.1326, 905.34753, 1191.6818, 775.18164, 1282.755, 1222.7094, 1252.2814, 997.96063, 1010.97034]
2025-08-07 02:40:08,530 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:40:08,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 14 seconds)
2025-08-07 02:41:52,749 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 794.06744 ± 195.392
2025-08-07 02:42:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [829.8097, 641.51245, 1086.9779, 741.8041, 756.7343, 350.23917, 746.39, 912.3597, 1017.90967, 856.93744]
2025-08-07 02:42:08,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:42:08,344 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 4 seconds)
2025-08-07 02:43:52,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:08,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1009.04236 ± 108.807
2025-08-07 02:44:08,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [891.5026, 1103.3448, 1049.2567, 835.21954, 1033.9587, 1243.315, 1016.3395, 923.94635, 1016.1119, 977.4288]
2025-08-07 02:44:08,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:44:08,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 25 minutes, 54 seconds)
2025-08-07 02:45:52,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:08,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1068.47546 ± 154.067
2025-08-07 02:46:08,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1115.4576, 1086.5695, 1339.7577, 1172.34, 1020.7381, 1177.8177, 1074.8099, 1073.0066, 860.7519, 763.50586]
2025-08-07 02:46:08,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:46:08,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 23 minutes, 52 seconds)
2025-08-07 02:47:52,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:07,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1155.00122 ± 133.737
2025-08-07 02:48:07,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1089.1549, 1359.2273, 1067.7277, 1181.8721, 1369.3253, 948.6472, 1014.57117, 1258.0886, 1083.6548, 1177.7418]
2025-08-07 02:48:07,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:48:07,918 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 21 minutes, 53 seconds)
2025-08-07 02:49:52,237 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:50:07,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1150.67261 ± 155.736
2025-08-07 02:50:07,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1379.314, 1118.6786, 925.2884, 1324.57, 1078.6365, 1273.462, 949.9453, 1270.2632, 977.63196, 1208.9363]
2025-08-07 02:50:07,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:50:07,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 19 minutes, 54 seconds)
2025-08-07 02:51:52,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:52:07,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1025.08118 ± 208.419
2025-08-07 02:52:07,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1250.546, 1186.373, 940.42175, 1113.1309, 484.57596, 1060.8083, 929.2294, 1181.2153, 1136.3743, 968.137]
2025-08-07 02:52:07,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:52:07,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 17 minutes, 55 seconds)
2025-08-07 02:53:52,052 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:54:07,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1134.14453 ± 160.000
2025-08-07 02:54:07,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1052.8673, 1466.076, 1156.8414, 1053.5918, 1240.3477, 1343.157, 1039.8264, 1080.6537, 936.4876, 971.5964]
2025-08-07 02:54:07,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:54:07,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 15 minutes, 56 seconds)
2025-08-07 02:55:51,800 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:56:07,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1219.58813 ± 113.169
2025-08-07 02:56:07,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1136.005, 1197.9216, 1307.9946, 1253.4694, 1236.1794, 1180.008, 1157.7117, 1232.5012, 1475.4181, 1018.6728]
2025-08-07 02:56:07,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:56:07,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1219.59) for latency ExtremeSparseL4U32
2025-08-07 02:56:07,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 13 minutes, 54 seconds)
2025-08-07 02:57:51,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:58:06,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1139.30762 ± 181.540
2025-08-07 02:58:06,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [956.46075, 1123.7554, 758.84314, 1150.1368, 1407.1132, 1191.9978, 1070.4508, 1376.0004, 1258.488, 1099.8306]
2025-08-07 02:58:06,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:58:06,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 11 minutes, 52 seconds)
2025-08-07 02:59:50,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:00:05,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1187.85950 ± 146.330
2025-08-07 03:00:05,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1144.6722, 1294.2981, 1160.3959, 1217.9138, 1518.3112, 1044.1333, 1199.8318, 1062.8451, 1266.2708, 969.9241]
2025-08-07 03:00:05,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:00:05,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 9 minutes, 45 seconds)
2025-08-07 03:01:49,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:02:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1317.90625 ± 118.044
2025-08-07 03:02:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1541.5637, 1179.4983, 1260.3623, 1380.244, 1191.3885, 1256.7332, 1323.8861, 1325.6963, 1219.1237, 1500.5652]
2025-08-07 03:02:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:02:04,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1317.91) for latency ExtremeSparseL4U32
2025-08-07 03:02:04,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 7 minutes, 39 seconds)
2025-08-07 03:03:48,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:04:03,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1342.94995 ± 145.977
2025-08-07 03:04:03,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1597.2578, 1298.1254, 1195.765, 1513.962, 1267.9243, 1462.8589, 1104.9517, 1223.1967, 1356.7867, 1408.6704]
2025-08-07 03:04:03,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:04:03,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1342.95) for latency ExtremeSparseL4U32
2025-08-07 03:04:03,906 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 5 minutes, 35 seconds)
2025-08-07 03:05:47,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:06:02,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1389.53308 ± 152.809
2025-08-07 03:06:02,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1725.8508, 1411.1361, 1219.1332, 1304.796, 1128.4615, 1372.6373, 1378.6283, 1435.894, 1419.1521, 1499.6416]
2025-08-07 03:06:02,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:06:02,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1389.53) for latency ExtremeSparseL4U32
2025-08-07 03:06:02,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 3 minutes, 31 seconds)
2025-08-07 03:07:46,417 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:02,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1323.30835 ± 131.630
2025-08-07 03:08:02,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1597.5685, 1177.9457, 1383.5748, 1211.8225, 1275.2461, 1426.6603, 1355.1411, 1392.9078, 1120.8646, 1291.3516]
2025-08-07 03:08:02,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:08:02,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 30 seconds)
2025-08-07 03:09:45,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:01,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1368.31348 ± 94.104
2025-08-07 03:10:01,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1322.4696, 1400.8295, 1540.83, 1511.762, 1397.3013, 1345.022, 1360.6708, 1222.9839, 1305.2424, 1276.0239]
2025-08-07 03:10:01,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:10:01,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 32 seconds)
2025-08-07 03:11:44,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:12:00,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1503.82312 ± 164.216
2025-08-07 03:12:00,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1766.0667, 1386.8516, 1437.3538, 1448.5835, 1817.3354, 1437.6327, 1557.1918, 1515.3096, 1241.7914, 1430.1155]
2025-08-07 03:12:00,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:12:00,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1503.82) for latency ExtremeSparseL4U32
2025-08-07 03:12:00,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 34 seconds)
2025-08-07 03:13:43,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:59,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1323.26550 ± 91.466
2025-08-07 03:13:59,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1174.2365, 1417.6326, 1436.5636, 1406.4932, 1358.0245, 1259.544, 1178.9453, 1317.8143, 1396.3644, 1287.0371]
2025-08-07 03:13:59,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:13:59,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 34 seconds)
2025-08-07 03:15:42,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:58,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1605.25171 ± 131.272
2025-08-07 03:15:58,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1552.2047, 1772.0781, 1516.1709, 1447.5034, 1447.6545, 1857.8577, 1706.2911, 1650.7192, 1583.059, 1518.9785]
2025-08-07 03:15:58,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:15:58,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1605.25) for latency ExtremeSparseL4U32
2025-08-07 03:15:58,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 35 seconds)
2025-08-07 03:17:41,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:57,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1201.78479 ± 134.024
2025-08-07 03:17:57,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1097.3943, 1216.0494, 1338.1414, 1116.0641, 987.6013, 1038.4462, 1393.2009, 1381.4188, 1219.1578, 1230.3726]
2025-08-07 03:17:57,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:17:57,312 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 35 seconds)
2025-08-07 03:19:40,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:56,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1414.07080 ± 140.806
2025-08-07 03:19:56,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1469.9641, 1551.707, 1429.2214, 1525.5752, 1048.4309, 1521.2432, 1456.4299, 1471.9077, 1340.4587, 1325.7692]
2025-08-07 03:19:56,347 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:19:56,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 35 seconds)
2025-08-07 03:21:39,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:55,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1535.33972 ± 133.669
2025-08-07 03:21:55,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1410.7894, 1504.4419, 1581.0624, 1568.8434, 1515.3289, 1814.5175, 1580.0087, 1568.5768, 1552.4482, 1257.3812]
2025-08-07 03:21:55,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:21:55,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 36 seconds)
2025-08-07 03:23:38,928 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:54,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1460.58411 ± 138.560
2025-08-07 03:23:54,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1416.3114, 1252.0564, 1462.2227, 1499.3699, 1477.3956, 1527.7107, 1742.8531, 1539.2346, 1227.3755, 1461.3119]
2025-08-07 03:23:54,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:23:54,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 37 seconds)
2025-08-07 03:25:37,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:53,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1516.17114 ± 118.314
2025-08-07 03:25:53,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1701.3838, 1608.4333, 1421.999, 1516.1625, 1653.8427, 1491.915, 1424.2887, 1511.8932, 1557.6454, 1274.1484]
2025-08-07 03:25:53,379 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:25:53,388 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 37 seconds)
2025-08-07 03:27:36,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:52,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1412.02026 ± 83.908
2025-08-07 03:27:52,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1396.7218, 1408.717, 1400.5032, 1528.7091, 1366.1641, 1231.2574, 1497.1606, 1522.2146, 1365.4938, 1403.2609]
2025-08-07 03:27:52,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:27:52,508 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 39 seconds)
2025-08-07 03:29:36,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:51,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1668.28833 ± 120.020
2025-08-07 03:29:51,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1576.7814, 1654.1648, 1688.355, 1871.7576, 1746.6239, 1747.6084, 1539.1501, 1453.5223, 1792.2048, 1612.7146]
2025-08-07 03:29:51,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:29:51,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1668.29) for latency ExtremeSparseL4U32
2025-08-07 03:29:51,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 40 seconds)
2025-08-07 03:31:35,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1578.33667 ± 133.088
2025-08-07 03:31:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1554.4702, 1667.96, 1284.0613, 1543.8578, 1547.2297, 1675.7467, 1644.763, 1740.9973, 1422.7092, 1701.5721]
2025-08-07 03:31:50,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:31:50,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 41 seconds)
2025-08-07 03:33:34,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:33:49,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1585.87964 ± 122.646
2025-08-07 03:33:49,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1868.9113, 1570.837, 1427.0532, 1420.4207, 1544.639, 1601.9015, 1584.4354, 1541.2573, 1701.609, 1597.7325]
2025-08-07 03:33:49,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:33:49,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 43 seconds)
2025-08-07 03:35:33,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:48,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1531.81348 ± 199.566
2025-08-07 03:35:48,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1441.1051, 1648.527, 1229.2258, 1686.5776, 1514.8304, 1193.8569, 1584.4878, 1551.3093, 1558.8916, 1909.3235]
2025-08-07 03:35:48,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:35:48,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 44 seconds)
2025-08-07 03:37:32,176 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:47,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1583.05859 ± 161.636
2025-08-07 03:37:47,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1710.8551, 1540.0978, 1639.3145, 1390.397, 1640.4647, 1332.973, 1414.6217, 1608.285, 1648.5432, 1905.0343]
2025-08-07 03:37:47,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:37:47,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 44 seconds)
2025-08-07 03:39:31,993 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:39:47,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1653.76978 ± 152.692
2025-08-07 03:39:47,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1834.3353, 1477.1375, 1578.372, 1743.1047, 1731.9619, 1802.5094, 1413.5935, 1552.3064, 1536.6981, 1867.6793]
2025-08-07 03:39:47,502 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:39:47,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 47 seconds)
2025-08-07 03:41:31,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:47,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1418.33984 ± 101.061
2025-08-07 03:41:47,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1370.7761, 1545.8069, 1421.3358, 1246.8574, 1466.8926, 1458.5011, 1382.4863, 1593.7223, 1283.543, 1413.4762]
2025-08-07 03:41:47,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:41:47,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 51 seconds)
2025-08-07 03:43:31,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:47,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1399.81384 ± 179.824
2025-08-07 03:43:47,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1631.3939, 1149.4221, 1382.4445, 1310.7925, 1481.9667, 1550.7937, 1720.1736, 1267.6426, 1200.1311, 1303.3789]
2025-08-07 03:43:47,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:43:47,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 53 seconds)
2025-08-07 03:45:32,128 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:45:47,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1797.50122 ± 147.738
2025-08-07 03:45:47,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1937.2753, 1708.9572, 1819.3356, 1596.1647, 1996.5381, 1742.025, 1815.6373, 2052.3552, 1666.6, 1640.1241]
2025-08-07 03:45:47,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:45:47,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1797.50) for latency ExtremeSparseL4U32
2025-08-07 03:45:47,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 57 seconds)
2025-08-07 03:47:32,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:47:47,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1893.40503 ± 90.501
2025-08-07 03:47:47,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2011.4612, 1904.1825, 1886.123, 1772.9303, 1967.7262, 1754.7859, 1905.223, 2011.6963, 1781.9375, 1937.9847]
2025-08-07 03:47:47,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:47:47,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1893.41) for latency ExtremeSparseL4U32
2025-08-07 03:47:47,855 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes)
2025-08-07 03:49:32,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:49:48,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1841.79395 ± 93.084
2025-08-07 03:49:48,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1908.7411, 1837.1965, 1978.3663, 1806.646, 1737.531, 1784.9856, 1935.5511, 1961.3802, 1763.1732, 1704.367]
2025-08-07 03:49:48,226 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:49:48,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 1 second)
2025-08-07 03:51:32,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:48,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1821.92847 ± 118.078
2025-08-07 03:51:48,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1949.7367, 1798.7522, 1728.0195, 1875.832, 1786.8575, 1916.6392, 1635.5966, 2050.123, 1734.6498, 1743.0789]
2025-08-07 03:51:48,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:51:48,455 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 1 second)
2025-08-07 03:53:33,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:48,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1846.42419 ± 98.384
2025-08-07 03:53:48,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1809.8901, 1691.2667, 1895.867, 1972.7793, 1897.2594, 1966.1243, 1708.9159, 1917.423, 1866.5052, 1738.2108]
2025-08-07 03:53:48,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:53:48,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 2 seconds)
2025-08-07 03:55:33,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:55:49,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1781.01855 ± 103.699
2025-08-07 03:55:49,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1746.4658, 1702.317, 1995.2274, 1679.7471, 1871.0731, 1714.0458, 1794.0721, 1710.5853, 1911.6412, 1685.0099]
2025-08-07 03:55:49,070 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:55:49,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 1 second)
2025-08-07 03:57:33,862 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:57:49,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1858.79419 ± 157.566
2025-08-07 03:57:49,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1914.7767, 1922.6816, 1788.7148, 1663.3049, 2109.8289, 1826.117, 2031.9033, 2005.0182, 1609.5736, 1716.0225]
2025-08-07 03:57:49,503 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:57:49,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 1 second)
2025-08-07 03:59:34,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:49,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1880.83826 ± 80.477
2025-08-07 03:59:49,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1932.4833, 1965.2051, 1951.4022, 1874.4761, 2016.7024, 1752.4801, 1868.9879, 1799.6205, 1852.8053, 1794.2198]
2025-08-07 03:59:49,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:59:49,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 1 second)
2025-08-07 04:01:34,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:01:49,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1759.61548 ± 170.160
2025-08-07 04:01:49,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1800.5039, 2134.159, 1705.8588, 1607.4805, 1460.5903, 1735.6173, 1749.0554, 1685.0055, 1914.4031, 1803.4799]
2025-08-07 04:01:49,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:01:49,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 1 second)
2025-08-07 04:03:34,453 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:50,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1847.04102 ± 113.858
2025-08-07 04:03:50,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2007.9653, 1646.1587, 1738.0591, 1801.1033, 1861.8325, 1958.591, 1852.7588, 1902.297, 1980.4808, 1721.1636]
2025-08-07 04:03:50,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:03:50,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes)
2025-08-07 04:05:34,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:05:50,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1920.18970 ± 126.278
2025-08-07 04:05:50,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1847.082, 1979.2737, 1703.8792, 1900.0898, 2093.7803, 1889.2418, 2044.9402, 1730.591, 1952.0625, 2060.9546]
2025-08-07 04:05:50,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:05:50,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1226 [INFO]: New best (1920.19) for latency ExtremeSparseL4U32
2025-08-07 04:05:50,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes)
2025-08-07 04:07:34,912 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:07:50,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1788.16052 ± 167.823
2025-08-07 04:07:50,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1766.2852, 1437.4427, 1822.315, 1569.8651, 1867.778, 1792.2054, 1744.4718, 2007.3136, 1870.7603, 2003.1682]
2025-08-07 04:07:50,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:07:50,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes)
2025-08-07 04:09:35,153 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:09:50,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1794.87378 ± 85.924
2025-08-07 04:09:50,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1633.3483, 1769.7968, 1800.724, 1899.58, 1807.7001, 1844.4956, 1743.1881, 1934.3062, 1823.4675, 1692.1326]
2025-08-07 04:09:50,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:09:50,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc20-halfcheetah):1251 [DEBUG]: Training session finished
