2025-08-07 00:48:20,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc15-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:20,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc15-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:20,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x150c4a403e90>}
2025-08-07 00:48:20,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:20,469 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 00:48:20,487 baseline-bpql-noiseperc15-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:48:20,487 baseline-bpql-noiseperc15-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:33,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:33,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 00:50:18,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:50:33,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -297.53864 ± 26.509
2025-08-07 00:50:33,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-302.55728, -298.55267, -247.28824, -298.63553, -307.48743, -284.63217, -258.41873, -336.48868, -327.44928, -313.8764]
2025-08-07 00:50:33,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:50:33,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-297.54) for latency ExtremeSparseL4U32
2025-08-07 00:50:33,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 18 minutes, 11 seconds)
2025-08-07 00:52:23,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:52:40,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -242.74146 ± 45.549
2025-08-07 00:52:40,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-199.4354, -222.55121, -337.6974, -290.5072, -262.61328, -226.95428, -180.16321, -252.69658, -259.3003, -195.49554]
2025-08-07 00:52:40,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:52:40,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-242.74) for latency ExtremeSparseL4U32
2025-08-07 00:52:40,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 21 minutes, 30 seconds)
2025-08-07 00:54:32,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:54:47,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -190.54333 ± 64.735
2025-08-07 00:54:47,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-195.48636, -104.71494, -162.05779, -175.38913, -130.22679, -93.786354, -263.7166, -252.57236, -280.41895, -247.06404]
2025-08-07 00:54:47,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:54:47,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-190.54) for latency ExtremeSparseL4U32
2025-08-07 00:54:47,886 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 21 minutes, 36 seconds)
2025-08-07 00:56:36,868 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:52,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -159.77142 ± 123.779
2025-08-07 00:56:52,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-163.8359, -232.8189, -158.62222, -189.4709, -33.78509, -190.43854, -226.36794, -316.58478, 149.22249, -235.0124]
2025-08-07 00:56:52,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:56:52,637 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-159.77) for latency ExtremeSparseL4U32
2025-08-07 00:56:52,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 19 minutes, 32 seconds)
2025-08-07 00:58:41,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:57,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -220.94370 ± 64.235
2025-08-07 00:58:57,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-231.6713, -321.6522, -213.81601, -273.4356, -135.67363, -285.04092, -123.911705, -146.32568, -214.59956, -263.31046]
2025-08-07 00:58:57,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:58:57,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 17 minutes, 23 seconds)
2025-08-07 01:00:46,892 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:02,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -102.87093 ± 76.180
2025-08-07 01:01:02,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [35.885014, -23.256166, -206.19075, -205.06343, -172.7709, -50.669933, -71.69663, -150.45227, -103.44495, -81.04929]
2025-08-07 01:01:02,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:01:02,750 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-102.87) for latency ExtremeSparseL4U32
2025-08-07 01:01:02,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 17 minutes, 2 seconds)
2025-08-07 01:02:52,124 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:07,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: -13.49128 ± 50.047
2025-08-07 01:03:07,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [-23.748238, -78.17137, 19.14524, -64.46067, -50.418026, 105.99189, -26.149641, 13.989387, 0.7652576, -31.856632]
2025-08-07 01:03:07,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:03:07,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (-13.49) for latency ExtremeSparseL4U32
2025-08-07 01:03:07,809 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 14 minutes, 27 seconds)
2025-08-07 01:04:57,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:05:13,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 105.38019 ± 72.521
2025-08-07 01:05:13,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [132.49509, 63.80712, 227.6378, 103.823, 103.37776, -29.802666, 76.9643, 32.997314, 204.24855, 138.25359]
2025-08-07 01:05:13,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:05:13,269 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (105.38) for latency ExtremeSparseL4U32
2025-08-07 01:05:13,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 11 minutes, 47 seconds)
2025-08-07 01:07:02,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:07:18,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 369.91974 ± 195.449
2025-08-07 01:07:18,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [553.49695, 315.81668, 541.51215, 442.98944, 295.3635, 500.95673, 406.67844, 574.2953, -63.72763, 131.81596]
2025-08-07 01:07:18,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:07:18,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (369.92) for latency ExtremeSparseL4U32
2025-08-07 01:07:18,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 9 minutes, 50 seconds)
2025-08-07 01:09:10,102 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:09:25,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 287.71442 ± 227.293
2025-08-07 01:09:25,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [688.6138, 77.910164, 228.32341, 153.75748, 279.84686, 86.374535, 148.20459, 635.7388, 55.497692, 522.8768]
2025-08-07 01:09:25,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:09:25,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 8 minutes, 36 seconds)
2025-08-07 01:11:14,897 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:11:30,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 293.25446 ± 173.596
2025-08-07 01:11:30,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [86.23819, 167.03052, 114.18803, 399.63583, 186.00513, 671.4024, 316.31158, 213.38754, 290.73978, 487.60568]
2025-08-07 01:11:30,702 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:11:30,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 3 hours, 6 minutes, 17 seconds)
2025-08-07 01:13:20,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:13:36,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 418.87061 ± 202.699
2025-08-07 01:13:36,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [628.44165, 613.2647, 194.73001, 296.24078, 489.86914, 203.93103, 674.62427, 316.6921, 126.07255, 644.8396]
2025-08-07 01:13:36,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:13:36,710 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (418.87) for latency ExtremeSparseL4U32
2025-08-07 01:13:36,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 3 hours, 4 minutes, 28 seconds)
2025-08-07 01:15:26,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:42,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 593.72412 ± 134.728
2025-08-07 01:15:42,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [652.2373, 642.85895, 675.96265, 341.31873, 677.64215, 502.5104, 761.4396, 592.3828, 714.33984, 376.54865]
2025-08-07 01:15:42,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:15:42,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (593.72) for latency ExtremeSparseL4U32
2025-08-07 01:15:42,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 3 hours, 2 minutes, 27 seconds)
2025-08-07 01:17:32,740 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:48,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 269.13007 ± 263.455
2025-08-07 01:17:48,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [241.57005, -305.5592, 786.88153, 278.72495, 420.19565, 167.9604, 247.61324, 99.08864, 444.55667, 310.26855]
2025-08-07 01:17:48,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:17:48,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 3 hours, 35 seconds)
2025-08-07 01:19:36,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:52,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 672.47009 ± 210.783
2025-08-07 01:19:52,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [800.4933, 812.0621, 709.1862, 297.20715, 735.6467, 225.4821, 868.44763, 747.68506, 792.08044, 736.4101]
2025-08-07 01:19:52,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:19:52,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (672.47) for latency ExtremeSparseL4U32
2025-08-07 01:19:52,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 57 minutes, 34 seconds)
2025-08-07 01:21:42,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 581.61627 ± 199.548
2025-08-07 01:21:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [530.6984, 378.9029, 740.9754, 264.1043, 666.0309, 658.32306, 407.08392, 901.53467, 438.11887, 830.3902]
2025-08-07 01:21:59,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:21:59,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 56 minutes, 10 seconds)
2025-08-07 01:23:51,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:24:07,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 578.50842 ± 238.767
2025-08-07 01:24:07,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [269.79575, 824.0046, 812.37506, 237.82884, 273.7837, 654.1946, 468.80984, 778.8387, 581.9739, 883.47955]
2025-08-07 01:24:07,146 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:24:07,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 54 minutes, 25 seconds)
2025-08-07 01:25:56,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:12,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 576.62000 ± 262.087
2025-08-07 01:26:12,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [339.99878, 383.86816, 188.00703, 957.8237, 428.7432, 665.3264, 915.79535, 937.2496, 504.76245, 444.62476]
2025-08-07 01:26:12,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:26:12,329 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 52 minutes, 9 seconds)
2025-08-07 01:28:02,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:28:17,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 654.95740 ± 136.084
2025-08-07 01:28:17,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [539.0574, 457.7383, 602.90643, 734.23193, 882.09705, 857.8379, 575.77814, 660.0993, 514.17706, 725.6503]
2025-08-07 01:28:17,907 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:28:17,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 49 minutes, 56 seconds)
2025-08-07 01:30:09,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:24,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 820.48181 ± 198.455
2025-08-07 01:30:24,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [890.85736, 898.62695, 863.3931, 792.2365, 875.97864, 239.06018, 854.4921, 950.97314, 895.3106, 943.8895]
2025-08-07 01:30:24,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:30:24,995 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (820.48) for latency ExtremeSparseL4U32
2025-08-07 01:30:25,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 48 minutes, 39 seconds)
2025-08-07 01:32:15,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:31,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 719.00586 ± 179.224
2025-08-07 01:32:31,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [747.94696, 230.87347, 838.85345, 664.88336, 832.1156, 843.44885, 822.09247, 607.6675, 797.26953, 804.9072]
2025-08-07 01:32:31,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:32:31,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 46 minutes, 18 seconds)
2025-08-07 01:34:21,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:39,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 755.03888 ± 137.909
2025-08-07 01:34:39,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [799.81104, 688.3472, 690.71826, 723.59924, 573.57214, 497.81894, 925.91144, 873.433, 906.378, 870.79987]
2025-08-07 01:34:39,049 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:34:39,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 44 minutes, 17 seconds)
2025-08-07 01:36:29,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 820.16833 ± 126.228
2025-08-07 01:36:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [909.6498, 466.26535, 841.7946, 861.4893, 908.0621, 862.74493, 815.24854, 907.64844, 871.9415, 756.83936]
2025-08-07 01:36:45,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:36:45,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 42 minutes, 28 seconds)
2025-08-07 01:38:35,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:51,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 838.69275 ± 177.387
2025-08-07 01:38:51,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1044.4257, 884.8214, 892.18066, 965.49115, 691.6014, 887.3046, 869.1209, 875.4103, 369.3329, 907.23816]
2025-08-07 01:38:51,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:38:51,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (838.69) for latency ExtremeSparseL4U32
2025-08-07 01:38:51,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 40 minutes, 32 seconds)
2025-08-07 01:40:42,338 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:58,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 677.44476 ± 349.768
2025-08-07 01:40:58,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [647.09235, 640.8899, 955.9575, 491.95996, 847.7605, -264.55713, 974.441, 931.2961, 671.9399, 877.668]
2025-08-07 01:40:58,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:40:58,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 38 minutes, 16 seconds)
2025-08-07 01:42:48,513 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:04,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 789.54675 ± 125.980
2025-08-07 01:43:04,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [682.46027, 614.6085, 873.1825, 848.97534, 892.4812, 877.0225, 625.72925, 1004.5023, 807.388, 669.1174]
2025-08-07 01:43:04,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:43:04,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 36 minutes, 4 seconds)
2025-08-07 01:44:54,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:10,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 879.34833 ± 73.232
2025-08-07 01:45:10,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [825.24243, 829.34973, 945.3851, 937.39056, 995.7698, 922.3966, 735.78015, 825.5809, 914.7708, 861.8177]
2025-08-07 01:45:10,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:45:10,256 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (879.35) for latency ExtremeSparseL4U32
2025-08-07 01:45:10,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 33 minutes, 35 seconds)
2025-08-07 01:47:01,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:16,822 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 836.81024 ± 112.722
2025-08-07 01:47:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [821.8654, 851.6462, 661.77057, 754.275, 920.0404, 954.43896, 870.4264, 947.7127, 952.85394, 633.0728]
2025-08-07 01:47:16,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:47:16,833 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 31 minutes, 33 seconds)
2025-08-07 01:49:06,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:22,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 703.12537 ± 186.846
2025-08-07 01:49:22,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [593.69696, 574.3777, 413.95703, 478.04282, 826.6061, 902.9461, 874.1388, 941.7406, 862.43005, 563.3174]
2025-08-07 01:49:22,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:49:22,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 29 minutes, 15 seconds)
2025-08-07 01:51:12,322 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:28,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 831.10919 ± 282.107
2025-08-07 01:51:28,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [801.74896, 1067.5579, 819.8655, 1014.39716, 824.1859, 922.7474, 991.1131, 25.754114, 977.70154, 866.02]
2025-08-07 01:51:28,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:51:28,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 26 minutes, 58 seconds)
2025-08-07 01:53:17,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:53:33,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 879.58203 ± 110.668
2025-08-07 01:53:33,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [795.66614, 856.154, 662.06573, 765.38464, 917.88403, 914.5195, 946.00183, 923.84467, 1087.5316, 926.7677]
2025-08-07 01:53:33,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:53:33,357 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (879.58) for latency ExtremeSparseL4U32
2025-08-07 01:53:33,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 24 minutes, 42 seconds)
2025-08-07 01:55:24,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:41,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 949.00745 ± 88.581
2025-08-07 01:55:41,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1136.3759, 939.34827, 832.18036, 951.31946, 880.6247, 911.47473, 888.7417, 947.0712, 1086.2834, 916.65454]
2025-08-07 01:55:41,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:55:41,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (949.01) for latency ExtremeSparseL4U32
2025-08-07 01:55:41,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 23 minutes, 6 seconds)
2025-08-07 01:57:32,647 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:48,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 749.48608 ± 164.581
2025-08-07 01:57:48,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [675.8276, 886.7429, 862.90936, 520.3087, 644.56537, 597.0323, 964.3252, 533.5475, 959.27264, 850.3298]
2025-08-07 01:57:48,430 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:57:48,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 21 minutes, 3 seconds)
2025-08-07 01:59:39,774 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 920.02100 ± 114.837
2025-08-07 01:59:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1065.3885, 717.4569, 1005.8547, 1025.4674, 886.10376, 988.61804, 887.0155, 897.71783, 999.9449, 726.64276]
2025-08-07 01:59:55,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:59:55,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 19 minutes, 18 seconds)
2025-08-07 02:01:44,533 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:00,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 888.38806 ± 64.591
2025-08-07 02:02:00,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [992.1812, 926.89185, 838.3266, 877.13525, 924.63525, 820.6169, 924.5336, 789.84033, 964.9407, 824.7795]
2025-08-07 02:02:00,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:02:00,208 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 16 minutes, 58 seconds)
2025-08-07 02:03:50,253 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:05,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 783.66638 ± 177.144
2025-08-07 02:04:05,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [921.5358, 810.5024, 892.0695, 799.87305, 1012.7183, 384.26337, 579.6238, 868.0161, 887.2863, 680.775]
2025-08-07 02:04:05,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:04:05,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 14 minutes, 55 seconds)
2025-08-07 02:05:54,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:09,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 898.52185 ± 63.421
2025-08-07 02:06:09,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [893.58954, 848.5125, 756.39325, 989.35706, 881.8752, 912.36554, 886.2933, 974.0998, 890.7789, 951.9539]
2025-08-07 02:06:09,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:06:09,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 11 minutes, 51 seconds)
2025-08-07 02:07:57,290 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:12,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 831.65106 ± 94.619
2025-08-07 02:08:12,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [749.5198, 903.274, 1002.232, 780.13824, 952.7327, 712.9507, 887.8674, 762.2288, 829.66077, 735.9071]
2025-08-07 02:08:12,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:08:12,742 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 9 minutes, 1 second)
2025-08-07 02:10:00,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:16,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 926.08380 ± 63.217
2025-08-07 02:10:16,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [987.061, 1000.871, 888.56165, 923.0533, 947.41327, 793.6348, 1013.5992, 872.3539, 900.82404, 933.4663]
2025-08-07 02:10:16,394 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:10:16,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 6 minutes, 14 seconds)
2025-08-07 02:12:03,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:12:20,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 953.12439 ± 87.907
2025-08-07 02:12:20,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [890.534, 959.6597, 1036.6289, 911.8346, 843.5722, 891.1506, 1017.83356, 1154.9247, 929.76587, 895.3396]
2025-08-07 02:12:20,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:12:20,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (953.12) for latency ExtremeSparseL4U32
2025-08-07 02:12:20,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 4 minutes, 3 seconds)
2025-08-07 02:14:09,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:14:26,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 951.07117 ± 66.673
2025-08-07 02:14:26,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1083.5171, 932.29694, 826.82605, 995.31415, 966.4405, 914.9796, 886.47394, 929.0194, 999.1231, 976.72064]
2025-08-07 02:14:26,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:14:26,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 2 hours, 2 minutes, 3 seconds)
2025-08-07 02:16:15,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:32,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1066.70398 ± 149.446
2025-08-07 02:16:32,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1023.57825, 745.4598, 1029.203, 948.6181, 1103.0669, 989.57385, 1230.4141, 1289.3114, 1199.1672, 1108.6473]
2025-08-07 02:16:32,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:16:32,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1066.70) for latency ExtremeSparseL4U32
2025-08-07 02:16:32,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 2 hours, 24 seconds)
2025-08-07 02:18:21,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:37,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 993.40857 ± 113.889
2025-08-07 02:18:37,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [998.1991, 1015.5201, 1108.8485, 944.665, 890.9001, 1029.804, 844.6804, 828.49817, 1060.9796, 1211.9911]
2025-08-07 02:18:37,096 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:18:37,106 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 58 minutes, 37 seconds)
2025-08-07 02:20:25,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:20:40,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 968.28400 ± 116.341
2025-08-07 02:20:40,487 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1137.2954, 809.6168, 980.06006, 1017.7336, 736.2941, 994.71277, 970.36664, 915.32446, 1108.7418, 1012.69385]
2025-08-07 02:20:40,488 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:20:40,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 56 minutes, 29 seconds)
2025-08-07 02:22:27,958 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:22:43,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 945.84326 ± 122.673
2025-08-07 02:22:43,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [940.07635, 840.8705, 1066.5349, 1201.8108, 1025.8485, 777.1125, 863.18835, 830.3094, 1001.49695, 911.18427]
2025-08-07 02:22:43,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:22:43,458 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 54 minutes, 12 seconds)
2025-08-07 02:24:30,525 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:45,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1014.78339 ± 105.267
2025-08-07 02:24:45,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [996.0232, 995.20776, 954.6353, 1034.7711, 1043.7708, 1262.4792, 1039.2136, 965.4867, 814.5066, 1041.7395]
2025-08-07 02:24:45,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:24:45,934 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 51 minutes, 30 seconds)
2025-08-07 02:26:34,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:49,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1017.41034 ± 205.422
2025-08-07 02:26:49,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [842.7421, 939.4971, 926.4013, 991.4618, 1588.6752, 982.14355, 962.13763, 813.7557, 1068.7805, 1058.5083]
2025-08-07 02:26:49,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:26:49,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 49 minutes, 2 seconds)
2025-08-07 02:28:36,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:28:51,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1067.42517 ± 156.514
2025-08-07 02:28:51,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [911.0824, 904.9382, 925.0939, 1372.4459, 1151.1118, 1248.576, 987.3909, 1152.1526, 1113.1859, 908.27466]
2025-08-07 02:28:51,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:28:51,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1067.43) for latency ExtremeSparseL4U32
2025-08-07 02:28:51,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 46 minutes, 34 seconds)
2025-08-07 02:30:40,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:55,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 932.14612 ± 96.949
2025-08-07 02:30:55,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [828.9389, 856.0172, 1015.55664, 726.07996, 1022.9616, 949.7662, 974.4199, 969.1004, 1059.5819, 919.0379]
2025-08-07 02:30:55,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:30:55,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 44 minutes, 32 seconds)
2025-08-07 02:32:42,038 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:57,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1090.31287 ± 188.094
2025-08-07 02:32:57,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1018.5657, 1026.5183, 1106.1052, 1291.3857, 967.7001, 1528.9897, 869.278, 1076.3776, 878.3636, 1139.8448]
2025-08-07 02:32:57,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:32:57,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1090.31) for latency ExtremeSparseL4U32
2025-08-07 02:32:57,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 42 minutes, 19 seconds)
2025-08-07 02:34:45,802 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:01,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 999.89874 ± 74.727
2025-08-07 02:35:01,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1030.5205, 874.907, 892.73517, 1041.3107, 1111.1564, 950.1897, 1039.6318, 987.5641, 1094.5132, 976.45844]
2025-08-07 02:35:01,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:35:01,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 40 minutes, 29 seconds)
2025-08-07 02:36:48,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 885.80487 ± 228.006
2025-08-07 02:37:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [864.44916, 1001.35913, 1044.2, 816.61176, 968.4861, 506.37482, 981.5624, 448.64325, 1217.3411, 1009.0214]
2025-08-07 02:37:03,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:37:03,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 38 minutes, 17 seconds)
2025-08-07 02:38:49,705 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:39:05,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1012.22137 ± 159.546
2025-08-07 02:39:05,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1189.4451, 1009.0418, 864.5612, 1027.9353, 1395.9335, 976.7986, 919.34766, 859.55316, 1011.1036, 868.4945]
2025-08-07 02:39:05,021 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:39:05,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 36 minutes, 2 seconds)
2025-08-07 02:40:50,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:07,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1121.92163 ± 198.262
2025-08-07 02:41:07,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1477.1737, 1077.7343, 805.83484, 1205.1946, 1175.6632, 1170.2158, 1321.9125, 828.68665, 1185.8373, 970.96344]
2025-08-07 02:41:07,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:41:07,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1121.92) for latency ExtremeSparseL4U32
2025-08-07 02:41:07,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 33 minutes, 49 seconds)
2025-08-07 02:42:53,476 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:43:08,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1085.96741 ± 260.934
2025-08-07 02:43:08,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [818.8274, 1291.8657, 1403.311, 853.938, 889.66833, 1020.3419, 1648.8928, 890.49005, 1076.9865, 965.35156]
2025-08-07 02:43:08,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:43:08,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 31 minutes, 43 seconds)
2025-08-07 02:44:57,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:45:12,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1101.74829 ± 192.594
2025-08-07 02:45:12,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1312.6017, 1201.2052, 1048.6592, 1315.9802, 921.0454, 1108.5817, 1013.09924, 1002.5875, 721.7136, 1372.0094]
2025-08-07 02:45:12,536 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:45:12,544 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 29 minutes, 39 seconds)
2025-08-07 02:47:01,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1063.43152 ± 195.296
2025-08-07 02:47:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1038.4854, 945.8789, 871.63617, 1170.9496, 1077.7354, 965.139, 1041.7222, 1598.7859, 985.42957, 938.55347]
2025-08-07 02:47:17,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:47:17,079 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 27 minutes, 53 seconds)
2025-08-07 02:49:04,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:19,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1089.08582 ± 223.218
2025-08-07 02:49:19,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1363.2238, 1318.7317, 1134.2426, 973.80334, 754.29346, 878.1173, 896.7522, 900.7297, 1383.2166, 1287.7472]
2025-08-07 02:49:19,911 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:49:19,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 26 minutes, 5 seconds)
2025-08-07 02:51:06,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:23,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1114.58362 ± 156.842
2025-08-07 02:51:23,826 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [965.8524, 1183.7859, 889.2651, 1185.4142, 1180.3546, 1106.0776, 1065.6178, 1483.9995, 973.0567, 1112.412]
2025-08-07 02:51:23,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:51:23,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 24 minutes, 15 seconds)
2025-08-07 02:53:11,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:27,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1105.46472 ± 191.399
2025-08-07 02:53:27,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1106.6808, 994.9422, 994.28674, 874.77026, 1062.2792, 1630.691, 1019.6641, 1096.8295, 1111.217, 1163.2861]
2025-08-07 02:53:27,297 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:53:27,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 22 minutes, 27 seconds)
2025-08-07 02:55:13,927 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:29,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1088.19666 ± 133.485
2025-08-07 02:55:29,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1024.582, 1045.4292, 1030.3428, 1363.0452, 1059.608, 1266.3856, 878.25934, 980.00195, 1158.3326, 1075.9794]
2025-08-07 02:55:29,295 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:55:29,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 20 minutes, 10 seconds)
2025-08-07 02:57:14,320 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:29,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1146.52112 ± 197.446
2025-08-07 02:57:29,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [999.5919, 912.2345, 1378.0607, 1252.8433, 912.14905, 1138.4672, 1356.7379, 1462.138, 924.4073, 1128.581]
2025-08-07 02:57:29,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:57:29,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1146.52) for latency ExtremeSparseL4U32
2025-08-07 02:57:29,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 17 minutes, 36 seconds)
2025-08-07 02:59:16,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:31,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1318.54895 ± 302.384
2025-08-07 02:59:31,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1127.4655, 1868.9209, 1119.372, 898.8896, 1250.7783, 1449.5621, 1689.4673, 953.15155, 1569.0184, 1258.8639]
2025-08-07 02:59:31,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:59:31,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1318.55) for latency ExtremeSparseL4U32
2025-08-07 02:59:31,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 15 minutes, 25 seconds)
2025-08-07 03:01:18,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:33,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1325.75659 ± 294.819
2025-08-07 03:01:33,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1650.3866, 1492.653, 1622.5785, 1030.3066, 1138.6615, 919.70874, 1564.3912, 1240.2579, 1683.3763, 915.24603]
2025-08-07 03:01:33,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:01:33,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1325.76) for latency ExtremeSparseL4U32
2025-08-07 03:01:33,655 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 13 minutes, 10 seconds)
2025-08-07 03:03:20,483 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:35,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1138.53967 ± 168.173
2025-08-07 03:03:35,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1410.9316, 1173.1885, 1174.0718, 1155.5233, 1174.42, 982.0421, 948.8034, 952.098, 1430.6721, 983.6456]
2025-08-07 03:03:35,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:03:35,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 59 seconds)
2025-08-07 03:05:22,644 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:37,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1237.64087 ± 247.019
2025-08-07 03:05:37,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1254.783, 1371.0144, 1372.5608, 935.42303, 1757.4056, 922.8507, 1203.7882, 1449.6792, 1061.2682, 1047.6357]
2025-08-07 03:05:37,867 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:05:37,874 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 58 seconds)
2025-08-07 03:07:24,634 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:39,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1274.29785 ± 315.598
2025-08-07 03:07:39,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1008.277, 1637.5165, 1922.1263, 904.05115, 1477.8817, 1131.1289, 1078.4773, 1227.0698, 1418.6217, 937.82764]
2025-08-07 03:07:39,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:07:39,962 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 7 minutes, 7 seconds)
2025-08-07 03:09:29,166 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:44,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1408.92249 ± 333.619
2025-08-07 03:09:44,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1085.4747, 1779.0162, 1124.4996, 1056.3733, 1214.252, 1409.9877, 1235.5417, 1309.674, 1814.1698, 2060.2344]
2025-08-07 03:09:44,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:09:44,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1408.92) for latency ExtremeSparseL4U32
2025-08-07 03:09:44,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 5 minutes, 23 seconds)
2025-08-07 03:11:30,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:45,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1359.23169 ± 498.849
2025-08-07 03:11:45,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1350.2466, 1170.4949, 1084.0157, 978.5404, 2206.4175, 1064.8259, 898.82623, 2343.52, 1580.933, 914.49713]
2025-08-07 03:11:45,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:11:45,898 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 3 minutes, 15 seconds)
2025-08-07 03:13:34,348 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:49,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1225.77466 ± 203.992
2025-08-07 03:13:49,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1106.4683, 963.1926, 1627.8591, 1097.696, 1490.6367, 1128.8544, 1019.9215, 1243.8201, 1179.0563, 1400.2399]
2025-08-07 03:13:49,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:13:49,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 1 minute, 23 seconds)
2025-08-07 03:15:36,489 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:51,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1280.86365 ± 327.327
2025-08-07 03:15:51,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1423.9891, 929.7916, 979.6471, 930.1664, 1304.8191, 1367.5046, 2093.7883, 1375.756, 1309.9188, 1093.2549]
2025-08-07 03:15:51,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:15:51,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 59 minutes, 21 seconds)
2025-08-07 03:17:38,569 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:55,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1190.55872 ± 197.921
2025-08-07 03:17:55,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [879.71545, 1018.8835, 1448.5626, 1233.9998, 928.3262, 1225.3558, 1467.5911, 1405.9464, 1165.0823, 1132.1241]
2025-08-07 03:17:55,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:17:55,519 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 57 minutes, 27 seconds)
2025-08-07 03:19:43,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:58,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1313.82996 ± 387.407
2025-08-07 03:19:58,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1253.2349, 1041.0757, 911.8906, 1256.7181, 1852.8391, 1176.0665, 1035.0925, 1051.829, 1350.8872, 2208.6667]
2025-08-07 03:19:58,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:19:58,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 55 minutes, 17 seconds)
2025-08-07 03:21:46,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:01,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1124.57935 ± 193.101
2025-08-07 03:22:01,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1571.9908, 971.5012, 918.09503, 1030.354, 955.2197, 1047.7711, 1002.29944, 1237.1884, 1231.5872, 1279.7865]
2025-08-07 03:22:01,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:22:01,718 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 53 minutes, 22 seconds)
2025-08-07 03:23:48,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:24:04,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1255.80176 ± 242.699
2025-08-07 03:24:04,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [956.3118, 1254.4423, 942.5148, 1302.5275, 1106.1913, 1580.9761, 1420.1477, 1246.7506, 1702.5664, 1045.5889]
2025-08-07 03:24:04,108 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:24:04,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 51 minutes, 12 seconds)
2025-08-07 03:25:50,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:05,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1423.31567 ± 288.650
2025-08-07 03:26:05,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1434.101, 1578.1176, 1508.0271, 1225.1495, 1240.8584, 1210.3724, 1053.6818, 1639.8258, 1237.1608, 2105.8608]
2025-08-07 03:26:05,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:26:05,607 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1423.32) for latency ExtremeSparseL4U32
2025-08-07 03:26:05,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 49 minutes, 6 seconds)
2025-08-07 03:27:52,006 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:07,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1457.55981 ± 555.974
2025-08-07 03:28:07,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [895.4943, 2376.894, 2525.0793, 971.46405, 1094.153, 1150.3925, 1826.4888, 1135.2366, 1162.086, 1438.3105]
2025-08-07 03:28:07,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:28:07,302 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1457.56) for latency ExtremeSparseL4U32
2025-08-07 03:28:07,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 46 minutes, 54 seconds)
2025-08-07 03:29:54,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:09,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1116.55786 ± 241.577
2025-08-07 03:30:09,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1020.4634, 931.9599, 1010.8589, 1776.0164, 1280.1676, 1037.8995, 937.4559, 1083.909, 1136.7793, 950.0692]
2025-08-07 03:30:09,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:30:09,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 44 minutes, 47 seconds)
2025-08-07 03:31:57,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:12,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1271.17798 ± 277.288
2025-08-07 03:32:12,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1794.0901, 1376.4708, 1352.3182, 1202.3428, 1105.6969, 1086.8145, 1145.8345, 960.7266, 963.5826, 1723.9016]
2025-08-07 03:32:12,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:32:12,534 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 42 minutes, 45 seconds)
2025-08-07 03:34:00,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:17,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1311.67590 ± 262.466
2025-08-07 03:34:17,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1427.8455, 1305.4589, 1326.0548, 1274.0524, 1103.7113, 1366.2787, 1256.278, 859.73157, 1241.298, 1956.0511]
2025-08-07 03:34:17,703 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:34:17,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 40 minutes, 54 seconds)
2025-08-07 03:36:04,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:21,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1612.47180 ± 559.390
2025-08-07 03:36:21,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1403.1387, 1447.1714, 2365.7136, 1384.8854, 1152.3563, 1038.5135, 1020.5296, 2804.5405, 1546.2405, 1961.6277]
2025-08-07 03:36:21,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:36:21,869 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1612.47) for latency ExtremeSparseL4U32
2025-08-07 03:36:21,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 39 minutes, 1 second)
2025-08-07 03:38:11,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:28,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1265.60193 ± 325.021
2025-08-07 03:38:28,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1073.3976, 1229.0526, 1190.3754, 1528.5099, 1851.1847, 1005.90625, 1034.422, 1812.4064, 977.19415, 953.5715]
2025-08-07 03:38:28,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:38:28,771 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 37 minutes, 17 seconds)
2025-08-07 03:40:18,209 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:33,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1688.14771 ± 434.586
2025-08-07 03:40:33,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1809.7892, 1775.0593, 1098.6292, 1705.3743, 2659.1482, 2168.254, 1226.057, 1485.1942, 1501.8103, 1452.1597]
2025-08-07 03:40:33,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:40:33,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1688.15) for latency ExtremeSparseL4U32
2025-08-07 03:40:33,649 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 35 minutes, 21 seconds)
2025-08-07 03:42:22,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:38,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1379.14172 ± 453.652
2025-08-07 03:42:38,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1010.4666, 909.8101, 2186.9792, 1038.2233, 1713.8096, 2034.2306, 1199.6921, 1675.8359, 1031.4264, 990.94257]
2025-08-07 03:42:38,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:42:38,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 33 minutes, 22 seconds)
2025-08-07 03:44:27,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:43,254 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1243.81323 ± 168.059
2025-08-07 03:44:43,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1375.0978, 1455.4288, 1192.1471, 1294.673, 973.12616, 947.2278, 1283.5458, 1142.536, 1364.9357, 1409.4141]
2025-08-07 03:44:43,255 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:44:43,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 31 minutes, 16 seconds)
2025-08-07 03:46:32,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1565.74500 ± 398.807
2025-08-07 03:46:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1218.5005, 1103.7451, 1444.0424, 1222.8802, 2295.104, 1688.923, 2086.2961, 1543.5652, 1138.2839, 1916.109]
2025-08-07 03:46:47,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:46:47,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 29 minutes, 12 seconds)
2025-08-07 03:48:36,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:52,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1337.60071 ± 213.669
2025-08-07 03:48:52,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1524.15, 1247.0468, 1252.2245, 1329.9341, 1558.8717, 1402.799, 1062.5046, 979.2518, 1721.4104, 1297.8146]
2025-08-07 03:48:52,351 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:48:52,365 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 27 minutes, 1 second)
2025-08-07 03:50:40,224 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1785.39880 ± 655.726
2025-08-07 03:50:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1190.2174, 973.5526, 2745.2996, 1709.2925, 2553.2712, 1283.1974, 1393.6199, 2412.6113, 1099.9501, 2492.975]
2025-08-07 03:50:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:50:55,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1226 [INFO]: New best (1785.40) for latency ExtremeSparseL4U32
2025-08-07 03:50:55,611 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 24 minutes, 52 seconds)
2025-08-07 03:52:45,683 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:02,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1710.55347 ± 613.180
2025-08-07 03:53:02,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2475.2214, 1131.7434, 926.2562, 2005.3517, 1592.2352, 2172.2024, 2225.537, 1060.263, 2538.8604, 977.86334]
2025-08-07 03:53:02,563 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:53:02,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 22 minutes, 53 seconds)
2025-08-07 03:54:51,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:55:06,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1526.92542 ± 525.331
2025-08-07 03:55:06,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1693.6293, 1501.1472, 2098.417, 1413.3411, 1203.2971, 1077.2428, 1135.7264, 2804.8733, 1020.85565, 1320.7241]
2025-08-07 03:55:06,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:55:06,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 20 minutes, 47 seconds)
2025-08-07 03:56:55,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:57:11,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1511.13696 ± 499.250
2025-08-07 03:57:11,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1116.3293, 1218.662, 1146.804, 2080.1619, 1310.9016, 1006.3318, 977.71027, 1923.8895, 2472.6184, 1857.9608]
2025-08-07 03:57:11,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:57:11,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 18 minutes, 42 seconds)
2025-08-07 03:58:59,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:16,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1446.74902 ± 248.776
2025-08-07 03:59:16,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1394.749, 1725.8445, 1371.4429, 1643.4865, 1462.7063, 1557.2555, 1090.9984, 1214.0692, 1887.0837, 1119.8542]
2025-08-07 03:59:16,942 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:59:16,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 16 minutes, 39 seconds)
2025-08-07 04:01:06,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:01:21,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1366.21509 ± 460.844
2025-08-07 04:01:21,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [934.306, 954.452, 2330.9126, 1168.7045, 1822.4182, 1877.9198, 1385.5363, 1246.9991, 980.90063, 960.0032]
2025-08-07 04:01:21,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:01:21,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 14 minutes, 36 seconds)
2025-08-07 04:03:09,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:25,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1522.61194 ± 324.157
2025-08-07 04:03:25,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1299.9075, 1511.8676, 1230.2421, 1660.844, 2216.9023, 1813.8351, 1520.2114, 1051.7095, 1684.0253, 1236.5739]
2025-08-07 04:03:25,371 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:03:25,382 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 12 minutes, 27 seconds)
2025-08-07 04:05:14,719 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:05:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1330.00928 ± 435.672
2025-08-07 04:05:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1155.4357, 2198.818, 1183.3041, 989.2364, 1212.3386, 1253.7416, 1029.7466, 1141.3875, 969.533, 2166.5518]
2025-08-07 04:05:29,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:05:30,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 10 minutes, 23 seconds)
2025-08-07 04:07:20,126 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:07:36,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1452.40112 ± 291.988
2025-08-07 04:07:36,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1620.124, 1071.8418, 1568.6389, 1254.8424, 1579.6882, 1040.6328, 1206.1016, 1713.8223, 2011.2335, 1457.0857]
2025-08-07 04:07:36,916 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:07:36,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 8 minutes, 20 seconds)
2025-08-07 04:09:25,653 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:09:40,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1480.43188 ± 426.990
2025-08-07 04:09:40,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1201.675, 1614.6749, 948.16327, 1643.806, 1522.6732, 1061.8042, 2221.1548, 1082.7496, 2192.0715, 1315.5459]
2025-08-07 04:09:40,996 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:09:41,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 6 minutes, 14 seconds)
2025-08-07 04:11:26,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:11:43,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1420.94604 ± 537.956
2025-08-07 04:11:43,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [957.0343, 902.4427, 1252.7141, 1117.3651, 1360.6282, 2807.6477, 1964.313, 1285.395, 1351.8373, 1210.0844]
2025-08-07 04:11:43,283 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:11:43,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 4 minutes, 8 seconds)
2025-08-07 04:13:30,289 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:13:45,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1322.14954 ± 260.115
2025-08-07 04:13:45,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [1070.706, 1076.5951, 1538.7039, 1170.1455, 1034.6351, 1208.581, 1799.5641, 1564.1935, 1161.8191, 1596.5516]
2025-08-07 04:13:45,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:13:45,670 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 2 minutes, 4 seconds)
2025-08-07 04:15:34,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:15:49,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1221 [DEBUG]: Total Reward: 1593.06348 ± 655.312
2025-08-07 04:15:49,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1222 [DEBUG]: All rewards: [2274.141, 1897.9594, 948.9212, 2703.084, 946.72125, 1096.5712, 781.0952, 2419.9563, 1307.2556, 1554.9298]
2025-08-07 04:15:49,788 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:15:49,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc15-halfcheetah):1251 [DEBUG]: Training session finished
