2025-08-07 00:48:19,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc10-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:19,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc10-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:19,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x153db4a49490>}
2025-08-07 00:48:19,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:19,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 00:48:19,626 baseline-bpql-noiseperc10-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:48:19,627 baseline-bpql-noiseperc10-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:33,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:33,807 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 00:50:13,709 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:50:29,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -514.66168 ± 25.235
2025-08-07 00:50:29,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-523.747, -487.86728, -541.3361, -553.56903, -505.049, -542.4379, -489.4447, -494.9537, -528.9178, -479.29413]
2025-08-07 00:50:29,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:50:29,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-514.66) for latency ExtremeSparseL4U32
2025-08-07 00:50:29,812 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 11 minutes, 24 seconds)
2025-08-07 00:52:14,679 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:52:30,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -220.33057 ± 66.616
2025-08-07 00:52:30,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-331.71756, -203.74469, -111.96563, -271.0205, -123.597206, -285.58316, -244.60632, -254.16472, -188.52652, -188.3794]
2025-08-07 00:52:30,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:52:30,579 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-220.33) for latency ExtremeSparseL4U32
2025-08-07 00:52:30,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 13 minutes, 22 seconds)
2025-08-07 00:54:15,876 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:54:31,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -154.68085 ± 92.663
2025-08-07 00:54:31,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-175.3735, -165.05365, -111.72465, 83.40126, -131.87003, -225.73198, -280.24973, -221.09209, -137.12735, -181.98677]
2025-08-07 00:54:31,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:54:31,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-154.68) for latency ExtremeSparseL4U32
2025-08-07 00:54:31,693 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 12 minutes, 51 seconds)
2025-08-07 00:56:16,759 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:32,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -141.05070 ± 68.075
2025-08-07 00:56:32,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-209.66634, -28.687923, -278.234, -162.9619, -97.64702, -187.28142, -118.63399, -134.44975, -120.63224, -72.312515]
2025-08-07 00:56:32,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:56:32,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-141.05) for latency ExtremeSparseL4U32
2025-08-07 00:56:32,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 11 minutes, 32 seconds)
2025-08-07 00:58:17,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:33,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: -123.37244 ± 66.194
2025-08-07 00:58:33,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-187.70256, -169.94254, -17.116514, -73.97262, -164.27121, -177.56384, -161.1176, 5.2123065, -129.65622, -157.59346]
2025-08-07 00:58:33,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:58:33,529 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (-123.37) for latency ExtremeSparseL4U32
2025-08-07 00:58:33,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 9 minutes, 54 seconds)
2025-08-07 01:00:18,511 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:34,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2.61661 ± 63.719
2025-08-07 01:00:34,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [81.849495, -24.76903, 14.262057, 70.53777, -117.19863, 28.594326, 30.428366, 36.759136, 11.59206, -105.88949]
2025-08-07 01:00:34,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:00:34,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2.62) for latency ExtremeSparseL4U32
2025-08-07 01:00:34,272 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 9 minutes, 23 seconds)
2025-08-07 01:02:19,454 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:02:35,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 138.77437 ± 259.818
2025-08-07 01:02:35,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [-444.50067, 278.8375, 125.95712, 369.9613, -267.85403, 162.29337, 260.4313, 297.02094, 334.9283, 270.66843]
2025-08-07 01:02:35,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:02:35,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (138.77) for latency ExtremeSparseL4U32
2025-08-07 01:02:35,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 7 minutes, 25 seconds)
2025-08-07 01:04:20,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:35,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 469.69159 ± 168.172
2025-08-07 01:04:35,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [581.3137, 657.82166, 237.27469, 99.395424, 405.63672, 582.38684, 556.94196, 494.04865, 606.75494, 475.34146]
2025-08-07 01:04:35,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:04:35,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (469.69) for latency ExtremeSparseL4U32
2025-08-07 01:04:35,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 5 minutes, 15 seconds)
2025-08-07 01:06:20,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:36,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 661.26770 ± 210.741
2025-08-07 01:06:36,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [569.592, 844.6831, 583.60004, 282.518, 667.44763, 853.51483, 887.6163, 785.0797, 826.3324, 312.29224]
2025-08-07 01:06:36,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:06:36,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (661.27) for latency ExtremeSparseL4U32
2025-08-07 01:06:36,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 3 minutes, 11 seconds)
2025-08-07 01:08:21,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:37,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 607.80597 ± 296.826
2025-08-07 01:08:37,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [812.9883, 650.8523, 940.017, 361.89877, 697.5474, 862.4739, 683.812, 792.11664, 375.35886, -99.005356]
2025-08-07 01:08:37,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:08:37,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 3 hours, 1 minute, 10 seconds)
2025-08-07 01:10:22,323 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:38,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 810.66199 ± 166.498
2025-08-07 01:10:38,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [861.73865, 748.589, 952.5976, 905.9267, 917.81683, 489.9669, 525.83234, 989.1348, 787.3848, 927.6322]
2025-08-07 01:10:38,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:10:38,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (810.66) for latency ExtremeSparseL4U32
2025-08-07 01:10:38,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 59 minutes, 11 seconds)
2025-08-07 01:12:23,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:12:38,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 943.52502 ± 82.733
2025-08-07 01:12:38,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [858.75037, 1086.2744, 868.9861, 827.60223, 873.8779, 935.2726, 1051.3584, 945.1827, 1005.1428, 982.8023]
2025-08-07 01:12:38,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:12:38,931 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (943.53) for latency ExtremeSparseL4U32
2025-08-07 01:12:38,936 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 57 minutes, 6 seconds)
2025-08-07 01:14:23,961 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:39,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 870.45038 ± 411.251
2025-08-07 01:14:39,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [908.4197, -292.62747, 1006.6203, 1312.513, 1144.2361, 1006.32, 883.12036, 784.5104, 976.84796, 974.5433]
2025-08-07 01:14:39,799 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:14:39,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 55 minutes, 9 seconds)
2025-08-07 01:16:24,726 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:16:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 951.45703 ± 98.840
2025-08-07 01:16:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1072.5483, 800.9284, 1034.3943, 985.3667, 1032.3888, 911.2188, 1000.5675, 761.93756, 903.9033, 1011.3162]
2025-08-07 01:16:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:16:40,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (951.46) for latency ExtremeSparseL4U32
2025-08-07 01:16:40,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 53 minutes, 8 seconds)
2025-08-07 01:18:25,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:41,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 920.60938 ± 127.235
2025-08-07 01:18:41,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1037.5269, 740.55835, 964.49384, 777.57184, 1060.3109, 997.30585, 743.13385, 1052.6823, 818.69714, 1013.8128]
2025-08-07 01:18:41,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:18:41,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 51 minutes, 3 seconds)
2025-08-07 01:20:26,158 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:20:41,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1011.24255 ± 203.768
2025-08-07 01:20:41,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [952.49414, 819.9109, 1059.3342, 1069.9116, 1078.5168, 1090.4542, 1025.459, 525.79895, 1164.033, 1326.5126]
2025-08-07 01:20:41,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:20:41,975 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1011.24) for latency ExtremeSparseL4U32
2025-08-07 01:20:41,983 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 49 minutes, 2 seconds)
2025-08-07 01:22:26,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:22:42,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1094.84338 ± 138.395
2025-08-07 01:22:42,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1010.06, 1069.6935, 990.55255, 1000.6455, 1218.5125, 1141.4714, 1073.6757, 1167.5194, 876.41534, 1399.8883]
2025-08-07 01:22:42,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:22:42,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1094.84) for latency ExtremeSparseL4U32
2025-08-07 01:22:42,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 47 minutes, 4 seconds)
2025-08-07 01:24:27,673 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:24:43,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1202.11426 ± 156.624
2025-08-07 01:24:43,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1336.4308, 1013.4203, 1243.523, 973.0781, 1313.8721, 1005.0514, 1321.4609, 1085.7979, 1418.1842, 1310.3228]
2025-08-07 01:24:43,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:24:43,520 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1202.11) for latency ExtremeSparseL4U32
2025-08-07 01:24:43,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 45 minutes, 1 second)
2025-08-07 01:26:28,092 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:26:44,050 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 977.40741 ± 237.632
2025-08-07 01:26:44,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1018.9684, 1041.2153, 1049.8196, 1150.1329, 581.19006, 489.92877, 1019.8878, 979.3851, 1284.8253, 1158.7205]
2025-08-07 01:26:44,051 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:26:44,058 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 42 minutes, 56 seconds)
2025-08-07 01:28:28,998 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:28:44,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1017.50018 ± 178.353
2025-08-07 01:28:44,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1034.874, 919.9528, 658.5268, 936.81616, 1377.6, 989.66034, 956.2362, 1028.8583, 1064.1953, 1208.2819]
2025-08-07 01:28:44,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:28:44,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 40 minutes, 58 seconds)
2025-08-07 01:30:30,155 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:30:46,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1049.75842 ± 94.249
2025-08-07 01:30:46,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1066.8668, 1191.3301, 1157.9498, 1004.8649, 1008.659, 1082.0883, 1049.3842, 1074.1687, 1039.0724, 823.20013]
2025-08-07 01:30:46,069 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:30:46,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 39 minutes, 4 seconds)
2025-08-07 01:32:30,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:32:46,745 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1139.42334 ± 103.941
2025-08-07 01:32:46,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1034.7576, 1147.3701, 1035.0497, 1235.2446, 1022.10815, 1225.6774, 1138.9545, 1038.4946, 1165.2906, 1351.2863]
2025-08-07 01:32:46,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:32:46,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 37 minutes)
2025-08-07 01:34:31,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:47,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1232.36938 ± 150.411
2025-08-07 01:34:47,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1215.7994, 1608.9811, 1164.5406, 1301.1804, 1099.5925, 1089.6514, 1116.7036, 1202.805, 1160.3484, 1364.0902]
2025-08-07 01:34:47,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:34:47,576 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1232.37) for latency ExtremeSparseL4U32
2025-08-07 01:34:47,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 35 minutes, 2 seconds)
2025-08-07 01:36:32,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:48,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1076.03491 ± 175.815
2025-08-07 01:36:48,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1087.7756, 1016.3669, 1290.83, 845.28796, 1086.4519, 1040.1635, 986.1099, 1384.0333, 1227.7631, 795.56683]
2025-08-07 01:36:48,426 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:36:48,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 33 minutes, 6 seconds)
2025-08-07 01:38:33,075 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:48,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1110.99231 ± 192.398
2025-08-07 01:38:48,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1241.6017, 1046.481, 1299.5138, 1206.4525, 1154.7692, 1060.1731, 1173.2517, 1271.8401, 1062.2166, 593.6226]
2025-08-07 01:38:48,970 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:38:48,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 31 minutes, 1 second)
2025-08-07 01:40:33,751 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:49,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1245.98157 ± 303.106
2025-08-07 01:40:49,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1129.4086, 1394.7556, 2048.4172, 1143.0103, 1123.2819, 1061.1362, 1456.1437, 1032.2556, 1058.0482, 1013.3586]
2025-08-07 01:40:49,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:40:49,629 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1245.98) for latency ExtremeSparseL4U32
2025-08-07 01:40:49,636 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 28 minutes, 52 seconds)
2025-08-07 01:42:34,389 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:50,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1225.76636 ± 137.813
2025-08-07 01:42:50,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1116.157, 1226.7831, 1360.5991, 1116.4257, 1135.4393, 1177.6422, 1135.156, 1121.2406, 1308.915, 1559.3054]
2025-08-07 01:42:50,177 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:42:50,189 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 26 minutes, 50 seconds)
2025-08-07 01:44:35,002 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:44:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1259.55396 ± 169.448
2025-08-07 01:44:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1262.1046, 1281.4696, 1474.7234, 1190.527, 1317.8895, 1059.634, 1607.0511, 1250.8585, 1032.4744, 1118.8068]
2025-08-07 01:44:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:44:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1259.55) for latency ExtremeSparseL4U32
2025-08-07 01:44:50,853 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 24 minutes, 47 seconds)
2025-08-07 01:46:35,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1165.70923 ± 251.386
2025-08-07 01:46:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1099.2496, 1103.3822, 1884.9215, 1084.9243, 1102.9618, 1154.4163, 1108.1829, 1073.8182, 1169.3982, 875.8371]
2025-08-07 01:46:51,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:46:51,551 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 22 minutes, 44 seconds)
2025-08-07 01:48:36,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:48:52,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1275.87817 ± 243.886
2025-08-07 01:48:52,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1488.6869, 1131.0975, 1725.0446, 960.3023, 1272.9396, 1160.5579, 1046.389, 1193.9999, 1136.359, 1643.4039]
2025-08-07 01:48:52,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:48:52,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1275.88) for latency ExtremeSparseL4U32
2025-08-07 01:48:52,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 20 minutes, 43 seconds)
2025-08-07 01:50:36,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:50:52,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1135.94312 ± 398.571
2025-08-07 01:50:52,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [946.97986, 1047.3656, 349.06137, 1186.3352, 1021.3268, 1564.9912, 1103.3519, 1046.2034, 1982.3345, 1111.482]
2025-08-07 01:50:52,567 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:50:52,575 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 18 minutes, 40 seconds)
2025-08-07 01:52:37,461 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:53,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1274.36096 ± 213.839
2025-08-07 01:52:53,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1197.7227, 1283.3014, 1411.5536, 1342.888, 1258.3663, 883.36316, 1602.5455, 1228.1635, 980.8497, 1554.8556]
2025-08-07 01:52:53,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:52:53,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 16 minutes, 43 seconds)
2025-08-07 01:54:38,198 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:54,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1384.21655 ± 307.598
2025-08-07 01:54:54,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1722.9084, 1197.8903, 1512.7495, 1250.4064, 1106.1421, 1087.504, 1245.067, 1016.9598, 1794.5782, 1907.9608]
2025-08-07 01:54:54,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:54:54,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1384.22) for latency ExtremeSparseL4U32
2025-08-07 01:54:54,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 14 minutes, 42 seconds)
2025-08-07 01:56:38,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:56:54,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1354.51868 ± 297.723
2025-08-07 01:56:54,847 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1158.3239, 1706.9578, 1281.7239, 1031.5315, 1932.9426, 1185.2181, 1719.0526, 1120.162, 1109.5369, 1299.7372]
2025-08-07 01:56:54,848 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:56:54,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 12 minutes, 43 seconds)
2025-08-07 01:58:39,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:55,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1264.70044 ± 189.628
2025-08-07 01:58:55,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1329.9358, 1173.6201, 1324.3274, 1771.3556, 1335.6215, 1123.4076, 1116.7045, 1098.2412, 1192.8052, 1180.9857]
2025-08-07 01:58:55,620 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:58:55,628 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 10 minutes, 45 seconds)
2025-08-07 02:00:40,452 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:00:56,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1253.19116 ± 215.798
2025-08-07 02:00:56,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1041.7609, 1479.6798, 1484.5415, 1092.0476, 1049.1024, 1511.9341, 1569.6492, 1073.5123, 1041.0577, 1188.625]
2025-08-07 02:00:56,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:00:56,380 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 8 minutes, 48 seconds)
2025-08-07 02:02:41,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:57,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1237.87634 ± 160.106
2025-08-07 02:02:57,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1041.9425, 1188.6945, 1088.8815, 1230.2417, 1317.748, 1341.7587, 1034.2587, 1524.4852, 1158.0933, 1452.6603]
2025-08-07 02:02:57,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:02:57,411 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 6 minutes, 50 seconds)
2025-08-07 02:04:42,187 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:57,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1213.03979 ± 246.089
2025-08-07 02:04:57,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1177.7539, 1031.4193, 1507.4984, 1130.055, 1189.0128, 1086.1735, 1826.8, 973.1095, 1144.1752, 1064.4008]
2025-08-07 02:04:57,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:04:57,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 4 minutes, 48 seconds)
2025-08-07 02:06:42,780 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:58,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1249.58276 ± 288.518
2025-08-07 02:06:58,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1195.7426, 1261.4087, 1134.3043, 1252.0891, 1522.8828, 849.01654, 1042.0018, 1022.6412, 1269.0372, 1946.7029]
2025-08-07 02:06:58,541 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:06:58,548 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 2 hours, 2 minutes, 45 seconds)
2025-08-07 02:08:43,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:59,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1220.47607 ± 204.016
2025-08-07 02:08:59,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1247.4824, 932.6059, 1156.9044, 1070.494, 1166.7091, 1767.1744, 1269.2507, 1214.0692, 1185.1383, 1194.9324]
2025-08-07 02:08:59,386 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:08:59,395 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 2 hours, 45 seconds)
2025-08-07 02:10:44,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:11:00,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1172.54016 ± 201.232
2025-08-07 02:11:00,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1108.0148, 922.1466, 1331.6406, 1108.4879, 1081.0326, 1635.0883, 1042.5427, 1095.3344, 1376.5, 1024.6133]
2025-08-07 02:11:00,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:11:00,376 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 58 minutes, 47 seconds)
2025-08-07 02:12:45,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:13:01,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1225.79907 ± 136.719
2025-08-07 02:13:01,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1085.3429, 1271.1073, 1508.1403, 1152.0466, 1046.5922, 1170.085, 1169.6748, 1287.3446, 1159.1426, 1408.5137]
2025-08-07 02:13:01,140 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:13:01,147 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 56 minutes, 43 seconds)
2025-08-07 02:14:46,152 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:15:02,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1403.94507 ± 324.621
2025-08-07 02:15:02,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1144.2316, 1300.2515, 1173.5936, 1075.3704, 1854.9347, 1889.92, 1552.6191, 1167.8438, 1056.6113, 1824.0748]
2025-08-07 02:15:02,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:15:02,065 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1403.95) for latency ExtremeSparseL4U32
2025-08-07 02:15:02,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 54 minutes, 46 seconds)
2025-08-07 02:16:46,921 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:17:02,738 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1374.80432 ± 324.392
2025-08-07 02:17:02,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1618.8673, 1390.3567, 1273.4197, 1407.075, 1176.689, 1371.06, 2205.6711, 966.5766, 1182.3152, 1156.0114]
2025-08-07 02:17:02,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:17:02,748 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 52 minutes, 47 seconds)
2025-08-07 02:18:47,640 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:03,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1399.54797 ± 425.414
2025-08-07 02:19:03,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1247.2206, 2260.2058, 1205.2708, 972.56647, 1154.8049, 1634.8164, 2070.857, 1370.3977, 1040.2494, 1039.0911]
2025-08-07 02:19:03,537 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:19:03,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 50 minutes, 45 seconds)
2025-08-07 02:20:48,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:04,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1374.24072 ± 277.820
2025-08-07 02:21:04,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1023.4486, 1643.8947, 1671.6511, 1288.5526, 1115.5098, 1504.6108, 1851.1486, 1452.4778, 1156.564, 1034.5498]
2025-08-07 02:21:04,591 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:21:04,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 48 minutes, 45 seconds)
2025-08-07 02:22:49,598 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:05,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1253.39917 ± 365.682
2025-08-07 02:23:05,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [508.69464, 1226.9882, 1315.5165, 868.685, 1785.0319, 1702.6978, 1599.9967, 1235.6841, 1115.3945, 1175.3018]
2025-08-07 02:23:05,434 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:23:05,441 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 46 minutes, 45 seconds)
2025-08-07 02:24:50,336 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:06,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1512.46375 ± 306.911
2025-08-07 02:25:06,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1807.8328, 1269.584, 1139.1786, 1362.7041, 1781.1238, 1562.6227, 1160.5973, 2145.9282, 1334.1093, 1560.9563]
2025-08-07 02:25:06,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:25:06,202 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1512.46) for latency ExtremeSparseL4U32
2025-08-07 02:25:06,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 44 minutes, 43 seconds)
2025-08-07 02:26:51,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:07,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1276.49634 ± 153.408
2025-08-07 02:27:07,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1462.1904, 1326.1334, 1219.918, 1296.5942, 1285.5615, 1055.2631, 1051.9875, 1450.583, 1485.8792, 1130.8538]
2025-08-07 02:27:07,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:27:07,037 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 42 minutes, 43 seconds)
2025-08-07 02:28:51,785 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:07,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1387.15857 ± 272.860
2025-08-07 02:29:07,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1682.4592, 1693.3987, 1071.2733, 1107.2744, 1109.8859, 1641.197, 1106.7687, 1760.3416, 1241.5996, 1457.3866]
2025-08-07 02:29:07,772 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:29:07,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 40 minutes, 42 seconds)
2025-08-07 02:30:52,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:08,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1289.25415 ± 428.193
2025-08-07 02:31:08,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1067.1154, 1045.5243, 1202.1362, 1051.4629, 1463.7823, 1091.1586, 1065.2323, 1279.6967, 1109.1011, 2517.33]
2025-08-07 02:31:08,711 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:31:08,720 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 38 minutes, 40 seconds)
2025-08-07 02:32:53,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:33:09,660 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1359.21216 ± 296.368
2025-08-07 02:33:09,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1477.6854, 1428.3838, 1077.9274, 1097.5149, 2120.3066, 1104.5233, 1353.7743, 1390.2659, 1110.9034, 1430.8368]
2025-08-07 02:33:09,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:33:09,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 36 minutes, 40 seconds)
2025-08-07 02:34:54,526 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:10,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1156.15222 ± 483.767
2025-08-07 02:35:10,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1914.2836, 1361.58, 1185.1774, 1079.033, 1122.6083, 1315.6144, -112.33873, 1290.1357, 1373.9501, 1031.4781]
2025-08-07 02:35:10,334 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:35:10,342 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 34 minutes, 38 seconds)
2025-08-07 02:36:55,339 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:11,185 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1419.10681 ± 184.872
2025-08-07 02:37:11,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1757.0641, 1454.9618, 1317.6732, 1264.9713, 1244.5787, 1191.3086, 1672.0507, 1521.6327, 1255.2795, 1511.5491]
2025-08-07 02:37:11,186 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:37:11,197 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 32 minutes, 38 seconds)
2025-08-07 02:38:56,134 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:39:12,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1474.59888 ± 166.063
2025-08-07 02:39:12,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1447.1295, 1629.6296, 1470.1124, 1257.8496, 1756.2549, 1466.4675, 1442.394, 1376.7405, 1687.1183, 1212.293]
2025-08-07 02:39:12,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:39:12,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 30 minutes, 38 seconds)
2025-08-07 02:40:57,019 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:12,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1548.92505 ± 354.551
2025-08-07 02:41:12,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1313.2825, 1288.5553, 1636.537, 1059.9208, 1843.4769, 1968.0293, 1652.0488, 1483.1558, 2169.3242, 1074.9185]
2025-08-07 02:41:12,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:41:12,908 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1548.93) for latency ExtremeSparseL4U32
2025-08-07 02:41:12,914 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 28 minutes, 36 seconds)
2025-08-07 02:42:57,860 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:43:13,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1661.38354 ± 506.125
2025-08-07 02:43:13,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1434.1481, 2538.3538, 1125.14, 1589.7347, 1314.8026, 2012.2043, 2129.718, 1090.1862, 2263.2773, 1116.2699]
2025-08-07 02:43:13,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:43:13,712 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1661.38) for latency ExtremeSparseL4U32
2025-08-07 02:43:13,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 26 minutes, 34 seconds)
2025-08-07 02:44:58,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:45:14,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1565.55542 ± 373.795
2025-08-07 02:45:14,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1704.5538, 2342.3965, 1804.4713, 1250.9288, 1502.6436, 1265.7761, 1228.652, 2021.4384, 1154.7878, 1379.905]
2025-08-07 02:45:14,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:45:14,445 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 24 minutes, 34 seconds)
2025-08-07 02:46:59,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:15,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1354.10889 ± 308.643
2025-08-07 02:47:15,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1343.1847, 1157.8337, 1463.8806, 1197.4908, 1101.9233, 2198.182, 1485.9694, 1142.4933, 1170.9706, 1279.1598]
2025-08-07 02:47:15,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:47:15,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 22 minutes, 33 seconds)
2025-08-07 02:49:00,286 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:16,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1691.74683 ± 489.321
2025-08-07 02:49:16,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1933.4331, 1539.0669, 1693.7463, 1239.1768, 1476.9413, 2323.31, 1180.7444, 1053.8795, 2683.9631, 1793.2064]
2025-08-07 02:49:16,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:49:16,287 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1691.75) for latency ExtremeSparseL4U32
2025-08-07 02:49:16,298 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 20 minutes, 33 seconds)
2025-08-07 02:51:01,121 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:17,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1487.99341 ± 447.202
2025-08-07 02:51:17,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1609.1396, 2496.4297, 1916.6708, 1168.4489, 1138.1655, 1121.2434, 1101.9027, 1849.4658, 1134.2854, 1344.1819]
2025-08-07 02:51:17,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:51:17,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 18 minutes, 32 seconds)
2025-08-07 02:53:01,935 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:17,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1445.53967 ± 396.447
2025-08-07 02:53:17,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1207.4862, 1473.202, 1058.6536, 1056.3643, 1870.3113, 1414.753, 1874.3428, 1075.843, 2245.5876, 1178.8524]
2025-08-07 02:53:17,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:53:17,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 16 minutes, 31 seconds)
2025-08-07 02:55:02,616 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1598.23254 ± 364.355
2025-08-07 02:55:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1939.4484, 1784.6642, 1200.1907, 1467.6174, 2377.1853, 1761.2867, 1393.7732, 1359.0304, 1080.4758, 1618.6527]
2025-08-07 02:55:18,539 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:55:18,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 14 minutes, 30 seconds)
2025-08-07 02:57:03,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1335.57593 ± 171.910
2025-08-07 02:57:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1619.4729, 1310.8259, 1554.9908, 1200.566, 1121.4282, 1282.6539, 1325.4313, 1461.6558, 1059.3967, 1419.339]
2025-08-07 02:57:19,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:57:19,582 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 12 minutes, 31 seconds)
2025-08-07 02:59:04,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:20,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1399.22388 ± 549.112
2025-08-07 02:59:20,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2065.9167, 1083.805, 2281.2761, 231.81523, 1394.9451, 1527.3573, 1082.3535, 1814.0516, 1262.2225, 1248.4961]
2025-08-07 02:59:20,355 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:59:20,364 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 10 minutes, 28 seconds)
2025-08-07 03:01:04,399 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:20,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1770.46802 ± 589.780
2025-08-07 03:01:20,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2188.6138, 1333.7407, 1023.391, 2445.1348, 2313.7341, 1883.7202, 1794.1982, 1041.8049, 2630.847, 1049.4943]
2025-08-07 03:01:20,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:01:20,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (1770.47) for latency ExtremeSparseL4U32
2025-08-07 03:01:20,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 8 minutes, 20 seconds)
2025-08-07 03:03:03,829 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:19,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1351.30774 ± 274.492
2025-08-07 03:03:19,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [970.1141, 1418.5393, 1102.78, 1348.8237, 1275.9353, 1726.0916, 1269.6017, 1864.5342, 1031.8253, 1504.832]
2025-08-07 03:03:19,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:03:19,669 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 6 minutes, 12 seconds)
2025-08-07 03:05:03,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:18,781 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1513.24768 ± 288.604
2025-08-07 03:05:18,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1138.9163, 1209.2666, 1505.6342, 2188.0022, 1355.9762, 1450.9861, 1368.5023, 1721.4274, 1458.7699, 1734.9957]
2025-08-07 03:05:18,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:05:18,789 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 4 minutes, 1 second)
2025-08-07 03:07:01,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:17,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1615.31152 ± 641.949
2025-08-07 03:07:17,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1429.9487, 1275.3119, 1219.2909, 2453.0676, 2847.628, 2380.396, 1111.0585, 1032.1067, 1011.62756, 1392.679]
2025-08-07 03:07:17,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:07:17,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 1 minute, 46 seconds)
2025-08-07 03:09:00,664 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:16,360 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1387.97717 ± 414.092
2025-08-07 03:09:16,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1047.6831, 1602.949, 1010.3242, 1061.796, 1974.732, 1145.2021, 1423.8448, 1197.4923, 1134.1863, 2281.561]
2025-08-07 03:09:16,361 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:09:16,370 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 59 minutes, 36 seconds)
2025-08-07 03:10:59,367 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:15,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1739.66699 ± 444.585
2025-08-07 03:11:15,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1410.5583, 1708.9303, 1791.4148, 1340.6083, 2524.7441, 1983.0554, 1366.4646, 1558.0908, 2500.7178, 1212.0846]
2025-08-07 03:11:15,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:11:15,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 57 minutes, 31 seconds)
2025-08-07 03:12:58,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:14,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1329.04907 ± 169.331
2025-08-07 03:13:14,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1378.2303, 1069.405, 1321.2808, 1437.8374, 1264.6022, 1192.6945, 1299.7554, 1133.8114, 1608.4259, 1584.4482]
2025-08-07 03:13:14,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:13:14,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 55 minutes, 28 seconds)
2025-08-07 03:14:56,924 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:12,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1473.97827 ± 352.351
2025-08-07 03:15:12,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1393.52, 1344.5131, 1543.0546, 1036.831, 1176.5345, 1412.7095, 1603.5942, 1829.986, 1114.3259, 2284.7139]
2025-08-07 03:15:12,681 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:15:12,692 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 53 minutes, 27 seconds)
2025-08-07 03:16:55,794 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:11,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1527.48462 ± 450.780
2025-08-07 03:17:11,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1247.7007, 1343.7765, 2116.5232, 1121.2499, 1087.6389, 2104.1082, 1081.1014, 1990.7438, 2070.6414, 1111.3636]
2025-08-07 03:17:11,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:17:11,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 51 minutes, 28 seconds)
2025-08-07 03:18:54,407 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:10,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1555.20679 ± 435.654
2025-08-07 03:19:10,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1500.1687, 1110.9868, 1568.7703, 2756.735, 1721.5035, 1555.9296, 1405.7748, 1222.1998, 1434.6453, 1275.3536]
2025-08-07 03:19:10,078 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:19:10,085 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 49 minutes, 28 seconds)
2025-08-07 03:20:53,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:08,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1490.65894 ± 519.900
2025-08-07 03:21:08,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1304.5837, 1334.5282, 2292.5464, 1136.6443, 1243.5616, 1167.8403, 2716.914, 1144.5394, 1296.0641, 1269.3668]
2025-08-07 03:21:08,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:21:08,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 47 minutes, 29 seconds)
2025-08-07 03:22:51,779 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:07,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1556.14880 ± 370.539
2025-08-07 03:23:07,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1754.8618, 1099.88, 2397.3894, 1784.8655, 1449.2504, 1667.4968, 1382.018, 1111.9756, 1673.88, 1239.871]
2025-08-07 03:23:07,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:23:07,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 30 seconds)
2025-08-07 03:24:50,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:05,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1342.62488 ± 161.385
2025-08-07 03:25:05,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1251.2162, 1196.0719, 1494.9684, 1332.696, 1182.6298, 1323.894, 1467.3457, 1533.674, 1575.2994, 1068.4534]
2025-08-07 03:25:05,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:25:05,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 30 seconds)
2025-08-07 03:26:48,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1230.54272 ± 457.057
2025-08-07 03:27:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1421.7577, 2030.3568, 1051.3491, 1818.596, 1180.7661, 1296.1118, 1153.2097, 869.74445, 286.7413, 1196.7944]
2025-08-07 03:27:04,192 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:27:04,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 29 seconds)
2025-08-07 03:28:46,603 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:02,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1570.64380 ± 563.703
2025-08-07 03:29:02,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1223.2489, 1211.9119, 2567.4204, 1175.8031, 1180.1438, 2551.8198, 1192.8385, 1050.6116, 1488.3898, 2064.2498]
2025-08-07 03:29:02,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:29:02,132 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 28 seconds)
2025-08-07 03:30:44,462 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:00,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1434.00024 ± 329.165
2025-08-07 03:31:00,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1149.4221, 1072.6469, 1553.205, 2050.3135, 1392.9966, 1737.5397, 1459.3215, 1070.5652, 1781.4281, 1072.5643]
2025-08-07 03:31:00,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:31:00,151 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 27 seconds)
2025-08-07 03:32:42,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:58,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1410.39722 ± 353.408
2025-08-07 03:32:58,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1180.3976, 1495.5688, 1052.3223, 2292.6091, 1601.2527, 1112.3832, 1428.6797, 1131.9248, 1205.5347, 1603.3011]
2025-08-07 03:32:58,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:32:58,251 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 26 seconds)
2025-08-07 03:34:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:56,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1457.87341 ± 316.664
2025-08-07 03:34:56,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2159.5151, 1540.9928, 1657.9596, 1087.2949, 1393.4532, 1170.29, 1736.7086, 1281.332, 1096.7559, 1454.4323]
2025-08-07 03:34:56,017 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:34:56,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 26 seconds)
2025-08-07 03:36:38,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:53,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1600.82471 ± 442.197
2025-08-07 03:36:53,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [2444.8938, 1415.3737, 1606.2423, 1121.2936, 2112.0312, 1179.8693, 2126.4897, 1178.4268, 1494.9694, 1328.6559]
2025-08-07 03:36:53,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:36:53,841 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 26 seconds)
2025-08-07 03:38:36,011 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:51,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1560.26428 ± 414.272
2025-08-07 03:38:51,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1400.3317, 2205.949, 2256.627, 1521.471, 1078.1183, 1373.1722, 1118.0907, 1505.2628, 1972.4475, 1171.1721]
2025-08-07 03:38:51,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:38:51,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 28 seconds)
2025-08-07 03:40:35,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:50,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1552.20264 ± 480.340
2025-08-07 03:40:50,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1638.7728, 1824.42, 1271.1622, 1851.9688, 1029.7051, 1674.0812, 2700.1924, 1315.7885, 1092.0825, 1123.8529]
2025-08-07 03:40:50,717 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:40:50,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 33 seconds)
2025-08-07 03:42:33,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:49,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1375.33716 ± 192.175
2025-08-07 03:42:49,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1740.5198, 1293.7045, 1248.8734, 1735.9702, 1323.554, 1333.0114, 1122.1005, 1326.9108, 1363.6495, 1265.0785]
2025-08-07 03:42:49,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:42:49,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 37 seconds)
2025-08-07 03:44:32,905 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:48,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 2019.46167 ± 758.989
2025-08-07 03:44:48,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1616.1893, 1489.9208, 3101.102, 1624.2681, 3092.6086, 2036.8793, 1408.4584, 1249.192, 3217.6943, 1358.3038]
2025-08-07 03:44:48,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:44:48,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1226 [INFO]: New best (2019.46) for latency ExtremeSparseL4U32
2025-08-07 03:44:48,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 41 seconds)
2025-08-07 03:46:31,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:47,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1334.66626 ± 411.217
2025-08-07 03:46:47,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1173.5233, 1137.6667, 1165.8562, 2494.486, 1202.6918, 1088.8082, 1097.1185, 1109.6759, 1590.232, 1286.6035]
2025-08-07 03:46:47,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:46:47,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 46 seconds)
2025-08-07 03:48:30,849 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:46,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1628.80188 ± 652.469
2025-08-07 03:48:46,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1185.1125, 1204.4268, 1357.5891, 1215.5579, 1329.0397, 1141.582, 2689.681, 2885.1992, 2189.6404, 1090.1906]
2025-08-07 03:48:46,466 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:48:46,479 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 49 seconds)
2025-08-07 03:50:29,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:45,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1324.71680 ± 227.491
2025-08-07 03:50:45,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1541.9341, 1429.9668, 1286.6799, 1255.0369, 1661.7228, 1457.9174, 1501.0511, 1097.9979, 869.93854, 1144.9225]
2025-08-07 03:50:45,304 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:50:45,318 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 50 seconds)
2025-08-07 03:52:28,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:44,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1783.78394 ± 435.892
2025-08-07 03:52:44,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1258.7404, 1472.4951, 2788.0776, 1737.7454, 1729.0511, 2050.766, 2159.1936, 1837.1351, 1340.5314, 1464.1045]
2025-08-07 03:52:44,425 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:52:44,437 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 51 seconds)
2025-08-07 03:54:27,739 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:43,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1865.70544 ± 427.173
2025-08-07 03:54:43,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1856.65, 1468.958, 1673.0227, 1536.7037, 1979.7584, 2640.5535, 1540.5385, 1551.2554, 1718.0232, 2691.5898]
2025-08-07 03:54:43,447 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:54:43,457 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 53 seconds)
2025-08-07 03:56:26,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:42,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1417.65894 ± 314.810
2025-08-07 03:56:42,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1158.0292, 1132.3766, 940.6717, 1600.3538, 1857.4498, 1732.448, 1424.3723, 1262.7354, 1878.8264, 1189.3265]
2025-08-07 03:56:42,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:56:42,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 54 seconds)
2025-08-07 03:58:25,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:41,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1577.24731 ± 615.587
2025-08-07 03:58:41,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1102.7725, 2169.8132, 1122.1378, 1122.5267, 1343.5725, 1378.1088, 1782.7109, 1357.098, 1236.0028, 3157.73]
2025-08-07 03:58:41,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:58:41,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 55 seconds)
2025-08-07 04:00:24,994 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:40,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1655.77673 ± 459.060
2025-08-07 04:00:40,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1785.8319, 1409.6489, 2531.835, 1518.7577, 1296.9114, 1220.2931, 1430.6995, 1788.4049, 2428.8381, 1146.547]
2025-08-07 04:00:40,723 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:00:40,732 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 56 seconds)
2025-08-07 04:02:24,144 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:39,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1628.61731 ± 315.745
2025-08-07 04:02:39,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1362.7571, 1968.892, 1534.0001, 1905.2572, 1411.3511, 1594.2719, 2174.5908, 1394.8962, 1100.1926, 1839.9635]
2025-08-07 04:02:39,776 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:02:39,786 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 57 seconds)
2025-08-07 04:04:23,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:38,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1343.29370 ± 519.464
2025-08-07 04:04:38,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1375.518, 1230.5082, 1474.6431, 28.19182, 1842.464, 1412.2994, 1141.0121, 1156.6626, 1984.7538, 1786.885]
2025-08-07 04:04:38,910 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:04:38,919 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 58 seconds)
2025-08-07 04:06:22,262 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:06:37,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1529.79822 ± 278.099
2025-08-07 04:06:37,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1534.9305, 1481.6669, 1167.4633, 1649.8907, 1263.7814, 1697.6511, 1122.1154, 2107.0095, 1591.1819, 1682.2917]
2025-08-07 04:06:37,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:06:37,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-08-07 04:08:21,205 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:08:36,721 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1221 [DEBUG]: Total Reward: 1490.75354 ± 334.634
2025-08-07 04:08:36,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1222 [DEBUG]: All rewards: [1544.3295, 1268.9352, 1264.7734, 2352.1777, 1153.9325, 1175.4988, 1616.2902, 1631.14, 1557.5199, 1342.9381]
2025-08-07 04:08:36,722 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:08:36,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc10-halfcheetah):1251 [DEBUG]: Training session finished
