2025-08-07 00:47:38,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc5-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:47:38,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc5-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:47:38,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1459ed179550>}
2025-08-07 00:47:38,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 00:47:38,984 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 00:47:39,001 baseline-bpql-noiseperc5-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:47:39,001 baseline-bpql-noiseperc5-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:47:40,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 00:47:40,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 00:49:18,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:49:33,760 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -499.55460 ± 105.852
2025-08-07 00:49:33,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-567.3977, -518.25757, -563.2075, -520.6044, -236.95433, -563.96686, -541.11926, -476.50308, -387.1556, -620.3792]
2025-08-07 00:49:33,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:49:33,765 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (-499.55) for latency ExtremeSparseL4U32
2025-08-07 00:49:33,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 7 minutes, 27 seconds)
2025-08-07 00:51:17,410 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:51:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -233.97488 ± 30.791
2025-08-07 00:51:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-215.15099, -244.86635, -237.84744, -197.76137, -243.20625, -169.97574, -266.87164, -252.16235, -279.63016, -232.27641]
2025-08-07 00:51:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:51:32,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (-233.97) for latency ExtremeSparseL4U32
2025-08-07 00:51:32,976 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 10 minutes, 6 seconds)
2025-08-07 00:53:16,736 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:53:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -180.40631 ± 98.009
2025-08-07 00:53:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [35.05391, -109.747345, -59.6045, -247.02861, -249.00362, -266.0275, -217.98714, -265.04095, -250.79808, -173.87929]
2025-08-07 00:53:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:53:32,199 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (-180.41) for latency ExtremeSparseL4U32
2025-08-07 00:53:32,218 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 9 minutes, 42 seconds)
2025-08-07 00:55:16,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:55:31,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: -124.09536 ± 76.903
2025-08-07 00:55:31,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [-137.12236, -51.34618, -101.777985, -251.8174, -203.36014, -181.3382, -34.5534, -83.572136, -7.7828846, -188.28299]
2025-08-07 00:55:31,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:55:31,588 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (-124.10) for latency ExtremeSparseL4U32
2025-08-07 00:55:31,622 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 8 minutes, 34 seconds)
2025-08-07 00:57:15,480 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:57:30,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 86.24815 ± 157.156
2025-08-07 00:57:30,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [249.15631, 244.16412, 108.72905, -157.51836, -73.89264, -95.875435, 305.3773, 7.313118, 44.62269, 230.40541]
2025-08-07 00:57:30,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:57:30,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (86.25) for latency ExtremeSparseL4U32
2025-08-07 00:57:30,973 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 7 minutes, 4 seconds)
2025-08-07 00:59:14,920 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:59:30,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 45.68433 ± 89.593
2025-08-07 00:59:30,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [177.143, 112.00824, -87.146065, 186.19447, 7.7036066, 8.730523, -2.483294, -58.085052, 6.0078096, 106.770096]
2025-08-07 00:59:30,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:59:30,515 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 6 minutes, 58 seconds)
2025-08-07 01:01:14,578 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:01:30,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 256.08710 ± 237.869
2025-08-07 01:01:30,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [380.0176, 370.0824, 321.87418, 193.78183, 191.02304, -420.31155, 332.377, 432.1124, 382.47272, 377.44165]
2025-08-07 01:01:30,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:01:30,042 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (256.09) for latency ExtremeSparseL4U32
2025-08-07 01:01:30,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 5 minutes, 5 seconds)
2025-08-07 01:03:14,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:03:29,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 377.15094 ± 288.070
2025-08-07 01:03:29,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [140.47916, 617.44775, 342.53674, 449.62952, -262.43518, 156.38521, 677.2809, 331.18912, 649.5592, 669.43726]
2025-08-07 01:03:29,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:03:29,645 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (377.15) for latency ExtremeSparseL4U32
2025-08-07 01:03:29,657 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 3 minutes, 12 seconds)
2025-08-07 01:05:13,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:05:28,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 825.37488 ± 143.141
2025-08-07 01:05:28,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [882.35406, 807.91156, 773.4343, 976.5654, 872.2673, 922.29395, 876.7154, 453.42105, 950.42053, 738.3656]
2025-08-07 01:05:28,950 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:05:28,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (825.37) for latency ExtremeSparseL4U32
2025-08-07 01:05:28,977 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 3 hours, 1 minute, 11 seconds)
2025-08-07 01:07:11,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:07:27,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 965.46582 ± 60.692
2025-08-07 01:07:27,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [929.681, 969.66534, 881.34, 1000.3194, 910.49115, 924.7828, 1014.1387, 909.10645, 1062.8883, 1052.2456]
2025-08-07 01:07:27,241 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:07:27,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (965.47) for latency ExtremeSparseL4U32
2025-08-07 01:07:27,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 58 minutes, 52 seconds)
2025-08-07 01:09:09,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:09:25,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 921.45007 ± 132.547
2025-08-07 01:09:25,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [866.21155, 999.6988, 798.697, 1047.418, 1060.4202, 952.5034, 980.4424, 1025.5177, 606.8154, 876.7767]
2025-08-07 01:09:25,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:09:25,451 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 56 minutes, 29 seconds)
2025-08-07 01:11:07,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:11:23,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 968.78796 ± 68.857
2025-08-07 01:11:23,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [969.7953, 928.4096, 960.1448, 935.05725, 934.05237, 992.22003, 977.9879, 1149.3917, 868.3004, 972.5203]
2025-08-07 01:11:23,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:11:23,200 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (968.79) for latency ExtremeSparseL4U32
2025-08-07 01:11:23,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 53 minutes, 59 seconds)
2025-08-07 01:13:05,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:13:20,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 975.67297 ± 335.409
2025-08-07 01:13:20,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [133.17393, 1073.5127, 1044.5554, 1036.0497, 1578.6046, 900.01697, 985.9495, 1045.85, 1086.5485, 872.4689]
2025-08-07 01:13:20,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:13:20,766 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (975.67) for latency ExtremeSparseL4U32
2025-08-07 01:13:20,784 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 25 seconds)
2025-08-07 01:15:02,858 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:15:18,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1003.62488 ± 125.545
2025-08-07 01:15:18,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [749.17865, 1113.0405, 991.37714, 971.32855, 908.005, 953.7098, 1076.1626, 1014.4396, 1004.9962, 1254.0105]
2025-08-07 01:15:18,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:15:18,257 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1003.62) for latency ExtremeSparseL4U32
2025-08-07 01:15:18,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 48 minutes, 55 seconds)
2025-08-07 01:17:00,894 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:17:16,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1011.11963 ± 61.254
2025-08-07 01:17:16,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1173.6809, 984.7244, 1019.2037, 1028.674, 959.8663, 954.8683, 1029.6792, 954.3887, 986.6225, 1019.4881]
2025-08-07 01:17:16,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:17:16,366 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1011.12) for latency ExtremeSparseL4U32
2025-08-07 01:17:16,400 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 46 minutes, 55 seconds)
2025-08-07 01:18:59,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:14,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1207.55103 ± 201.215
2025-08-07 01:19:14,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1494.7947, 1399.0709, 1079.3308, 1361.274, 1037.1366, 983.06384, 1019.75604, 1046.7288, 1522.6622, 1131.6923]
2025-08-07 01:19:14,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:19:14,601 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1207.55) for latency ExtremeSparseL4U32
2025-08-07 01:19:14,617 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 44 minutes, 57 seconds)
2025-08-07 01:20:57,313 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:12,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1156.99573 ± 177.522
2025-08-07 01:21:12,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1029.9016, 1033.3125, 1068.6234, 1212.6995, 1578.0513, 1083.8923, 1199.4121, 1009.4347, 996.004, 1358.6261]
2025-08-07 01:21:12,824 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:21:12,834 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 43 minutes, 7 seconds)
2025-08-07 01:22:55,428 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:10,953 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1180.73059 ± 483.709
2025-08-07 01:23:10,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1098.1334, 1638.505, -160.66547, 1628.3104, 1269.1414, 1457.9031, 1129.4879, 1367.3296, 1162.6504, 1216.5109]
2025-08-07 01:23:10,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:23:10,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 41 minutes, 18 seconds)
2025-08-07 01:24:53,624 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:09,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1260.96204 ± 250.732
2025-08-07 01:25:09,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1071.2865, 1455.7932, 1033.1654, 1145.4656, 1265.0201, 1146.6293, 1748.3855, 1585.9897, 896.0164, 1261.8688]
2025-08-07 01:25:09,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:25:09,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1260.96) for latency ExtremeSparseL4U32
2025-08-07 01:25:09,086 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 39 minutes, 31 seconds)
2025-08-07 01:26:51,817 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:07,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1275.18042 ± 128.567
2025-08-07 01:27:07,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1122.4845, 1343.8602, 1271.4615, 1161.7533, 1525.175, 1279.6691, 1078.8346, 1418.928, 1311.2179, 1238.4207]
2025-08-07 01:27:07,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:27:07,190 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1275.18) for latency ExtremeSparseL4U32
2025-08-07 01:27:07,213 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 37 minutes, 33 seconds)
2025-08-07 01:28:49,891 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:05,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1223.81226 ± 213.058
2025-08-07 01:29:05,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [990.5401, 1770.3247, 1110.1384, 1270.664, 1032.1046, 1342.1444, 1129.9014, 1303.099, 1205.1118, 1084.0948]
2025-08-07 01:29:05,321 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:29:05,340 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 35 minutes, 33 seconds)
2025-08-07 01:30:48,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:03,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1340.51001 ± 187.010
2025-08-07 01:31:03,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1177.2712, 1336.5391, 1484.2827, 1714.5317, 1524.2554, 1137.5645, 1451.0902, 1139.6259, 1274.3347, 1165.6047]
2025-08-07 01:31:03,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:31:03,517 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1340.51) for latency ExtremeSparseL4U32
2025-08-07 01:31:03,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 33 minutes, 34 seconds)
2025-08-07 01:32:45,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:01,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1147.71118 ± 128.527
2025-08-07 01:33:01,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [894.8377, 1158.3873, 1077.5387, 1012.5235, 1208.9861, 1223.542, 1096.4232, 1359.7678, 1153.1726, 1291.934]
2025-08-07 01:33:01,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:33:01,215 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 31 minutes, 29 seconds)
2025-08-07 01:34:43,229 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:34:58,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1256.19019 ± 206.533
2025-08-07 01:34:58,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1436.5309, 1144.6971, 763.57025, 1375.6035, 1533.6156, 1375.4192, 1165.4751, 1382.8428, 1169.7471, 1214.4004]
2025-08-07 01:34:58,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:34:58,543 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 29 minutes, 19 seconds)
2025-08-07 01:36:40,535 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:36:55,805 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1230.47290 ± 128.251
2025-08-07 01:36:55,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1269.4431, 1164.7294, 1197.3795, 1285.4333, 1224.1249, 1559.8387, 1029.9594, 1167.3265, 1190.4032, 1216.0906]
2025-08-07 01:36:55,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:36:55,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 27 minutes, 9 seconds)
2025-08-07 01:38:37,825 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:38:53,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1319.71973 ± 179.498
2025-08-07 01:38:53,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1566.5016, 1139.222, 1196.18, 1530.1877, 1210.7495, 1614.7424, 1181.5736, 1284.4847, 1102.318, 1371.2377]
2025-08-07 01:38:53,099 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:38:53,122 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 minutes, 59 seconds)
2025-08-07 01:40:35,183 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:40:50,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1401.65979 ± 183.785
2025-08-07 01:40:50,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1761.0107, 1132.3864, 1413.5043, 1255.0322, 1262.8623, 1501.0089, 1448.2406, 1600.2405, 1445.2273, 1197.0851]
2025-08-07 01:40:50,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:40:50,583 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1401.66) for latency ExtremeSparseL4U32
2025-08-07 01:40:50,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 22 minutes, 51 seconds)
2025-08-07 01:42:32,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:42:47,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1357.99768 ± 298.564
2025-08-07 01:42:47,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1136.8973, 1785.7834, 1287.8666, 1246.7717, 2073.1929, 1190.4604, 1240.8989, 1312.5651, 1116.4193, 1189.1226]
2025-08-07 01:42:47,836 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:42:47,866 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 20 minutes, 47 seconds)
2025-08-07 01:44:29,946 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:44:45,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1151.33960 ± 355.514
2025-08-07 01:44:45,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1202.2148, 1170.3832, 1257.0953, 1164.6185, 1123.254, 1548.3224, 1287.0945, 1113.9282, 1477.4904, 168.99371]
2025-08-07 01:44:45,184 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:44:45,210 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 18 minutes, 50 seconds)
2025-08-07 01:46:27,273 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:46:42,665 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1349.19226 ± 164.892
2025-08-07 01:46:42,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1195.4976, 1356.161, 1272.3646, 1700.2095, 1266.359, 1519.1393, 1128.8674, 1199.918, 1442.8424, 1410.5631]
2025-08-07 01:46:42,666 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:46:42,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 16 minutes, 56 seconds)
2025-08-07 01:48:24,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:48:40,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1419.04016 ± 257.060
2025-08-07 01:48:40,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1342.7664, 1173.209, 1442.8629, 1218.0734, 1373.0857, 2135.724, 1247.0765, 1453.011, 1459.5503, 1345.0424]
2025-08-07 01:48:40,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:48:40,056 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1419.04) for latency ExtremeSparseL4U32
2025-08-07 01:48:40,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 14 minutes, 59 seconds)
2025-08-07 01:50:22,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:50:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1416.06909 ± 403.878
2025-08-07 01:50:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [919.3699, 1063.3773, 1122.4231, 1142.9052, 1538.1958, 1328.3677, 2394.0461, 1395.1705, 1506.8414, 1749.9933]
2025-08-07 01:50:37,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:50:37,584 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 13 minutes, 3 seconds)
2025-08-07 01:52:19,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:52:34,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1338.74414 ± 166.980
2025-08-07 01:52:34,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1203.3853, 1508.1223, 1411.594, 1466.5634, 1104.5948, 1199.4913, 1273.7057, 1307.4054, 1683.342, 1229.2375]
2025-08-07 01:52:34,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:52:35,004 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 7 seconds)
2025-08-07 01:54:17,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:54:32,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1381.30029 ± 308.558
2025-08-07 01:54:32,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1494.0978, 1192.8757, 1147.291, 1322.0581, 1501.1083, 2236.1597, 1198.5638, 1179.4597, 1240.3431, 1301.0457]
2025-08-07 01:54:32,498 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:54:32,528 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 9 minutes, 12 seconds)
2025-08-07 01:56:14,615 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:56:29,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1519.09985 ± 340.412
2025-08-07 01:56:29,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1320.1036, 1506.9319, 1393.7238, 1993.8099, 1360.8229, 1159.1176, 2244.172, 1728.0939, 1257.8197, 1226.4028]
2025-08-07 01:56:29,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:56:29,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1519.10) for latency ExtremeSparseL4U32
2025-08-07 01:56:30,005 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 7 minutes, 15 seconds)
2025-08-07 01:58:12,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:58:27,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1432.47583 ± 335.362
2025-08-07 01:58:27,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1395.5577, 1126.6813, 1314.1616, 2309.5823, 1107.8336, 1234.6888, 1430.4753, 1584.1826, 1213.292, 1608.3021]
2025-08-07 01:58:27,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:58:27,422 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 18 seconds)
2025-08-07 02:00:09,392 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:00:24,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1588.60144 ± 340.354
2025-08-07 02:00:24,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1224.11, 2016.7654, 1612.6842, 1801.7837, 1808.8975, 2120.534, 1160.5222, 1697.6227, 1229.47, 1213.6251]
2025-08-07 02:00:24,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:00:24,706 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1588.60) for latency ExtremeSparseL4U32
2025-08-07 02:00:24,715 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 17 seconds)
2025-08-07 02:02:06,701 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:02:22,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1643.67944 ± 382.915
2025-08-07 02:02:22,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1317.5449, 1450.985, 1499.7611, 1209.2885, 2480.5645, 1587.6859, 2200.9985, 1719.7983, 1338.2471, 1631.9209]
2025-08-07 02:02:22,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:02:22,107 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1643.68) for latency ExtremeSparseL4U32
2025-08-07 02:02:22,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 20 seconds)
2025-08-07 02:04:04,193 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:19,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1475.45068 ± 460.428
2025-08-07 02:04:19,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1210.9215, 1281.1588, 1619.1581, 1503.2253, 1941.3217, 1166.515, 1243.1868, 2608.9114, 944.147, 1235.9615]
2025-08-07 02:04:19,570 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:04:19,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 22 seconds)
2025-08-07 02:06:01,608 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:17,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1454.29993 ± 257.021
2025-08-07 02:06:17,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1290.8378, 1615.6157, 1209.1674, 1777.6193, 1190.8093, 1210.0808, 1528.4355, 1545.2404, 1223.6892, 1951.5029]
2025-08-07 02:06:17,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:06:17,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 24 seconds)
2025-08-07 02:07:59,082 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:14,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1486.96704 ± 190.715
2025-08-07 02:08:14,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1570.3552, 1468.9155, 1714.7119, 1163.5823, 1624.0842, 1220.8998, 1539.3035, 1785.7324, 1391.0343, 1391.0525]
2025-08-07 02:08:14,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:08:14,491 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 27 seconds)
2025-08-07 02:09:56,467 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:11,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1585.50122 ± 388.753
2025-08-07 02:10:11,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1310.4426, 1361.1337, 1386.2596, 2371.04, 1752.2919, 1537.6891, 2245.4153, 1212.2726, 1305.0784, 1373.3899]
2025-08-07 02:10:11,787 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:10:11,816 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 30 seconds)
2025-08-07 02:11:53,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:12:09,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1399.33435 ± 256.105
2025-08-07 02:12:09,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1200.3704, 1519.6531, 1410.1707, 1989.395, 1695.6443, 1177.8145, 1174.3602, 1370.2434, 1171.2788, 1284.4143]
2025-08-07 02:12:09,308 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:12:09,333 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 34 seconds)
2025-08-07 02:13:51,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:14:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1648.09802 ± 418.691
2025-08-07 02:14:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1856.8008, 1201.798, 1435.1505, 1489.6068, 1198.3026, 1368.1382, 1358.9554, 2423.6853, 1838.9698, 2309.5725]
2025-08-07 02:14:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:14:06,790 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1648.10) for latency ExtremeSparseL4U32
2025-08-07 02:14:06,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 36 seconds)
2025-08-07 02:15:48,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:04,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1654.67383 ± 385.051
2025-08-07 02:16:04,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2141.9944, 2468.437, 1356.2999, 1351.7904, 1271.3158, 1235.0189, 1892.1326, 1505.3794, 1674.9514, 1649.417]
2025-08-07 02:16:04,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:16:04,167 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1654.67) for latency ExtremeSparseL4U32
2025-08-07 02:16:04,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes, 38 seconds)
2025-08-07 02:17:46,276 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:01,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1529.04370 ± 422.527
2025-08-07 02:18:01,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1677.0325, 1382.7935, 1371.0946, 1194.8718, 1842.9235, 2637.9045, 1193.4738, 1295.1398, 1199.0542, 1496.1508]
2025-08-07 02:18:01,656 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:18:01,708 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 41 seconds)
2025-08-07 02:19:43,798 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:19:59,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1813.05103 ± 526.192
2025-08-07 02:19:59,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1383.5975, 2765.7817, 1383.769, 1207.1337, 2390.2102, 2495.0156, 1349.0385, 1559.7386, 1951.6112, 1644.6128]
2025-08-07 02:19:59,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:19:59,073 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1813.05) for latency ExtremeSparseL4U32
2025-08-07 02:19:59,091 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 43 minutes, 45 seconds)
2025-08-07 02:21:41,127 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:21:56,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1486.51770 ± 529.707
2025-08-07 02:21:56,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1968.0656, 1911.9543, 2245.4211, 1304.4182, 1405.0801, 1450.0796, 1672.8982, 1207.7612, 196.84605, 1502.6521]
2025-08-07 02:21:56,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:21:56,433 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes, 45 seconds)
2025-08-07 02:23:38,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:23:53,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1509.82275 ± 337.936
2025-08-07 02:23:53,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1527.9674, 1358.3491, 1238.658, 2146.4214, 2027.8275, 1197.5227, 1553.299, 1693.062, 1083.1615, 1271.9596]
2025-08-07 02:23:53,783 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:23:53,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 47 seconds)
2025-08-07 02:25:35,913 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:25:51,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1685.43787 ± 311.864
2025-08-07 02:25:51,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1796.412, 1508.4103, 1530.0984, 2481.7788, 1356.9858, 1808.527, 1314.2548, 1716.9707, 1605.6808, 1735.2606]
2025-08-07 02:25:51,317 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:25:51,326 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 51 seconds)
2025-08-07 02:27:33,378 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:27:48,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1779.18140 ± 529.741
2025-08-07 02:27:48,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1602.8796, 2115.332, 1251.9932, 1337.9216, 2006.4639, 1154.0941, 2206.2703, 2002.3925, 2881.943, 1232.5231]
2025-08-07 02:27:48,775 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:27:48,791 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 53 seconds)
2025-08-07 02:29:30,804 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:29:46,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1646.70801 ± 411.325
2025-08-07 02:29:46,242 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1630.4298, 2414.5852, 1612.7982, 2317.457, 1347.1224, 1210.0673, 1160.9995, 1315.7698, 1804.9719, 1652.8789]
2025-08-07 02:29:46,243 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:29:46,261 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 33 minutes, 56 seconds)
2025-08-07 02:31:28,250 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:31:43,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1478.85449 ± 218.988
2025-08-07 02:31:43,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1617.9645, 1169.7555, 1189.063, 1394.4781, 1668.8414, 1517.947, 1429.5134, 1350.4347, 1506.672, 1943.875]
2025-08-07 02:31:43,562 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:31:43,573 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 31 minutes, 59 seconds)
2025-08-07 02:33:25,668 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:33:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1462.93042 ± 268.164
2025-08-07 02:33:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1545.1981, 1920.2064, 1149.3754, 1494.9076, 1548.0042, 1160.6788, 1376.7329, 1910.309, 1169.8916, 1354.0]
2025-08-07 02:33:41,064 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:33:41,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 30 minutes, 2 seconds)
2025-08-07 02:35:23,139 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:35:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1647.20862 ± 544.154
2025-08-07 02:35:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1158.8538, 1782.8153, 1287.4666, 1744.4187, 1348.6226, 1258.268, 1908.3555, 3099.0928, 1262.4358, 1621.757]
2025-08-07 02:35:38,538 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:35:38,554 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 28 minutes, 5 seconds)
2025-08-07 02:37:20,565 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:37:35,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1393.85278 ± 196.021
2025-08-07 02:37:35,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1436.5845, 1487.7744, 1592.4835, 1204.6006, 1262.7649, 1127.7704, 1519.268, 1762.1187, 1394.1964, 1150.9669]
2025-08-07 02:37:35,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:37:36,003 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 26 minutes, 7 seconds)
2025-08-07 02:39:18,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:39:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1518.37964 ± 252.114
2025-08-07 02:39:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1295.4465, 1646.325, 1285.4551, 1246.0121, 1221.1565, 1977.16, 1874.29, 1474.3247, 1634.8494, 1528.7761]
2025-08-07 02:39:33,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:39:33,500 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 24 minutes, 10 seconds)
2025-08-07 02:41:15,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:41:31,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1564.83252 ± 338.122
2025-08-07 02:41:31,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1822.0958, 1264.2809, 2028.4467, 1258.7502, 2141.2998, 1208.8806, 1507.3999, 1827.7627, 1313.9458, 1275.4636]
2025-08-07 02:41:31,089 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:41:31,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 22 minutes, 15 seconds)
2025-08-07 02:43:13,212 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:43:28,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1877.17346 ± 334.124
2025-08-07 02:43:28,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1676.683, 1675.3744, 2377.619, 2513.4268, 1507.2356, 1441.007, 1953.2358, 1765.1245, 1810.653, 2051.3748]
2025-08-07 02:43:28,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:43:28,610 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (1877.17) for latency ExtremeSparseL4U32
2025-08-07 02:43:28,654 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 18 seconds)
2025-08-07 02:45:10,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:45:26,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1619.60291 ± 269.609
2025-08-07 02:45:26,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1793.676, 1198.0641, 1447.0212, 1541.1791, 1865.9271, 2049.349, 1427.063, 1341.8439, 1565.2761, 1966.6302]
2025-08-07 02:45:26,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:45:26,105 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 20 seconds)
2025-08-07 02:47:08,168 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:47:23,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1743.97498 ± 452.986
2025-08-07 02:47:23,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2080.0574, 2767.7473, 1325.1038, 1277.9197, 1349.0514, 1797.5846, 1419.8153, 2124.6624, 1460.0583, 1837.7496]
2025-08-07 02:47:23,577 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:47:23,626 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 23 seconds)
2025-08-07 02:49:05,687 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:21,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1738.80273 ± 546.291
2025-08-07 02:49:21,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1322.889, 1908.6903, 1300.0654, 1424.6825, 1614.8679, 2433.1167, 1758.2551, 3010.776, 1266.7871, 1347.8965]
2025-08-07 02:49:21,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:49:21,110 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 25 seconds)
2025-08-07 02:51:03,163 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:18,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1717.11292 ± 377.788
2025-08-07 02:51:18,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1531.0745, 2036.386, 1403.0232, 1666.7327, 1974.5659, 2505.8613, 1981.4192, 1396.8124, 1480.1544, 1195.1]
2025-08-07 02:51:18,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:51:18,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 26 seconds)
2025-08-07 02:53:00,524 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:15,837 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1799.25940 ± 496.064
2025-08-07 02:53:15,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2661.0618, 2528.935, 1390.0935, 1530.2826, 1508.0376, 1858.467, 2051.6626, 2062.5632, 1264.4131, 1137.0782]
2025-08-07 02:53:15,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:53:15,875 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 27 seconds)
2025-08-07 02:54:58,031 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:13,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1649.34546 ± 397.058
2025-08-07 02:55:13,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1467.4543, 1336.047, 1920.2035, 2704.3855, 1422.0698, 1265.5035, 1671.8431, 1479.5874, 1731.3147, 1495.0475]
2025-08-07 02:55:13,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:55:13,448 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 31 seconds)
2025-08-07 02:56:55,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:10,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1631.40295 ± 267.969
2025-08-07 02:57:10,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1705.8784, 1359.5758, 1515.4749, 1644.6359, 2141.9712, 1364.6136, 1747.7542, 1969.408, 1639.8975, 1224.8188]
2025-08-07 02:57:10,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:57:10,814 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 32 seconds)
2025-08-07 02:58:52,882 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:08,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1798.24731 ± 486.224
2025-08-07 02:59:08,149 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2088.1057, 1398.3126, 2337.4128, 1641.4476, 1652.3535, 1255.0605, 1587.7566, 1430.8917, 1657.3467, 2933.7842]
2025-08-07 02:59:08,150 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:59:08,178 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 34 seconds)
2025-08-07 03:00:50,267 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:05,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1612.30444 ± 402.978
2025-08-07 03:01:05,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1586.1648, 1275.8676, 1346.7877, 1401.0437, 1451.1594, 1336.4048, 2431.836, 2369.6511, 1426.8986, 1497.2295]
2025-08-07 03:01:05,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:01:05,546 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 36 seconds)
2025-08-07 03:02:47,633 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:02,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1728.37183 ± 416.141
2025-08-07 03:03:02,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1702.3844, 1660.8098, 1429.6384, 2891.1526, 1693.2592, 1648.4357, 1331.5227, 1525.9332, 1503.4143, 1897.1678]
2025-08-07 03:03:02,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:03:02,966 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 39 seconds)
2025-08-07 03:04:45,087 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:00,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2159.01489 ± 675.885
2025-08-07 03:05:00,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1290.3788, 1736.76, 2338.7358, 3311.2146, 1734.5355, 1603.9642, 2427.8994, 1376.2667, 3039.43, 2730.9626]
2025-08-07 03:05:00,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:05:00,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2159.01) for latency ExtremeSparseL4U32
2025-08-07 03:05:00,521 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 42 seconds)
2025-08-07 03:06:42,599 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:06:57,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1701.91992 ± 831.476
2025-08-07 03:06:57,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1519.1115, 3282.4358, 1783.7997, 1197.4626, 1438.9987, 2125.3455, 1740.071, 1237.2231, 2651.9214, 42.82964]
2025-08-07 03:06:57,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:06:58,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 45 seconds)
2025-08-07 03:08:40,076 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:08:55,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2041.77307 ± 732.863
2025-08-07 03:08:55,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1708.2965, 3539.0486, 1439.3458, 1619.0966, 3399.1985, 1803.5142, 1782.7157, 1971.8838, 1768.9705, 1385.6598]
2025-08-07 03:08:55,307 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:08:55,327 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 48 seconds)
2025-08-07 03:10:37,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:10:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2078.89282 ± 892.624
2025-08-07 03:10:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1456.1514, 2096.0498, 1169.9948, 3816.8728, 2458.3025, 1839.8003, 3604.6794, 1524.9451, 1360.9923, 1461.1412]
2025-08-07 03:10:52,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:10:52,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 51 seconds)
2025-08-07 03:12:34,795 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:12:50,045 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2016.36267 ± 645.742
2025-08-07 03:12:50,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2288.5142, 1397.1034, 1338.4102, 1783.5651, 2437.6538, 1635.4569, 3273.9646, 2866.8176, 1866.4442, 1275.7001]
2025-08-07 03:12:50,046 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:12:50,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 52 seconds)
2025-08-07 03:14:32,114 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:14:47,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1900.54724 ± 661.868
2025-08-07 03:14:47,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1248.5509, 1550.2131, 2765.3982, 2309.1348, 2661.125, 1388.6213, 2994.1462, 1267.5576, 1420.5369, 1400.188]
2025-08-07 03:14:47,512 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:14:47,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 48 minutes, 55 seconds)
2025-08-07 03:16:29,560 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:16:44,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2318.21655 ± 701.112
2025-08-07 03:16:44,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1417.9928, 1741.637, 2321.0046, 3325.5183, 1902.3978, 3333.049, 2057.9248, 2573.7896, 3095.2793, 1413.5736]
2025-08-07 03:16:44,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:16:44,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2318.22) for latency ExtremeSparseL4U32
2025-08-07 03:16:44,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 57 seconds)
2025-08-07 03:18:26,963 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:18:42,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1782.68945 ± 409.448
2025-08-07 03:18:42,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1375.9879, 2040.8824, 1662.245, 1546.3002, 1367.1875, 2116.2747, 1931.8538, 1337.3103, 1734.928, 2713.9268]
2025-08-07 03:18:42,362 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:18:42,377 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes)
2025-08-07 03:20:24,291 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:20:39,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2096.95581 ± 740.229
2025-08-07 03:20:39,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1489.3057, 1511.9813, 1390.0911, 1853.7498, 2631.2373, 1686.8955, 1195.9485, 2810.4104, 3226.7996, 3173.1377]
2025-08-07 03:20:39,625 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:20:39,661 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 2 seconds)
2025-08-07 03:22:21,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:22:37,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1867.86292 ± 462.799
2025-08-07 03:22:37,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1684.33, 1651.1683, 1455.6974, 3076.0999, 1686.1652, 2282.5427, 1752.1896, 1525.2001, 1984.074, 1581.1616]
2025-08-07 03:22:37,016 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:22:37,028 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 5 seconds)
2025-08-07 03:24:19,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:24:34,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1868.96484 ± 656.044
2025-08-07 03:24:34,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1289.0853, 1608.7731, 1380.8643, 2629.9055, 2668.8367, 3129.617, 1350.0444, 1964.1665, 1375.0101, 1293.3467]
2025-08-07 03:24:34,352 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:24:34,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 7 seconds)
2025-08-07 03:26:16,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:26:31,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1908.65759 ± 801.739
2025-08-07 03:26:31,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3232.091, 1255.3276, 2080.788, 1254.0865, 1313.9124, 1984.8362, 1343.9237, 1343.8485, 1709.4893, 3568.2732]
2025-08-07 03:26:31,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:26:31,612 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 9 seconds)
2025-08-07 03:28:13,686 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:28:28,932 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1741.97290 ± 606.169
2025-08-07 03:28:28,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1774.9282, 1821.0676, 3494.7058, 1644.3959, 1443.467, 1437.6848, 1276.8595, 1408.3922, 1627.267, 1490.9617]
2025-08-07 03:28:28,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:28:28,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 11 seconds)
2025-08-07 03:30:10,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:30:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1984.73560 ± 802.486
2025-08-07 03:30:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1282.1039, 1526.3082, 2638.3467, 1896.0966, 2991.8674, 3741.8757, 1313.9437, 1511.6914, 1491.133, 1453.989]
2025-08-07 03:30:26,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:30:26,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 14 seconds)
2025-08-07 03:32:08,145 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:32:23,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1747.35352 ± 865.322
2025-08-07 03:32:23,402 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3465.909, 1624.2528, 1749.0516, 625.9009, 1324.3667, 2028.4376, 2300.1511, 2279.8237, 198.87436, 1876.7683]
2025-08-07 03:32:23,403 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:32:23,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 16 seconds)
2025-08-07 03:34:05,442 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:34:20,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1859.99048 ± 746.688
2025-08-07 03:34:20,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1214.5927, 1484.0552, 1244.4275, 2807.3337, 1348.7394, 1671.6558, 1685.6904, 2079.7197, 1422.1381, 3641.5525]
2025-08-07 03:34:20,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:34:20,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 19 seconds)
2025-08-07 03:36:02,861 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:36:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2518.92334 ± 894.505
2025-08-07 03:36:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2422.4077, 2885.5889, 1333.3676, 2065.7188, 3254.8843, 3762.072, 1513.1782, 3937.638, 2560.1924, 1454.1868]
2025-08-07 03:36:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:36:18,123 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2518.92) for latency ExtremeSparseL4U32
2025-08-07 03:36:18,137 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 22 seconds)
2025-08-07 03:38:00,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:38:15,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2103.84033 ± 742.912
2025-08-07 03:38:15,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1500.1113, 1725.4745, 2012.5333, 2017.783, 3561.1921, 1427.8876, 2105.1306, 1445.9447, 1770.6086, 3471.7354]
2025-08-07 03:38:15,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:38:15,574 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 25 seconds)
2025-08-07 03:39:57,650 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:40:13,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2135.16406 ± 811.678
2025-08-07 03:40:13,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3481.248, 2626.5225, 1526.1863, 1417.9293, 2367.2585, 1756.421, 1454.2081, 1975.5266, 1179.4862, 3566.8542]
2025-08-07 03:40:13,060 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:40:13,074 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 28 seconds)
2025-08-07 03:41:55,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:42:10,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1886.32129 ± 727.977
2025-08-07 03:42:10,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2016.0781, 594.73285, 2385.1306, 2084.427, 1300.1615, 1847.6548, 1424.2075, 1835.7882, 1826.8749, 3548.158]
2025-08-07 03:42:10,429 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:42:10,450 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 31 seconds)
2025-08-07 03:43:52,477 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:07,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2910.44800 ± 881.619
2025-08-07 03:44:07,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3557.926, 3821.92, 1614.0032, 3129.1257, 1458.2368, 3293.1836, 3816.831, 3842.7188, 2467.1516, 2103.3823]
2025-08-07 03:44:07,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:44:07,828 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1226 [INFO]: New best (2910.45) for latency ExtremeSparseL4U32
2025-08-07 03:44:07,835 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 34 seconds)
2025-08-07 03:45:49,883 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:05,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2140.15454 ± 627.839
2025-08-07 03:46:05,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2558.7852, 1480.5485, 2305.571, 1644.3528, 3018.5964, 1221.1, 1626.7245, 2073.107, 2255.284, 3217.476]
2025-08-07 03:46:05,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:46:05,172 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 36 seconds)
2025-08-07 03:47:47,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:02,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2444.43750 ± 1027.753
2025-08-07 03:48:02,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1563.7709, 2318.2722, 1617.3342, 2233.4094, 3858.4573, 891.94, 3459.1682, 3370.0303, 1400.4529, 3731.5386]
2025-08-07 03:48:02,418 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:48:02,438 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 38 seconds)
2025-08-07 03:49:44,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:49:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1968.18909 ± 717.718
2025-08-07 03:49:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1538.9869, 1925.4556, 1874.981, 1181.9042, 3650.4294, 1792.7343, 2380.318, 2669.3872, 1408.6714, 1259.0212]
2025-08-07 03:49:59,864 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:49:59,884 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 41 seconds)
2025-08-07 03:51:41,926 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:51:57,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2118.13062 ± 557.214
2025-08-07 03:51:57,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1688.1758, 1550.8075, 1875.6194, 2667.6392, 3401.3416, 2386.4749, 1700.461, 1549.0266, 2101.6619, 2260.0981]
2025-08-07 03:51:57,314 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:51:57,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 44 seconds)
2025-08-07 03:53:39,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:53:54,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1737.92017 ± 437.916
2025-08-07 03:53:54,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [1466.883, 1150.8333, 1845.7334, 1537.2115, 2498.1313, 1587.8754, 1429.9066, 2555.2126, 1841.0636, 1466.3514]
2025-08-07 03:53:54,646 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:53:54,675 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 46 seconds)
2025-08-07 03:55:36,614 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:55:51,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2068.75928 ± 725.271
2025-08-07 03:55:51,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2690.4695, 1861.4233, 1599.9902, 2593.863, 1323.4685, 2682.8018, 1379.2507, 1221.8871, 3531.441, 1802.9958]
2025-08-07 03:55:51,878 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:55:51,899 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 49 seconds)
2025-08-07 03:57:33,971 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:57:49,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2507.77197 ± 1282.088
2025-08-07 03:57:49,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3619.246, 3667.228, 1380.4445, 3447.8755, 3164.7427, 3539.741, 1644.5806, 561.38983, 3640.1284, 412.34274]
2025-08-07 03:57:49,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:57:49,398 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 52 seconds)
2025-08-07 03:59:31,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:59:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2204.69434 ± 1101.571
2025-08-07 03:59:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [2441.1118, 1372.2231, 3928.3545, 1298.8895, 1272.9136, 1378.5582, 1451.8645, 3848.867, 1394.4835, 3659.6785]
2025-08-07 03:59:46,793 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:59:46,811 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 54 seconds)
2025-08-07 04:01:28,896 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:01:44,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 1960.51929 ± 684.080
2025-08-07 04:01:44,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [3795.8394, 1481.2496, 1928.237, 2220.6318, 1567.3562, 1506.5308, 1474.4595, 1480.3107, 1793.3545, 2357.224]
2025-08-07 04:01:44,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:01:44,310 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 57 seconds)
2025-08-07 04:03:26,316 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:03:41,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1221 [DEBUG]: Total Reward: 2605.93555 ± 925.476
2025-08-07 04:03:41,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1222 [DEBUG]: All rewards: [4045.1877, 1258.9711, 1376.5939, 2189.823, 3087.49, 2489.7358, 1833.8823, 3995.4648, 3064.2554, 2717.9512]
2025-08-07 04:03:41,713 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:03:41,728 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc5-halfcheetah):1251 [DEBUG]: Training session finished
