2025-08-07 00:48:27,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noiseperc25-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:27,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noiseperc25-halfcheetah/ExtremeSparseL4U32-bpql-mem32
2025-08-07 00:48:27,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x15274e6ec550>}
2025-08-07 00:48:27,117 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1111 [DEBUG]: using device: cuda
2025-08-07 00:48:27,143 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1133 [INFO]: Creating new trainer
2025-08-07 00:48:27,161 baseline-bpql-noiseperc25-halfcheetah:113 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=209, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-08-07 00:48:27,161 baseline-bpql-noiseperc25-halfcheetah:114 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-08-07 00:48:28,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1194 [DEBUG]: Starting training session...
2025-08-07 00:48:28,652 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 1/100
2025-08-07 00:50:07,937 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:50:23,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -336.74078 ± 30.363
2025-08-07 00:50:23,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-390.86407, -334.6229, -339.2661, -362.75232, -325.95807, -361.49066, -337.4994, -277.9034, -299.1897, -337.86118]
2025-08-07 00:50:23,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:50:23,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-336.74) for latency ExtremeSparseL4U32
2025-08-07 00:50:23,532 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 3 hours, 9 minutes, 33 seconds)
2025-08-07 00:52:07,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:52:23,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -294.30255 ± 63.075
2025-08-07 00:52:23,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-275.83713, -341.0457, -343.34344, -196.14456, -287.33008, -312.19382, -341.17426, -243.81627, -401.1006, -201.03987]
2025-08-07 00:52:23,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:52:23,054 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-294.30) for latency ExtremeSparseL4U32
2025-08-07 00:52:23,061 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 3 hours, 11 minutes, 26 seconds)
2025-08-07 00:54:06,281 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:54:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -258.38733 ± 69.729
2025-08-07 00:54:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-201.9962, -214.93661, -214.02556, -180.94327, -273.13174, -287.96518, -417.59393, -230.113, -341.88947, -221.2784]
2025-08-07 00:54:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:54:21,808 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-258.39) for latency ExtremeSparseL4U32
2025-08-07 00:54:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 3 hours, 10 minutes, 18 seconds)
2025-08-07 00:56:05,071 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:56:20,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -246.94077 ± 72.490
2025-08-07 00:56:20,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-155.33492, -169.2094, -268.54672, -295.73224, -240.46048, -132.43008, -233.37732, -359.76456, -338.2471, -276.30502]
2025-08-07 00:56:20,819 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:56:20,820 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-246.94) for latency ExtremeSparseL4U32
2025-08-07 00:56:20,827 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 3 hours, 8 minutes, 52 seconds)
2025-08-07 00:58:03,465 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 00:58:19,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -103.19664 ± 112.520
2025-08-07 00:58:19,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-83.407776, -307.14722, -78.23036, -39.40834, -74.8275, -325.14697, 35.747627, -93.312454, -50.942966, -15.29045]
2025-08-07 00:58:19,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 00:58:19,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-103.20) for latency ExtremeSparseL4U32
2025-08-07 00:58:19,067 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 3 hours, 6 minutes, 57 seconds)
2025-08-07 01:00:01,589 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:00:17,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -63.96069 ± 98.874
2025-08-07 01:00:17,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-45.385887, 19.945204, 36.622555, -60.184235, -293.0176, -87.747116, 87.14288, -84.64008, -92.24933, -120.093254]
2025-08-07 01:00:17,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:00:17,270 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-63.96) for latency ExtremeSparseL4U32
2025-08-07 01:00:17,282 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 3 hours, 6 minutes, 2 seconds)
2025-08-07 01:01:59,758 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:02:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -51.91661 ± 110.533
2025-08-07 01:02:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-20.460823, -164.28099, -132.63815, 132.78319, 100.87471, -69.306366, 0.81385493, -254.08286, -39.246372, -73.6223]
2025-08-07 01:02:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:02:15,203 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-51.92) for latency ExtremeSparseL4U32
2025-08-07 01:02:15,211 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 3 hours, 3 minutes, 33 seconds)
2025-08-07 01:03:57,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:04:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -190.60075 ± 80.124
2025-08-07 01:04:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-213.36961, -353.74976, -223.86826, -41.604183, -207.8196, -85.98945, -161.77815, -186.38629, -201.47473, -229.96747]
2025-08-07 01:04:13,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:04:13,390 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 3 hours, 1 minute, 24 seconds)
2025-08-07 01:05:55,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:06:11,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -22.32275 ± 96.612
2025-08-07 01:06:11,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-54.01624, -80.15023, -175.66637, -46.30578, -50.513115, -95.526146, 107.09363, -26.40296, 20.13187, 178.12787]
2025-08-07 01:06:11,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:06:11,496 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-22.32) for latency ExtremeSparseL4U32
2025-08-07 01:06:11,504 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 2 hours, 59 minutes, 10 seconds)
2025-08-07 01:07:53,992 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:08:09,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: -0.38041 ± 147.108
2025-08-07 01:08:09,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [37.772232, 22.505013, 186.66443, 62.728474, 74.743996, -124.940445, -376.14285, 108.93158, -21.937424, 25.870949]
2025-08-07 01:08:09,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:08:09,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (-0.38) for latency ExtremeSparseL4U32
2025-08-07 01:08:09,444 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 2 hours, 57 minutes, 6 seconds)
2025-08-07 01:09:51,957 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:10:07,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 88.02663 ± 86.341
2025-08-07 01:10:07,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [190.99759, 141.15334, -68.13624, 59.977844, 200.67476, 54.581875, -48.45175, 142.03894, 110.957886, 96.47201]
2025-08-07 01:10:07,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:10:07,586 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (88.03) for latency ExtremeSparseL4U32
2025-08-07 01:10:07,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 2 hours, 55 minutes, 7 seconds)
2025-08-07 01:11:50,174 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:12:05,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 49.30359 ± 118.261
2025-08-07 01:12:05,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [62.294113, 340.73566, -94.942894, 68.8127, -61.827507, 8.750375, 8.749457, 160.17372, 20.179571, -19.889318]
2025-08-07 01:12:05,801 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:12:05,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 2 hours, 53 minutes, 14 seconds)
2025-08-07 01:13:48,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:14:03,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 159.26549 ± 81.272
2025-08-07 01:14:03,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [234.80104, 282.02792, 209.45888, 207.76117, 183.97534, 156.98817, 46.02859, 182.74095, 49.103016, 39.769863]
2025-08-07 01:14:03,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:14:03,873 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (159.27) for latency ExtremeSparseL4U32
2025-08-07 01:14:03,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 2 hours, 51 minutes, 14 seconds)
2025-08-07 01:15:46,460 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:16:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 165.55647 ± 126.188
2025-08-07 01:16:02,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [230.10036, 191.16458, 56.328484, 18.095144, 107.42044, 84.53057, 397.54538, 355.54257, 23.400978, 191.4361]
2025-08-07 01:16:02,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:16:02,040 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (165.56) for latency ExtremeSparseL4U32
2025-08-07 01:16:02,048 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 2 hours, 49 minutes, 17 seconds)
2025-08-07 01:17:44,540 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:18:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 243.43176 ± 63.181
2025-08-07 01:18:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [181.88219, 350.5254, 246.72731, 275.6989, 250.63863, 162.47122, 272.09378, 331.40698, 200.8656, 162.00766]
2025-08-07 01:18:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:18:00,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (243.43) for latency ExtremeSparseL4U32
2025-08-07 01:18:00,111 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 2 hours, 47 minutes, 21 seconds)
2025-08-07 01:19:42,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:19:57,522 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 233.50919 ± 126.349
2025-08-07 01:19:57,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [347.73816, 413.44968, 251.79727, 383.82565, -8.202074, 234.0927, 111.236, 290.90033, 189.24751, 121.00656]
2025-08-07 01:19:57,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:19:57,531 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 2 hours, 45 minutes, 10 seconds)
2025-08-07 01:21:39,561 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:21:54,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 270.68671 ± 205.637
2025-08-07 01:21:54,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [-177.35915, 490.57132, 423.84775, 420.93253, 313.7285, 210.0962, 317.68762, 260.50656, 463.67508, -16.819212]
2025-08-07 01:21:54,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:21:54,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (270.69) for latency ExtremeSparseL4U32
2025-08-07 01:21:54,890 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 2 hours, 42 minutes, 58 seconds)
2025-08-07 01:23:36,900 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:23:52,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 294.99677 ± 125.340
2025-08-07 01:23:52,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [471.38968, 187.30652, 348.7264, 239.39034, 304.58328, 488.79883, 202.99844, 412.5863, 98.52185, 195.66606]
2025-08-07 01:23:52,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:23:52,292 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (295.00) for latency ExtremeSparseL4U32
2025-08-07 01:23:52,299 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 2 hours, 40 minutes, 50 seconds)
2025-08-07 01:25:34,369 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:25:49,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 159.14381 ± 236.905
2025-08-07 01:25:49,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [111.53728, -88.85672, -218.2668, 249.16376, 178.99715, 443.9141, -140.73242, 156.34225, 507.5313, 391.80823]
2025-08-07 01:25:49,893 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:25:49,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 2 hours, 38 minutes, 43 seconds)
2025-08-07 01:27:32,022 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:27:47,549 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 335.09860 ± 116.348
2025-08-07 01:27:47,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [282.2047, 546.2756, 379.10938, 438.93, 233.72598, 346.6645, 464.9872, 203.54832, 165.99767, 289.54263]
2025-08-07 01:27:47,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:27:47,550 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (335.10) for latency ExtremeSparseL4U32
2025-08-07 01:27:47,556 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 2 hours, 36 minutes, 39 seconds)
2025-08-07 01:29:29,587 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:29:45,103 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 384.28656 ± 111.405
2025-08-07 01:29:45,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [365.13864, 322.10736, 318.14655, 450.1919, 544.0281, 406.77402, 600.3829, 247.68285, 252.07484, 336.33826]
2025-08-07 01:29:45,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:29:45,104 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (384.29) for latency ExtremeSparseL4U32
2025-08-07 01:29:45,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 2 hours, 34 minutes, 43 seconds)
2025-08-07 01:31:27,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:31:42,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 265.76877 ± 113.497
2025-08-07 01:31:42,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [146.5858, 427.1705, 343.86108, 313.57925, 164.83322, 404.01004, 351.91998, 90.58175, 152.83899, 262.30728]
2025-08-07 01:31:42,373 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:31:42,381 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 2 hours, 32 minutes, 44 seconds)
2025-08-07 01:33:24,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:33:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 384.19135 ± 106.839
2025-08-07 01:33:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [578.9159, 449.63696, 540.09595, 268.81177, 406.58545, 370.0429, 288.3207, 382.51913, 246.18802, 310.797]
2025-08-07 01:33:39,778 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:33:39,782 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 2 hours, 30 minutes, 47 seconds)
2025-08-07 01:35:21,698 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:35:37,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 334.04810 ± 113.494
2025-08-07 01:35:37,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [228.49779, 318.26535, 176.27061, 136.93375, 436.66675, 403.7251, 415.55826, 429.50476, 482.58984, 312.46887]
2025-08-07 01:35:37,081 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:35:37,090 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 2 hours, 28 minutes, 45 seconds)
2025-08-07 01:37:18,978 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:37:34,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 314.61896 ± 173.293
2025-08-07 01:37:34,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [456.79495, 88.23205, 323.93726, 329.37256, 462.34177, -64.44917, 451.20654, 506.87683, 353.83267, 238.0439]
2025-08-07 01:37:34,509 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:37:34,523 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 2 hours, 26 minutes, 44 seconds)
2025-08-07 01:39:16,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:39:31,990 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 364.48761 ± 127.929
2025-08-07 01:39:31,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [233.20183, 477.5102, 553.2379, 285.29364, 263.0989, 304.83954, 305.65582, 271.5644, 618.9142, 331.55984]
2025-08-07 01:39:31,991 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:39:32,000 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 2 hours, 24 minutes, 45 seconds)
2025-08-07 01:41:14,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:41:29,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 427.00888 ± 173.870
2025-08-07 01:41:29,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [406.44574, 42.84266, 226.00166, 450.2121, 574.8391, 638.05597, 496.45328, 577.18964, 527.9702, 330.0784]
2025-08-07 01:41:29,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:41:29,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (427.01) for latency ExtremeSparseL4U32
2025-08-07 01:41:29,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 2 hours, 22 minutes, 53 seconds)
2025-08-07 01:43:11,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:43:27,029 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 443.85440 ± 61.870
2025-08-07 01:43:27,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [362.2634, 515.98895, 462.76202, 458.4292, 418.92844, 428.30893, 373.64172, 403.57922, 435.28577, 579.3566]
2025-08-07 01:43:27,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:43:27,030 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (443.85) for latency ExtremeSparseL4U32
2025-08-07 01:43:27,035 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 2 hours, 20 minutes, 56 seconds)
2025-08-07 01:45:09,008 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:45:24,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 388.35062 ± 116.915
2025-08-07 01:45:24,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [382.27734, 338.60822, 476.7536, 576.19305, 484.97916, 158.66422, 458.50574, 235.05943, 402.9257, 369.5396]
2025-08-07 01:45:24,475 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:45:24,485 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 2 hours, 19 minutes, 1 second)
2025-08-07 01:47:06,435 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:47:21,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 385.52893 ± 67.243
2025-08-07 01:47:21,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [404.44785, 375.7354, 492.21582, 365.2939, 359.9427, 326.8159, 420.7695, 311.31522, 503.626, 295.12714]
2025-08-07 01:47:21,810 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:47:21,815 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 2 hours, 17 minutes, 2 seconds)
2025-08-07 01:49:03,818 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:49:19,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 461.75333 ± 88.097
2025-08-07 01:49:19,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [410.24622, 635.80066, 622.5887, 362.3292, 416.39258, 454.86258, 402.89944, 467.5553, 432.85614, 412.00235]
2025-08-07 01:49:19,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:49:19,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (461.75) for latency ExtremeSparseL4U32
2025-08-07 01:49:19,225 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 2 hours, 15 minutes, 3 seconds)
2025-08-07 01:51:01,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:51:16,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 459.46378 ± 131.786
2025-08-07 01:51:16,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [540.0879, 659.7843, 253.77672, 431.76724, 570.1559, 580.8295, 511.59735, 271.68362, 453.23163, 321.72342]
2025-08-07 01:51:16,592 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:51:16,605 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 2 hours, 13 minutes, 3 seconds)
2025-08-07 01:52:58,621 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:53:14,112 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 546.14679 ± 98.825
2025-08-07 01:53:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [580.1391, 642.2849, 513.2988, 652.865, 561.3155, 513.4099, 546.73645, 677.1737, 333.5057, 440.73816]
2025-08-07 01:53:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:53:14,113 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (546.15) for latency ExtremeSparseL4U32
2025-08-07 01:53:14,119 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 2 hours, 11 minutes, 6 seconds)
2025-08-07 01:54:56,222 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:55:11,596 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 491.32266 ± 98.827
2025-08-07 01:55:11,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [560.1963, 611.7465, 387.45532, 564.58234, 358.10797, 580.75885, 425.3278, 619.4484, 396.55255, 409.05032]
2025-08-07 01:55:11,597 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:55:11,606 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 2 hours, 9 minutes, 10 seconds)
2025-08-07 01:56:53,571 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:57:08,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 407.72015 ± 106.755
2025-08-07 01:57:08,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [469.68948, 485.82626, 463.5037, 207.51482, 466.5882, 457.43765, 302.13464, 565.5458, 272.05963, 386.90164]
2025-08-07 01:57:08,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:57:08,949 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 2 hours, 7 minutes, 12 seconds)
2025-08-07 01:58:50,909 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 01:59:06,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 430.42032 ± 125.220
2025-08-07 01:59:06,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [567.6813, 417.25745, 590.7052, 351.57602, 554.4479, 536.7698, 328.6415, 403.0599, 174.5074, 379.5565]
2025-08-07 01:59:06,464 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 01:59:06,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 2 hours, 5 minutes, 16 seconds)
2025-08-07 02:00:48,432 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:01:03,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 397.14044 ± 122.644
2025-08-07 02:01:03,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [476.06042, 217.36464, 250.40831, 415.15204, 589.5718, 353.0127, 407.0734, 601.1573, 314.78375, 346.82025]
2025-08-07 02:01:03,954 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:01:03,964 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 2 hours, 3 minutes, 20 seconds)
2025-08-07 02:02:45,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:03:01,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 405.85678 ± 112.621
2025-08-07 02:03:01,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [412.75043, 246.728, 290.018, 321.64615, 573.61786, 485.37497, 395.6671, 520.4247, 535.12695, 277.2138]
2025-08-07 02:03:01,354 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:03:01,363 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 2 hours, 1 minute, 21 seconds)
2025-08-07 02:04:43,280 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:04:58,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 496.42285 ± 67.202
2025-08-07 02:04:58,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [597.63806, 453.43045, 484.19073, 574.1102, 528.68066, 558.8215, 462.02304, 357.48492, 458.93402, 488.91446]
2025-08-07 02:04:58,631 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:04:58,639 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 1 hour, 59 minutes, 21 seconds)
2025-08-07 02:06:40,593 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:06:56,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 421.54437 ± 122.757
2025-08-07 02:06:56,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [229.42764, 367.72385, 433.96646, 532.03796, 456.88638, 456.5804, 326.6222, 267.86588, 667.15576, 477.17715]
2025-08-07 02:06:56,164 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:06:56,171 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 1 hour, 57 minutes, 26 seconds)
2025-08-07 02:08:38,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:08:53,558 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 538.42413 ± 76.186
2025-08-07 02:08:53,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [484.50708, 439.3417, 642.4558, 603.05524, 605.1416, 554.28326, 416.03235, 552.0349, 471.59717, 615.79175]
2025-08-07 02:08:53,559 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:08:53,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 1 hour, 55 minutes, 27 seconds)
2025-08-07 02:10:35,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:10:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 547.84802 ± 122.881
2025-08-07 02:10:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [604.69763, 746.3147, 470.86896, 597.7528, 394.61517, 681.1275, 643.23755, 490.57642, 514.02856, 335.2608]
2025-08-07 02:10:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:10:50,843 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (547.85) for latency ExtremeSparseL4U32
2025-08-07 02:10:50,850 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 1 hour, 53 minutes, 27 seconds)
2025-08-07 02:12:32,823 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:12:48,230 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 419.06445 ± 96.825
2025-08-07 02:12:48,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [295.50705, 437.05026, 305.58795, 554.7153, 516.69293, 468.28476, 405.48615, 287.5389, 543.8932, 375.8883]
2025-08-07 02:12:48,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:12:48,239 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 1 hour, 51 minutes, 30 seconds)
2025-08-07 02:14:30,214 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:14:45,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 533.24164 ± 102.813
2025-08-07 02:14:45,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [512.97656, 316.45218, 517.6209, 654.8987, 632.76385, 584.57733, 424.297, 594.2043, 461.48795, 633.1379]
2025-08-07 02:14:45,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:14:45,700 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 1 hour, 49 minutes, 35 seconds)
2025-08-07 02:16:27,658 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:16:43,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 618.54480 ± 78.445
2025-08-07 02:16:43,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [720.83014, 543.8607, 692.23444, 662.16254, 587.6789, 657.5737, 696.6418, 488.74036, 506.21027, 629.5149]
2025-08-07 02:16:43,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:16:43,034 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (618.54) for latency ExtremeSparseL4U32
2025-08-07 02:16:43,044 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 1 hour, 47 minutes, 35 seconds)
2025-08-07 02:18:25,063 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:18:40,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 655.37543 ± 64.431
2025-08-07 02:18:40,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [582.82227, 638.1014, 700.181, 699.8727, 667.91315, 723.18317, 617.8759, 518.74146, 665.8854, 739.17755]
2025-08-07 02:18:40,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:18:40,409 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (655.38) for latency ExtremeSparseL4U32
2025-08-07 02:18:40,419 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 1 hour, 45 minutes, 37 seconds)
2025-08-07 02:20:22,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:20:37,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 541.11530 ± 81.683
2025-08-07 02:20:37,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [632.4046, 460.0708, 617.5179, 566.0885, 432.31683, 583.43335, 494.48215, 402.63757, 594.02136, 628.17975]
2025-08-07 02:20:37,969 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:20:37,979 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 1 hour, 43 minutes, 43 seconds)
2025-08-07 02:22:19,947 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:22:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 558.55048 ± 126.631
2025-08-07 02:22:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [857.88055, 370.00223, 584.84155, 473.23355, 526.512, 601.0184, 540.1703, 666.6484, 503.1446, 462.05298]
2025-08-07 02:22:35,268 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:22:35,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 1 hour, 41 minutes, 45 seconds)
2025-08-07 02:24:17,265 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:24:32,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 440.33115 ± 124.493
2025-08-07 02:24:32,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [157.74736, 628.1195, 459.23868, 494.41077, 315.44412, 442.4494, 457.3595, 385.89642, 555.5344, 507.11115]
2025-08-07 02:24:32,792 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:24:32,797 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 1 hour, 39 minutes, 48 seconds)
2025-08-07 02:26:14,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:26:30,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 659.51770 ± 150.414
2025-08-07 02:26:30,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [982.60736, 651.76, 787.07574, 647.3466, 506.27988, 377.76096, 652.3984, 623.23334, 684.82275, 681.89136]
2025-08-07 02:26:30,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:26:30,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (659.52) for latency ExtremeSparseL4U32
2025-08-07 02:26:30,319 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 1 hour, 37 minutes, 52 seconds)
2025-08-07 02:28:12,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:28:27,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 664.59412 ± 108.465
2025-08-07 02:28:27,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [519.4008, 571.59033, 650.44025, 809.11835, 696.25385, 584.17126, 549.9879, 794.9782, 833.26044, 636.74036]
2025-08-07 02:28:27,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:28:27,796 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (664.59) for latency ExtremeSparseL4U32
2025-08-07 02:28:27,806 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 1 hour, 35 minutes, 56 seconds)
2025-08-07 02:30:09,777 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:30:25,324 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 578.03888 ± 126.648
2025-08-07 02:30:25,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [515.0055, 544.4279, 627.9139, 446.22528, 516.6325, 387.68225, 562.1221, 603.84753, 732.96985, 843.5619]
2025-08-07 02:30:25,325 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:30:25,335 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 1 hour, 33 minutes, 58 seconds)
2025-08-07 02:32:07,234 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:32:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 632.29529 ± 101.936
2025-08-07 02:32:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [649.8367, 463.98306, 473.4205, 698.29486, 531.00775, 672.6732, 701.27515, 645.5653, 697.9428, 788.9531]
2025-08-07 02:32:22,724 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:32:22,735 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 1 hour, 32 minutes, 2 seconds)
2025-08-07 02:34:04,632 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:34:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 819.82306 ± 113.633
2025-08-07 02:34:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [798.67334, 847.9398, 896.3283, 1016.36993, 793.71063, 650.4672, 818.22546, 911.60736, 609.2757, 855.6327]
2025-08-07 02:34:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:34:19,959 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (819.82) for latency ExtremeSparseL4U32
2025-08-07 02:34:19,968 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 1 hour, 30 minutes, 1 second)
2025-08-07 02:36:01,939 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:36:17,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 693.79346 ± 78.819
2025-08-07 02:36:17,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [703.90326, 646.9205, 815.0069, 637.75867, 546.6775, 699.79175, 741.3261, 744.9135, 613.23505, 788.4013]
2025-08-07 02:36:17,486 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:36:17,494 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 1 hour, 28 minutes, 4 seconds)
2025-08-07 02:37:59,391 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:38:14,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 601.61243 ± 76.903
2025-08-07 02:38:14,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [567.8884, 615.923, 733.6731, 509.2859, 582.0911, 658.50635, 711.94635, 480.5135, 589.921, 566.37573]
2025-08-07 02:38:14,923 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:38:14,929 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 1 hour, 26 minutes, 6 seconds)
2025-08-07 02:39:56,980 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:40:12,473 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 663.87415 ± 93.278
2025-08-07 02:40:12,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [693.90155, 506.895, 709.13824, 571.2218, 688.21954, 684.4736, 858.1388, 568.7315, 714.81757, 643.2034]
2025-08-07 02:40:12,474 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:40:12,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 1 hour, 24 minutes, 9 seconds)
2025-08-07 02:41:54,552 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:42:10,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 791.99957 ± 121.706
2025-08-07 02:42:10,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [815.69507, 932.53076, 700.7105, 943.2907, 675.40875, 892.7836, 819.77637, 672.91266, 573.7118, 893.1755]
2025-08-07 02:42:10,059 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:42:10,068 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 1 hour, 22 minutes, 13 seconds)
2025-08-07 02:43:52,080 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:44:07,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 712.64471 ± 110.971
2025-08-07 02:44:07,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [890.7932, 771.2615, 803.2955, 567.1665, 573.8815, 709.84265, 582.43536, 848.1587, 728.32275, 651.2896]
2025-08-07 02:44:07,436 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:44:07,443 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 1 hour, 20 minutes, 17 seconds)
2025-08-07 02:45:49,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:46:04,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 742.07147 ± 131.780
2025-08-07 02:46:04,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [981.1968, 778.28986, 546.7328, 717.5006, 645.73413, 751.4117, 788.7713, 858.6581, 530.2815, 822.13885]
2025-08-07 02:46:04,938 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:46:04,944 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 1 hour, 18 minutes, 19 seconds)
2025-08-07 02:47:46,889 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:48:02,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 773.06104 ± 141.763
2025-08-07 02:48:02,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [788.12726, 751.34424, 707.86816, 903.3721, 768.90985, 927.55304, 879.4114, 408.85037, 868.26373, 726.9101]
2025-08-07 02:48:02,266 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:48:02,275 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 1 hour, 16 minutes, 21 seconds)
2025-08-07 02:49:44,368 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:49:59,880 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 778.09485 ± 153.230
2025-08-07 02:49:59,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [899.6906, 987.58496, 592.1019, 665.8898, 784.81854, 1021.2672, 778.55786, 803.95776, 726.7788, 520.3006]
2025-08-07 02:49:59,881 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:49:59,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 1 hour, 14 minutes, 24 seconds)
2025-08-07 02:51:41,863 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:51:57,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 554.24640 ± 91.259
2025-08-07 02:51:57,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [499.85297, 587.8038, 542.23114, 648.6505, 481.65552, 527.014, 717.93494, 594.07806, 576.73206, 366.51105]
2025-08-07 02:51:57,249 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:51:57,260 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 1 hour, 12 minutes, 25 seconds)
2025-08-07 02:53:39,288 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:53:54,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 865.97559 ± 87.255
2025-08-07 02:53:54,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1040.5051, 790.5301, 999.9053, 793.95215, 831.8048, 777.5656, 795.8131, 850.37854, 861.4883, 917.8135]
2025-08-07 02:53:54,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:53:54,838 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (865.98) for latency ExtremeSparseL4U32
2025-08-07 02:53:54,854 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 1 hour, 10 minutes, 29 seconds)
2025-08-07 02:55:36,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:55:52,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 795.01105 ± 167.247
2025-08-07 02:55:52,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [868.31934, 784.7673, 658.86163, 813.62177, 902.8772, 696.27527, 686.9408, 757.9676, 571.8221, 1208.6575]
2025-08-07 02:55:52,413 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:55:52,420 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 1 hour, 8 minutes, 32 seconds)
2025-08-07 02:57:34,406 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:57:49,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 780.34265 ± 146.562
2025-08-07 02:57:49,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [910.58014, 741.03143, 453.5761, 1027.7333, 716.7504, 833.2833, 904.2, 718.3962, 761.59174, 736.2837]
2025-08-07 02:57:49,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:57:49,763 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 1 hour, 6 minutes, 34 seconds)
2025-08-07 02:59:31,839 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 02:59:47,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 908.13806 ± 172.931
2025-08-07 02:59:47,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1159.7106, 1090.2924, 731.61945, 902.52106, 747.36786, 673.31665, 1066.8633, 715.4774, 918.8096, 1075.4027]
2025-08-07 02:59:47,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 02:59:47,343 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (908.14) for latency ExtremeSparseL4U32
2025-08-07 02:59:47,353 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 1 hour, 4 minutes, 37 seconds)
2025-08-07 03:01:29,383 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:01:44,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 948.87256 ± 138.398
2025-08-07 03:01:44,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1026.0548, 900.727, 998.65454, 839.30475, 1198.7573, 848.72974, 896.79517, 707.2624, 936.3068, 1136.1332]
2025-08-07 03:01:44,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:01:44,877 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (948.87) for latency ExtremeSparseL4U32
2025-08-07 03:01:44,887 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 1 hour, 2 minutes, 40 seconds)
2025-08-07 03:03:26,933 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:03:42,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 910.57245 ± 91.851
2025-08-07 03:03:42,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [919.99133, 1065.5281, 897.9753, 854.18463, 1043.0543, 751.86676, 909.0799, 809.89966, 968.9528, 885.19135]
2025-08-07 03:03:42,431 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:03:42,440 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 43 seconds)
2025-08-07 03:05:24,527 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:05:39,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 887.77966 ± 146.129
2025-08-07 03:05:39,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [900.50085, 1127.7577, 937.6815, 630.4875, 721.62494, 962.98175, 964.4291, 877.0673, 723.2994, 1031.967]
2025-08-07 03:05:39,981 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:05:39,989 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 58 minutes, 45 seconds)
2025-08-07 03:07:22,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:07:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 800.95557 ± 184.900
2025-08-07 03:07:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [467.72058, 1012.2029, 1077.751, 981.6756, 819.2011, 710.5653, 916.27527, 735.28406, 614.2463, 674.6337]
2025-08-07 03:07:37,471 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:07:37,482 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 56 minutes, 48 seconds)
2025-08-07 03:09:19,421 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:09:34,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1047.96375 ± 137.504
2025-08-07 03:09:34,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1060.6349, 1107.1815, 950.7873, 985.96295, 1283.715, 894.3985, 1181.6312, 846.38635, 1210.5493, 958.3918]
2025-08-07 03:09:34,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:09:34,757 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1047.96) for latency ExtremeSparseL4U32
2025-08-07 03:09:34,769 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 54 minutes, 49 seconds)
2025-08-07 03:11:16,756 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:11:32,284 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 903.56946 ± 154.397
2025-08-07 03:11:32,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [956.59686, 994.9787, 927.13837, 904.4272, 947.0566, 1039.5748, 503.46722, 942.0847, 760.82245, 1059.548]
2025-08-07 03:11:32,285 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:11:32,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 52 minutes, 51 seconds)
2025-08-07 03:13:14,263 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:13:29,696 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1024.12219 ± 135.778
2025-08-07 03:13:29,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1131.679, 899.01636, 998.5303, 960.40424, 1112.9336, 941.6962, 1097.393, 1203.7998, 1158.1968, 737.5715]
2025-08-07 03:13:29,697 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:13:29,707 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 50 minutes, 53 seconds)
2025-08-07 03:15:11,694 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:15:27,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1108.69275 ± 159.845
2025-08-07 03:15:27,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1232.2109, 1100.3552, 1061.3285, 1193.7548, 1033.7026, 1233.9404, 731.59924, 1130.9225, 1350.6638, 1018.4488]
2025-08-07 03:15:27,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:15:27,220 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1108.69) for latency ExtremeSparseL4U32
2025-08-07 03:15:27,231 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 48 minutes, 56 seconds)
2025-08-07 03:17:09,216 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:17:24,730 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1002.48761 ± 95.430
2025-08-07 03:17:24,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [958.3807, 1073.0758, 917.23566, 849.5569, 1134.1664, 1031.3835, 888.1478, 1035.9935, 989.7036, 1147.2312]
2025-08-07 03:17:24,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:17:24,741 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 46 minutes, 58 seconds)
2025-08-07 03:19:06,671 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:19:22,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1085.62231 ± 132.087
2025-08-07 03:19:22,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [967.6811, 1292.3333, 1274.3749, 1008.9084, 1040.2616, 1260.8578, 927.3933, 979.7003, 1001.94415, 1102.7684]
2025-08-07 03:19:22,219 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:19:22,232 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 45 minutes, 2 seconds)
2025-08-07 03:21:04,181 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:21:19,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 867.69904 ± 184.044
2025-08-07 03:21:19,733 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1089.1741, 944.69025, 996.9798, 962.06885, 637.7299, 663.89606, 808.04987, 828.4733, 1154.521, 591.40674]
2025-08-07 03:21:19,734 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:21:19,746 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 43 minutes, 4 seconds)
2025-08-07 03:23:01,731 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:23:17,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 898.56506 ± 179.910
2025-08-07 03:23:17,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1019.4138, 899.0959, 936.9966, 909.68756, 1023.79974, 886.76965, 500.16684, 814.9898, 766.6918, 1228.0385]
2025-08-07 03:23:17,264 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:23:17,274 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 41 minutes, 7 seconds)
2025-08-07 03:24:59,372 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:25:14,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 960.35706 ± 198.259
2025-08-07 03:25:14,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [904.02563, 1037.1068, 693.0639, 1096.6002, 1004.38464, 1221.0175, 1011.29443, 932.1299, 537.9468, 1166.0009]
2025-08-07 03:25:14,821 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:25:14,832 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 39 minutes, 10 seconds)
2025-08-07 03:26:56,851 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:27:12,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 986.49609 ± 70.913
2025-08-07 03:27:12,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1091.7803, 941.2892, 1086.922, 877.11993, 1023.6821, 1022.8535, 1012.2622, 885.67084, 974.8112, 948.5698]
2025-08-07 03:27:12,404 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:27:12,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 37 minutes, 13 seconds)
2025-08-07 03:28:54,564 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:29:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1206.24194 ± 175.731
2025-08-07 03:29:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1184.5771, 1311.2194, 1325.4178, 1114.018, 837.01196, 1064.9952, 1316.6302, 1099.6708, 1314.3696, 1494.5103]
2025-08-07 03:29:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:29:10,047 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1206.24) for latency ExtremeSparseL4U32
2025-08-07 03:29:10,057 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 35 minutes, 16 seconds)
2025-08-07 03:30:52,131 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:31:07,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1115.16577 ± 159.880
2025-08-07 03:31:07,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1119.0625, 828.33813, 1141.2686, 1442.3552, 1212.9008, 985.887, 956.15497, 1088.3347, 1153.1848, 1224.171]
2025-08-07 03:31:07,667 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:31:07,680 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 33 minutes, 18 seconds)
2025-08-07 03:32:50,014 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:33:05,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 993.28381 ± 195.070
2025-08-07 03:33:05,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1125.3121, 1070.7235, 1151.3463, 758.90656, 754.1449, 850.8881, 1020.39355, 868.8984, 917.3457, 1414.8794]
2025-08-07 03:33:05,416 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:33:05,424 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 31 minutes, 22 seconds)
2025-08-07 03:34:48,415 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:35:03,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1049.76062 ± 132.519
2025-08-07 03:35:03,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1126.8501, 951.05206, 902.4753, 818.18243, 1005.6173, 1061.1875, 1026.8796, 1182.5175, 1133.6093, 1289.2352]
2025-08-07 03:35:03,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:35:03,972 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 29 minutes, 27 seconds)
2025-08-07 03:36:46,922 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:37:02,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1029.15808 ± 190.358
2025-08-07 03:37:02,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [987.0916, 923.2583, 1311.9103, 786.1844, 944.1103, 960.86053, 1184.3234, 1032.2255, 1371.562, 790.0556]
2025-08-07 03:37:02,484 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:37:02,492 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 27 minutes, 32 seconds)
2025-08-07 03:38:46,201 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:39:01,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1086.28149 ± 250.503
2025-08-07 03:39:01,754 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1187.9707, 385.4633, 1130.6506, 1330.6659, 1286.2947, 1208.6106, 1152.1919, 1075.102, 1089.5082, 1016.35754]
2025-08-07 03:39:01,755 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:39:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 25 minutes, 38 seconds)
2025-08-07 03:40:45,643 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:41:01,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 941.30176 ± 129.652
2025-08-07 03:41:01,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1110.0504, 871.1699, 982.36285, 942.42377, 736.11523, 771.06177, 1138.9644, 956.04614, 846.59204, 1058.2312]
2025-08-07 03:41:01,013 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:41:01,025 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 23 minutes, 44 seconds)
2025-08-07 03:42:44,714 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:43:00,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1224.82751 ± 124.072
2025-08-07 03:43:00,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1067.7764, 1372.5562, 1164.2234, 975.6927, 1174.4504, 1217.5857, 1327.5029, 1292.0557, 1364.8049, 1291.6266]
2025-08-07 03:43:00,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:43:00,294 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1224.83) for latency ExtremeSparseL4U32
2025-08-07 03:43:00,305 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 21 minutes, 48 seconds)
2025-08-07 03:44:44,093 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:44:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1286.60706 ± 141.717
2025-08-07 03:44:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1141.4042, 1291.5669, 1370.4562, 1492.5941, 1304.3052, 1257.068, 1149.1493, 1531.1848, 1063.2079, 1265.1339]
2025-08-07 03:44:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:44:59,505 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1226 [INFO]: New best (1286.61) for latency ExtremeSparseL4U32
2025-08-07 03:44:59,516 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 19 minutes, 51 seconds)
2025-08-07 03:46:43,311 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:46:58,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1237.76147 ± 128.963
2025-08-07 03:46:58,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1301.3483, 1136.072, 1107.705, 1155.7583, 1387.6599, 1245.736, 1274.4227, 1512.1356, 1179.4178, 1077.3591]
2025-08-07 03:46:58,682 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:46:58,690 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 17 minutes, 53 seconds)
2025-08-07 03:48:42,506 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:48:58,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1111.89136 ± 133.499
2025-08-07 03:48:58,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [859.2635, 939.60846, 1230.2947, 1297.2648, 1133.2583, 1132.9049, 1166.1309, 1192.4944, 1194.3569, 973.33655]
2025-08-07 03:48:58,009 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:48:58,020 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 15 minutes, 54 seconds)
2025-08-07 03:50:41,846 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:50:57,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1212.44043 ± 168.532
2025-08-07 03:50:57,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1088.0397, 1429.3789, 1147.2545, 849.0414, 1228.4106, 1328.734, 1053.3726, 1310.8579, 1317.6805, 1371.6333]
2025-08-07 03:50:57,182 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:50:57,191 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 13 minutes, 54 seconds)
2025-08-07 03:52:41,100 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:52:56,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1235.03662 ± 167.315
2025-08-07 03:52:56,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1136.6567, 1271.1493, 857.8637, 1242.0382, 1240.4452, 1265.2097, 1287.0778, 1405.056, 1520.9563, 1123.9144]
2025-08-07 03:52:56,568 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:52:56,580 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 11 minutes, 55 seconds)
2025-08-07 03:54:40,387 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:54:55,844 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1249.05554 ± 158.482
2025-08-07 03:54:55,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1242.7511, 1657.9884, 1377.9694, 1128.1157, 1071.8832, 1232.1329, 1140.7388, 1159.4032, 1262.5377, 1217.0355]
2025-08-07 03:54:55,845 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:54:55,852 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 9 minutes, 56 seconds)
2025-08-07 03:56:39,619 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:56:55,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1212.77307 ± 151.717
2025-08-07 03:56:55,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1443.3411, 1084.0104, 1367.7474, 1167.5233, 1156.1095, 1259.8574, 1249.5327, 1232.7982, 866.9293, 1299.8807]
2025-08-07 03:56:55,098 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:56:55,109 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 7 minutes, 57 seconds)
2025-08-07 03:58:38,902 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 03:58:54,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1247.56348 ± 96.342
2025-08-07 03:58:54,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1282.0177, 1278.7236, 1107.27, 1347.8049, 1206.3542, 1359.8044, 1378.8113, 1249.178, 1172.0735, 1093.5964]
2025-08-07 03:58:54,384 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 03:58:54,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 5 minutes, 57 seconds)
2025-08-07 04:00:38,115 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:00:53,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1205.29419 ± 163.388
2025-08-07 04:00:53,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1158.5006, 1072.7451, 1118.9469, 1399.5884, 920.71643, 1387.6083, 1160.6565, 1242.1101, 1111.707, 1480.3629]
2025-08-07 04:00:53,651 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:00:53,663 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 3 minutes, 58 seconds)
2025-08-07 04:02:37,393 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:02:52,940 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1241.07935 ± 108.245
2025-08-07 04:02:52,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1408.7883, 1181.1211, 1151.6554, 1374.1859, 1375.465, 1095.2081, 1196.747, 1303.7115, 1130.4279, 1193.4835]
2025-08-07 04:02:52,941 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:02:52,951 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 1 minute, 59 seconds)
2025-08-07 04:04:36,737 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-08-07 04:04:52,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1221 [DEBUG]: Total Reward: 1239.01331 ± 109.648
2025-08-07 04:04:52,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1222 [DEBUG]: All rewards: [1033.5208, 1309.4823, 1194.5364, 1269.3679, 1418.7253, 1356.7788, 1285.8741, 1236.4886, 1102.6395, 1182.7184]
2025-08-07 04:04:52,138 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-08-07 04:04:52,154 latency_env.delayed_mdp:training_loop(baseline-bpql-noiseperc25-halfcheetah):1251 [DEBUG]: Training session finished
