2025-05-13 09:06:23,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mda-mem4
2025-05-13 09:06:23,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mda-mem4
2025-05-13 09:06:23,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14ee49b09510>}
2025-05-13 09:06:23,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:23,232 baseline-bpql-mda-noisy-halfcheetah:91 [WARNING]: args.assumed_delay != args.horizon: 4 != 32
2025-05-13 09:06:23,232 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-13 09:06:23,248 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:23,248 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:23,254 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:24,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:24,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:06,543 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:10:19,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -389.01123 ± 10.592
2025-05-13 09:10:19,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-389.31833, -395.85785, -398.20685, -393.6978, -386.7521, -388.86072, -394.33392, -390.68512, -393.577, -358.82272]
2025-05-13 09:10:19,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:10:19,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-389.01) for latency ExtremeSparseL4U32
2025-05-13 09:10:19,199 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 28 minutes, 1 second)
2025-05-13 09:14:05,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:14:18,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -63.24020 ± 58.836
2025-05-13 09:14:18,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-115.375336, -59.310993, -19.252724, 29.007004, -37.901707, -191.30678, -11.114728, -80.81699, -97.37715, -48.95256]
2025-05-13 09:14:18,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:14:18,107 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-63.24) for latency ExtremeSparseL4U32
2025-05-13 09:14:18,112 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 27 minutes, 9 seconds)
2025-05-13 09:18:03,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:18:16,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 404.63232 ± 112.594
2025-05-13 09:18:16,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [243.05959, 452.43567, 516.09326, 318.4478, 249.21498, 545.6842, 519.7366, 393.3902, 301.40512, 506.85608]
2025-05-13 09:18:16,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:18:16,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (404.63) for latency ExtremeSparseL4U32
2025-05-13 09:18:16,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 24 minutes, 1 second)
2025-05-13 09:22:02,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:22:15,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1188.73425 ± 270.615
2025-05-13 09:22:15,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [803.91455, 892.136, 1126.3387, 991.5721, 1745.1211, 1080.8142, 1218.3616, 1518.5265, 1172.2365, 1338.3209]
2025-05-13 09:22:15,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:22:15,559 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1188.73) for latency ExtremeSparseL4U32
2025-05-13 09:22:15,566 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 20 minutes, 36 seconds)
2025-05-13 09:26:01,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:26:13,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1439.85181 ± 429.680
2025-05-13 09:26:13,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1806.774, 1654.5874, 1536.7217, 1750.9482, 1414.9591, 1692.6409, 292.89578, 1092.7356, 1476.678, 1679.5775]
2025-05-13 09:26:13,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:26:13,960 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1439.85) for latency ExtremeSparseL4U32
2025-05-13 09:26:13,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 16 minutes, 48 seconds)
2025-05-13 09:29:58,632 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:30:10,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1715.87988 ± 188.205
2025-05-13 09:30:10,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1657.0293, 1850.2292, 1700.3445, 1616.1608, 1280.1927, 1987.9006, 1836.2444, 1893.1503, 1604.1844, 1733.362]
2025-05-13 09:30:10,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:30:10,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1715.88) for latency ExtremeSparseL4U32
2025-05-13 09:30:10,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 13 minutes, 25 seconds)
2025-05-13 09:33:55,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:34:07,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1830.05505 ± 315.176
2025-05-13 09:34:07,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2368.0066, 2083.802, 1941.8547, 1703.5668, 1122.6368, 1836.4666, 1778.4728, 1566.4285, 1879.5398, 2019.7765]
2025-05-13 09:34:07,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:34:07,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1830.06) for latency ExtremeSparseL4U32
2025-05-13 09:34:07,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 8 minutes, 43 seconds)
2025-05-13 09:37:51,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:38:03,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1866.24048 ± 249.358
2025-05-13 09:38:03,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1523.1082, 1997.4247, 1952.0374, 1996.9481, 1773.8062, 1453.9221, 2153.0332, 1585.2311, 2127.7935, 2099.1018]
2025-05-13 09:38:03,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:38:03,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1866.24) for latency ExtremeSparseL4U32
2025-05-13 09:38:03,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 3 minutes, 56 seconds)
2025-05-13 09:41:47,173 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:41:59,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1853.87573 ± 669.161
2025-05-13 09:41:59,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2337.5593, 2061.8816, 688.58154, 2548.332, 558.179, 2117.6755, 2440.835, 1960.3489, 1559.7101, 2265.656]
2025-05-13 09:41:59,339 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:41:59,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 5 hours, 59 minutes, 4 seconds)
2025-05-13 09:45:41,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:45:53,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1960.47107 ± 311.432
2025-05-13 09:45:53,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2142.1777, 2247.7195, 1419.9817, 1542.9832, 2240.045, 1966.6005, 1596.7196, 2205.9128, 1940.3616, 2302.2092]
2025-05-13 09:45:53,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:45:53,570 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1960.47) for latency ExtremeSparseL4U32
2025-05-13 09:45:53,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 5 hours, 53 minutes, 52 seconds)
2025-05-13 09:49:35,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:49:48,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1957.00073 ± 707.504
2025-05-13 09:49:48,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2658.4956, 2119.3403, 185.23909, 1973.469, 2398.576, 2193.9849, 2049.0037, 2385.096, 2458.2683, 1148.5361]
2025-05-13 09:49:48,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:49:48,090 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 5 hours, 49 minutes, 12 seconds)
2025-05-13 09:53:30,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:53:42,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2304.17310 ± 506.706
2025-05-13 09:53:42,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2182.7327, 2702.326, 1557.7626, 2696.9062, 3012.051, 2544.3074, 1590.4828, 1628.598, 2673.012, 2453.5513]
2025-05-13 09:53:42,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:53:42,864 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2304.17) for latency ExtremeSparseL4U32
2025-05-13 09:53:42,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 44 minutes, 45 seconds)
2025-05-13 09:57:25,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:57:37,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2828.53955 ± 227.727
2025-05-13 09:57:37,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2576.4255, 2484.0486, 3103.354, 2650.735, 3275.0613, 2888.5498, 2864.3127, 2688.6182, 2859.0298, 2895.2593]
2025-05-13 09:57:37,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:57:37,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2828.54) for latency ExtremeSparseL4U32
2025-05-13 09:57:37,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 40 minutes, 33 seconds)
2025-05-13 10:01:20,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:01:32,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2563.52173 ± 300.237
2025-05-13 10:01:32,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2471.924, 2315.1821, 2714.4617, 2531.0295, 2523.4797, 2144.0032, 3106.09, 2236.4133, 2551.2104, 3041.422]
2025-05-13 10:01:32,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:01:32,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 36 minutes, 25 seconds)
2025-05-13 10:05:15,091 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:05:27,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2432.47852 ± 760.004
2025-05-13 10:05:27,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2381.2883, 2658.7595, 2157.05, 3230.2288, 1238.3008, 932.0239, 2983.158, 3279.1145, 2474.9724, 2989.8892]
2025-05-13 10:05:27,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:05:27,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 32 minutes, 35 seconds)
2025-05-13 10:09:09,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:09:21,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2331.56836 ± 680.685
2025-05-13 10:09:21,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2527.2593, 3046.0781, 1128.1881, 3148.2527, 1163.0713, 2124.2576, 2719.179, 2206.7039, 2959.0725, 2293.6216]
2025-05-13 10:09:21,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:09:21,806 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 28 minutes, 38 seconds)
2025-05-13 10:13:03,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:13:16,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2671.91602 ± 858.403
2025-05-13 10:13:16,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2045.747, 3466.1365, 2719.9312, 2770.733, 2548.008, 457.4892, 3109.8835, 2698.1726, 3508.4207, 3394.6377]
2025-05-13 10:13:16,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:13:16,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 24 minutes, 39 seconds)
2025-05-13 10:16:58,558 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:17:10,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2597.86157 ± 664.365
2025-05-13 10:17:10,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2562.1602, 2290.656, 2744.6777, 2475.4966, 2696.9282, 2962.1582, 3052.4407, 3365.132, 3012.15, 816.81555]
2025-05-13 10:17:10,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:17:10,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 20 minutes, 37 seconds)
2025-05-13 10:20:53,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:21:05,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2992.88916 ± 440.932
2025-05-13 10:21:05,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2839.5957, 3379.5452, 3047.0728, 3119.6348, 1956.8572, 3505.5637, 3296.3914, 3117.386, 2474.5771, 3192.2659]
2025-05-13 10:21:05,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:21:05,966 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2992.89) for latency ExtremeSparseL4U32
2025-05-13 10:21:05,971 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 16 minutes, 43 seconds)
2025-05-13 10:24:49,568 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:25:01,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2998.73315 ± 387.094
2025-05-13 10:25:01,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3073.8508, 2962.069, 3465.7993, 3213.0342, 2070.4358, 3150.837, 3095.5479, 3455.4817, 2728.9753, 2771.3035]
2025-05-13 10:25:01,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:25:01,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2998.73) for latency ExtremeSparseL4U32
2025-05-13 10:25:01,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 13 minutes, 9 seconds)
2025-05-13 10:28:45,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:28:57,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2459.24951 ± 922.703
2025-05-13 10:28:57,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3155.7478, 2729.8562, 3057.0632, 2387.311, 3319.1265, 2538.6882, 1852.2379, 135.24335, 3399.0383, 2018.1814]
2025-05-13 10:28:57,640 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:28:57,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 9 minutes, 38 seconds)
2025-05-13 10:32:41,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:32:53,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3031.84058 ± 284.671
2025-05-13 10:32:53,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2870.9158, 3255.8118, 2763.8337, 2826.4705, 3058.1282, 2854.5225, 2989.7236, 3410.4763, 2687.2249, 3601.2986]
2025-05-13 10:32:53,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:32:53,303 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3031.84) for latency ExtremeSparseL4U32
2025-05-13 10:32:53,307 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 6 minutes, 1 second)
2025-05-13 10:36:36,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:36:48,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2870.85986 ± 794.728
2025-05-13 10:36:48,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3378.2637, 3318.9875, 2474.4487, 2500.2295, 3384.588, 810.14026, 3523.961, 3492.5825, 2569.1326, 3256.2644]
2025-05-13 10:36:48,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:36:48,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 2 minutes, 22 seconds)
2025-05-13 10:40:32,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:40:44,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3018.90479 ± 932.680
2025-05-13 10:40:44,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3317.1584, 3954.7747, 3174.8445, 3817.319, 472.96704, 3385.302, 2860.0847, 2556.2651, 3446.4788, 3203.8542]
2025-05-13 10:40:44,400 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:40:44,404 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 4 hours, 58 minutes, 32 seconds)
2025-05-13 10:44:27,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:44:40,148 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3355.71948 ± 369.867
2025-05-13 10:44:40,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3340.6606, 3953.2405, 3398.6868, 3417.9568, 3417.3853, 3232.9524, 2439.8057, 3416.4817, 3712.3445, 3227.6812]
2025-05-13 10:44:40,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:44:40,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3355.72) for latency ExtremeSparseL4U32
2025-05-13 10:44:40,155 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 4 hours, 54 minutes, 34 seconds)
2025-05-13 10:48:22,592 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:48:34,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3014.63745 ± 608.224
2025-05-13 10:48:34,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2796.2673, 2920.2434, 2757.2449, 3281.9563, 3034.9692, 3965.2388, 2892.6045, 3505.0352, 1545.4894, 3447.3254]
2025-05-13 10:48:34,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:48:34,777 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 4 hours, 50 minutes, 21 seconds)
2025-05-13 10:52:17,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:52:29,756 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2970.60107 ± 522.257
2025-05-13 10:52:29,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3083.1797, 2893.5386, 2511.6096, 1840.7947, 2722.8337, 3330.5999, 2985.655, 3680.2932, 3696.4133, 2961.0945]
2025-05-13 10:52:29,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:52:29,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 46 minutes, 16 seconds)
2025-05-13 10:56:12,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:56:25,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3115.75049 ± 811.467
2025-05-13 10:56:25,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4062.1167, 2670.2754, 3150.4634, 4160.872, 3701.1118, 3314.3203, 2674.4648, 2742.728, 1227.1354, 3454.0176]
2025-05-13 10:56:25,284 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:56:25,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 42 minutes, 20 seconds)
2025-05-13 11:00:08,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:00:20,253 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3250.46777 ± 391.009
2025-05-13 11:00:20,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3292.9717, 2984.7732, 3414.2117, 2422.1519, 3654.2395, 2933.9126, 3096.3882, 3460.0066, 3882.9172, 3363.1035]
2025-05-13 11:00:20,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:00:20,259 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 38 minutes, 17 seconds)
2025-05-13 11:04:02,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:04:15,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3172.21143 ± 350.988
2025-05-13 11:04:15,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3545.1206, 3778.9973, 2773.9197, 3078.344, 3603.2344, 3089.0432, 2607.3345, 2998.7297, 3050.7622, 3196.6287]
2025-05-13 11:04:15,185 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:04:15,192 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 34 minutes, 10 seconds)
2025-05-13 11:07:58,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:08:10,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3246.96631 ± 583.410
2025-05-13 11:08:10,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3616.884, 3545.9326, 1692.907, 3603.7622, 3610.0066, 3326.177, 3760.56, 2798.751, 3130.685, 3383.9976]
2025-05-13 11:08:10,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:08:10,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 30 minutes, 29 seconds)
2025-05-13 11:11:53,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:12:05,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3500.30420 ± 542.369
2025-05-13 11:12:05,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4108.7627, 2393.3884, 2832.011, 4099.453, 3043.9905, 3858.6382, 3826.2441, 3696.881, 3728.0754, 3415.5984]
2025-05-13 11:12:05,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:12:05,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3500.30) for latency ExtremeSparseL4U32
2025-05-13 11:12:05,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 26 minutes, 34 seconds)
2025-05-13 11:15:48,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:16:00,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3202.63135 ± 635.814
2025-05-13 11:16:00,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3285.9666, 2645.351, 2688.735, 4045.7954, 3894.429, 1849.0798, 2952.7463, 3556.212, 3503.9482, 3604.0525]
2025-05-13 11:16:00,670 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:16:00,676 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 22 minutes, 30 seconds)
2025-05-13 11:19:43,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:19:55,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3515.13818 ± 472.685
2025-05-13 11:19:55,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3340.2405, 3652.7517, 2967.276, 3801.623, 3766.9106, 3482.406, 2412.2153, 3993.5735, 4016.9812, 3717.4055]
2025-05-13 11:19:55,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:19:55,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3515.14) for latency ExtremeSparseL4U32
2025-05-13 11:19:55,836 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 18 minutes, 37 seconds)
2025-05-13 11:23:38,814 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:23:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3209.50171 ± 560.480
2025-05-13 11:23:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2997.202, 3378.4548, 3696.5493, 3152.599, 2148.529, 3325.0742, 3948.3506, 3481.5994, 3678.3289, 2288.3303]
2025-05-13 11:23:50,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:23:50,850 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 14 minutes, 43 seconds)
2025-05-13 11:27:34,105 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:27:46,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2753.36328 ± 860.463
2025-05-13 11:27:46,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3689.7727, 3936.3235, 2320.722, 1473.3616, 3172.6223, 1747.1635, 3609.5574, 3364.5903, 1810.7754, 2408.744]
2025-05-13 11:27:46,390 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:27:46,399 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 10 minutes, 47 seconds)
2025-05-13 11:31:23,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:31:35,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3281.53003 ± 534.661
2025-05-13 11:31:35,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2898.4023, 3618.827, 3372.4941, 3626.627, 2546.4602, 4086.2744, 2431.571, 3917.9006, 2908.3481, 3408.3967]
2025-05-13 11:31:35,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:31:35,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 5 minutes, 43 seconds)
2025-05-13 11:35:09,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:35:21,704 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2919.71631 ± 978.483
2025-05-13 11:35:21,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3717.412, 2790.6125, 2695.4277, 4260.71, 3809.6177, 3052.4607, 1796.0477, 3472.9194, 754.2296, 2847.7263]
2025-05-13 11:35:21,705 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:35:21,711 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 3 hours, 59 minutes, 56 seconds)
2025-05-13 11:39:03,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:39:15,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3595.88550 ± 363.089
2025-05-13 11:39:15,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3168.2642, 3665.9197, 3798.9314, 3767.9053, 3737.7852, 4030.8005, 3180.1433, 4044.654, 2908.4143, 3656.039]
2025-05-13 11:39:15,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:39:15,860 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3595.89) for latency ExtremeSparseL4U32
2025-05-13 11:39:15,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 3 hours, 55 minutes, 52 seconds)
2025-05-13 11:42:58,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:43:10,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3593.87427 ± 431.952
2025-05-13 11:43:10,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3650.6, 4120.9917, 2861.972, 4232.569, 3275.1475, 4030.6223, 3681.1582, 3575.157, 3491.4, 3019.1267]
2025-05-13 11:43:10,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:43:10,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 3 hours, 51 minutes, 55 seconds)
2025-05-13 11:46:52,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:47:05,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3570.69409 ± 448.654
2025-05-13 11:47:05,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2871.054, 3650.3635, 3558.2793, 3661.9556, 2890.7869, 4313.1606, 3171.4927, 3988.2527, 3634.3108, 3967.2869]
2025-05-13 11:47:05,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:47:05,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 47 minutes, 51 seconds)
2025-05-13 11:50:42,531 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:50:54,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3456.57275 ± 1122.884
2025-05-13 11:50:54,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3964.7493, 390.5251, 3481.8545, 3255.9373, 4623.826, 4491.707, 3630.1748, 4089.5417, 3373.9915, 3263.4167]
2025-05-13 11:50:54,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:50:54,615 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 44 minutes)
2025-05-13 11:54:31,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:54:43,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3387.71826 ± 1047.989
2025-05-13 11:54:43,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3588.8481, 4203.552, 3504.9053, 3540.9692, 3700.927, 4404.0312, 4196.101, 3181.7688, 3048.974, 507.10947]
2025-05-13 11:54:43,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:54:43,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 40 minutes, 49 seconds)
2025-05-13 11:58:18,577 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:58:30,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3192.92261 ± 843.157
2025-05-13 11:58:30,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1570.458, 3480.025, 3280.7466, 4033.8438, 3457.8025, 1582.975, 3876.3745, 3817.532, 3239.0605, 3590.4077]
2025-05-13 11:58:30,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:58:30,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 35 minutes, 32 seconds)
2025-05-13 12:02:04,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:02:16,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3413.82495 ± 861.890
2025-05-13 12:02:16,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2774.0203, 4442.1943, 4059.0662, 3897.5125, 4011.9192, 1570.5785, 3167.9321, 3748.74, 4044.9824, 2421.3066]
2025-05-13 12:02:16,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:02:16,661 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 30 minutes, 7 seconds)
2025-05-13 12:05:52,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:06:04,890 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3745.51709 ± 517.317
2025-05-13 12:06:04,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3942.0186, 3019.0222, 3813.329, 3529.2522, 3878.498, 2948.6716, 4274.353, 3260.5342, 4549.465, 4240.029]
2025-05-13 12:06:04,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:06:04,891 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3745.52) for latency ExtremeSparseL4U32
2025-05-13 12:06:04,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 25 minutes, 10 seconds)
2025-05-13 12:09:43,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:09:55,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3772.68091 ± 506.400
2025-05-13 12:09:55,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4253.088, 4278.394, 3390.8406, 3658.2551, 4126.823, 4108.5645, 3682.4663, 4010.7515, 2502.809, 3714.8157]
2025-05-13 12:09:55,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:09:55,595 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3772.68) for latency ExtremeSparseL4U32
2025-05-13 12:09:55,602 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 21 minutes, 34 seconds)
2025-05-13 12:13:34,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:13:46,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3518.34644 ± 889.573
2025-05-13 12:13:46,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3906.1223, 3125.2263, 4047.6724, 3473.905, 4069.165, 4208.927, 3929.2783, 3967.8718, 1035.3217, 3419.974]
2025-05-13 12:13:46,873 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:13:46,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 18 minutes, 6 seconds)
2025-05-13 12:17:25,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:17:38,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3683.25073 ± 479.367
2025-05-13 12:17:38,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3645.0056, 3387.7466, 4270.7266, 4323.174, 3378.2874, 2707.993, 3946.433, 3600.2107, 4182.0117, 3390.9219]
2025-05-13 12:17:38,205 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:17:38,211 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 15 minutes, 5 seconds)
2025-05-13 12:21:16,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:21:28,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3581.50146 ± 489.633
2025-05-13 12:21:28,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3070.3257, 2875.8975, 3713.8752, 3897.4639, 2757.5183, 4294.671, 3808.1765, 3902.7017, 3498.341, 3996.0469]
2025-05-13 12:21:28,616 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:21:28,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 11 minutes, 59 seconds)
2025-05-13 12:25:04,468 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:25:16,575 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3442.29565 ± 639.828
2025-05-13 12:25:16,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3040.6602, 4011.8496, 3826.1719, 3872.514, 2267.6267, 3921.3306, 4022.6326, 3239.8542, 3840.8865, 2379.43]
2025-05-13 12:25:16,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:25:16,581 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 8 minutes, 6 seconds)
2025-05-13 12:28:50,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:29:02,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3505.47144 ± 850.270
2025-05-13 12:29:02,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3931.537, 3720.3357, 3764.7695, 4331.898, 3016.1482, 3508.2522, 3936.5706, 4080.705, 3601.3096, 1163.1863]
2025-05-13 12:29:02,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:29:02,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 3 minutes, 33 seconds)
2025-05-13 12:32:36,638 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:32:48,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3632.77197 ± 443.365
2025-05-13 12:32:48,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4054.3984, 3499.1665, 4360.9346, 4168.265, 3593.0066, 2749.7568, 3512.6956, 3680.883, 3351.2446, 3357.3657]
2025-05-13 12:32:48,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:32:48,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 2 hours, 58 minutes, 55 seconds)
2025-05-13 12:36:21,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:36:33,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3737.50342 ± 763.700
2025-05-13 12:36:33,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3876.8848, 3191.8276, 3909.712, 4109.654, 3779.4019, 3654.5686, 4629.056, 1889.0505, 3552.185, 4782.693]
2025-05-13 12:36:33,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:36:33,919 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 2 hours, 54 minutes, 8 seconds)
2025-05-13 12:40:06,775 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:40:18,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3445.23389 ± 863.102
2025-05-13 12:40:18,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1597.0354, 3777.3638, 4270.4956, 2967.9463, 3913.645, 3226.7168, 2430.8835, 4606.097, 4022.1545, 3640.001]
2025-05-13 12:40:18,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:40:18,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 2 hours, 49 minutes, 32 seconds)
2025-05-13 12:43:51,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:44:04,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3551.98169 ± 541.982
2025-05-13 12:44:04,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4003.7993, 4510.933, 3928.1963, 4043.1743, 3532.4714, 3475.9094, 2901.613, 2725.698, 3385.8743, 3012.1458]
2025-05-13 12:44:04,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:44:04,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 45 minutes, 22 seconds)
2025-05-13 12:47:37,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:47:49,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3953.47803 ± 362.730
2025-05-13 12:47:49,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4122.654, 4440.738, 4455.6, 3704.928, 4348.162, 3445.1897, 3694.2644, 3451.0237, 3909.6548, 3962.5684]
2025-05-13 12:47:49,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:47:49,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3953.48) for latency ExtremeSparseL4U32
2025-05-13 12:47:49,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 41 minutes, 30 seconds)
2025-05-13 12:51:22,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:51:34,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3540.27930 ± 782.332
2025-05-13 12:51:34,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4137.019, 4211.702, 3255.7993, 3356.3474, 4544.012, 3848.1753, 4035.152, 1669.7567, 3375.0034, 2969.8274]
2025-05-13 12:51:34,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:51:34,932 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 37 minutes, 38 seconds)
2025-05-13 12:55:08,167 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:55:20,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3836.25830 ± 494.816
2025-05-13 12:55:20,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3716.4111, 4002.4026, 4428.318, 3162.6868, 3595.6562, 2915.4111, 4295.68, 4132.7734, 4444.0957, 3669.146]
2025-05-13 12:55:20,474 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:55:20,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 33 minutes, 57 seconds)
2025-05-13 12:58:53,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:59:05,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3528.98975 ± 943.652
2025-05-13 12:59:05,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3850.225, 3885.7327, 3648.952, 4327.11, 3516.973, 3438.6516, 3205.376, 913.3132, 4359.68, 4143.888]
2025-05-13 12:59:05,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:59:05,889 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 30 minutes, 15 seconds)
2025-05-13 13:02:39,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:02:51,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3619.12646 ± 922.812
2025-05-13 13:02:51,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4318.0957, 1509.1783, 4174.1826, 2213.644, 3911.3662, 4369.7695, 3816.0447, 3546.4739, 4178.3345, 4154.178]
2025-05-13 13:02:51,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:02:51,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 26 minutes, 33 seconds)
2025-05-13 13:06:25,003 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:06:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3818.10986 ± 529.049
2025-05-13 13:06:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3792.9995, 3305.208, 3087.176, 3980.2266, 3069.7246, 3569.968, 4331.517, 4270.739, 4731.604, 4041.9329]
2025-05-13 13:06:37,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:06:37,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 22 minutes, 50 seconds)
2025-05-13 13:10:10,348 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:10:22,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3782.32471 ± 403.124
2025-05-13 13:10:22,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2996.688, 4105.33, 3488.848, 3909.657, 4357.295, 3494.414, 3679.7317, 4385.039, 3729.2957, 3676.9446]
2025-05-13 13:10:22,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:10:22,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 19 minutes, 3 seconds)
2025-05-13 13:13:55,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:14:07,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3687.24756 ± 470.531
2025-05-13 13:14:07,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2534.168, 3836.6384, 4078.1113, 3804.9434, 3640.5068, 4129.232, 3495.6353, 3374.4045, 3688.8137, 4290.0215]
2025-05-13 13:14:07,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:14:07,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 15 minutes, 16 seconds)
2025-05-13 13:17:40,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:17:53,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3462.70898 ± 844.991
2025-05-13 13:17:53,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3464.5369, 4060.6272, 4143.4795, 4074.906, 4298.746, 3429.5613, 3285.0479, 1298.9457, 3769.4258, 2801.814]
2025-05-13 13:17:53,164 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:17:53,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 11 minutes, 30 seconds)
2025-05-13 13:21:26,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:21:38,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3448.49463 ± 996.395
2025-05-13 13:21:38,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3933.8767, 2778.9912, 4476.7397, 1039.1692, 3315.747, 3504.75, 3931.4731, 4131.7783, 2806.4019, 4566.0156]
2025-05-13 13:21:38,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:21:38,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 7 minutes, 45 seconds)
2025-05-13 13:25:11,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:25:23,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3546.13135 ± 605.346
2025-05-13 13:25:23,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3622.9182, 4052.8433, 4118.987, 3204.9258, 3567.932, 3591.2668, 3747.4666, 3342.0857, 4221.312, 1991.5732]
2025-05-13 13:25:23,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:25:23,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 3 minutes, 55 seconds)
2025-05-13 13:28:57,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:29:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3781.56445 ± 519.277
2025-05-13 13:29:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3806.074, 4270.202, 4428.857, 3438.9534, 3579.1465, 3444.5088, 4636.4893, 3961.3342, 3390.4653, 2859.6125]
2025-05-13 13:29:09,681 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:29:09,691 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 14 seconds)
2025-05-13 13:32:44,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:32:56,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3468.06323 ± 567.557
2025-05-13 13:32:56,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3287.5334, 3986.5278, 2192.6958, 3119.618, 4020.388, 3181.8564, 3641.8958, 3381.6533, 3538.9207, 4329.5454]
2025-05-13 13:32:56,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:32:56,459 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 1 hour, 56 minutes, 37 seconds)
2025-05-13 13:36:29,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:36:41,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3426.21802 ± 642.431
2025-05-13 13:36:41,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2509.472, 4070.3872, 2858.9194, 3157.7524, 2956.803, 3946.2698, 3312.726, 3502.8474, 3146.0645, 4800.939]
2025-05-13 13:36:41,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:36:41,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 1 hour, 52 minutes, 50 seconds)
2025-05-13 13:40:14,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:40:26,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3631.22925 ± 574.414
2025-05-13 13:40:26,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4086.7363, 3696.772, 2523.8154, 4274.3926, 3450.8225, 4560.881, 3573.814, 3766.8933, 2926.1904, 3451.9766]
2025-05-13 13:40:26,789 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:40:26,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 49 minutes, 2 seconds)
2025-05-13 13:43:59,975 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:44:12,236 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3469.16333 ± 1013.134
2025-05-13 13:44:12,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4062.1118, 3613.5237, 3429.2375, 3114.2114, 4381.772, 3621.2227, 4264.1973, 906.5536, 2742.6516, 4556.15]
2025-05-13 13:44:12,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:44:12,245 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 45 minutes, 18 seconds)
2025-05-13 13:47:45,639 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:47:57,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3834.26099 ± 595.213
2025-05-13 13:47:57,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3636.3003, 4033.7803, 4225.5947, 4397.1313, 3534.2002, 2546.7869, 4798.31, 3972.9321, 3293.2734, 3904.302]
2025-05-13 13:47:57,770 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:47:57,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 41 minutes, 31 seconds)
2025-05-13 13:51:31,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:51:43,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3788.22534 ± 470.810
2025-05-13 13:51:43,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4243.9575, 4003.487, 4340.0815, 3154.5461, 3280.1438, 4273.6167, 3908.6704, 3445.5798, 4150.6772, 3081.4978]
2025-05-13 13:51:43,408 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:51:43,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 37 minutes, 40 seconds)
2025-05-13 13:55:17,347 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:55:29,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3833.66406 ± 507.785
2025-05-13 13:55:29,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4528.7236, 3503.5059, 3566.5952, 3869.223, 3874.7388, 2634.8577, 3960.036, 3892.4949, 4480.64, 4025.823]
2025-05-13 13:55:29,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:55:29,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 34 minutes)
2025-05-13 13:59:08,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:59:21,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3260.12622 ± 986.985
2025-05-13 13:59:21,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4346.517, 3823.69, 2823.5042, 1806.8901, 3705.21, 1251.8182, 3924.2021, 3375.031, 3167.6438, 4376.754]
2025-05-13 13:59:21,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:59:21,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 30 minutes, 44 seconds)
2025-05-13 14:02:59,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:03:12,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3582.31787 ± 1015.530
2025-05-13 14:03:12,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4154.2173, 728.0168, 3778.6777, 3544.7446, 4247.2886, 3647.717, 3525.259, 4620.4756, 3483.0142, 4093.77]
2025-05-13 14:03:12,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:03:12,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 27 minutes, 24 seconds)
2025-05-13 14:06:51,166 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:07:03,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3810.85229 ± 373.004
2025-05-13 14:07:03,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3160.277, 4075.814, 4473.293, 3778.039, 4090.0823, 3671.8784, 4131.64, 3647.5322, 3745.3691, 3334.6]
2025-05-13 14:07:03,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:07:03,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 24 minutes)
2025-05-13 14:10:42,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:10:54,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3593.47192 ± 388.361
2025-05-13 14:10:54,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4033.249, 3481.6582, 2975.2996, 3803.115, 3684.0012, 3794.0461, 3541.1562, 3492.659, 4202.4473, 2927.0857]
2025-05-13 14:10:54,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:10:54,141 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 20 minutes, 33 seconds)
2025-05-13 14:14:32,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:14:44,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3368.60938 ± 1165.143
2025-05-13 14:14:44,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3099.0798, 4241.04, 848.6145, 3707.086, 3338.875, 4721.8325, 3615.1853, 3840.6375, 1713.1891, 4560.5547]
2025-05-13 14:14:44,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:14:44,668 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 16 minutes, 59 seconds)
2025-05-13 14:18:23,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:18:35,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3434.01807 ± 989.617
2025-05-13 14:18:35,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2652.384, 1667.5485, 4136.7617, 3908.7986, 3214.961, 3999.1252, 4788.0425, 4371.8726, 3654.5718, 1946.1125]
2025-05-13 14:18:35,256 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:18:35,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 13 minutes, 5 seconds)
2025-05-13 14:22:13,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:22:26,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4025.95312 ± 603.100
2025-05-13 14:22:26,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4744.066, 3701.738, 4681.7534, 3804.438, 3260.6873, 4624.416, 3139.4026, 3610.8745, 3882.2112, 4809.943]
2025-05-13 14:22:26,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:22:26,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4025.95) for latency ExtremeSparseL4U32
2025-05-13 14:22:26,169 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 9 minutes, 13 seconds)
2025-05-13 14:26:04,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:26:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3583.03467 ± 1028.543
2025-05-13 14:26:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3753.9648, 932.1148, 4374.5913, 3730.6802, 3179.318, 3238.1443, 4705.288, 3336.5786, 3920.5205, 4659.149]
2025-05-13 14:26:17,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:26:17,275 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 5 minutes, 23 seconds)
2025-05-13 14:29:56,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:30:08,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3819.48828 ± 602.662
2025-05-13 14:30:08,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4305.938, 4051.4744, 4316.4556, 3550.7761, 3990.3403, 4526.9604, 2761.2297, 2719.374, 3754.9243, 4217.408]
2025-05-13 14:30:08,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:30:08,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 1 minute, 33 seconds)
2025-05-13 14:33:47,136 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:33:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3846.35669 ± 531.983
2025-05-13 14:33:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3181.8362, 4447.3105, 4344.3413, 3868.963, 3854.6968, 3198.6104, 3759.7407, 4184.0234, 3026.1135, 4597.9307]
2025-05-13 14:33:59,411 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:33:59,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 57 minutes, 44 seconds)
2025-05-13 14:37:38,338 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:37:50,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3733.51318 ± 342.239
2025-05-13 14:37:50,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3693.7134, 3926.5771, 3992.7344, 3436.4646, 3148.5737, 4198.2026, 3971.264, 3437.5386, 4134.3413, 3395.7212]
2025-05-13 14:37:50,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:37:50,521 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 53 minutes, 54 seconds)
2025-05-13 14:41:29,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:41:42,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3814.86523 ± 473.016
2025-05-13 14:41:42,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3942.6062, 4392.4165, 3167.9812, 3832.9224, 4245.4297, 3813.5, 3035.3335, 3905.2825, 3336.1914, 4476.9897]
2025-05-13 14:41:42,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:41:42,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 50 minutes, 5 seconds)
2025-05-13 14:45:20,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:45:32,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4070.14136 ± 302.114
2025-05-13 14:45:32,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4336.0376, 3952.764, 4472.8955, 4125.542, 3793.3872, 4257.6084, 4168.5356, 3581.1614, 3623.0696, 4390.4155]
2025-05-13 14:45:32,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:45:32,863 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4070.14) for latency ExtremeSparseL4U32
2025-05-13 14:45:32,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 46 minutes, 13 seconds)
2025-05-13 14:49:11,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:49:23,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3889.27734 ± 606.559
2025-05-13 14:49:23,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4112.24, 2735.3262, 4143.621, 3373.5005, 3534.929, 3988.1707, 4962.6807, 3385.8633, 4471.04, 4185.403]
2025-05-13 14:49:23,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:49:23,739 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 42 minutes, 22 seconds)
2025-05-13 14:53:02,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:53:14,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4082.54028 ± 411.465
2025-05-13 14:53:14,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4145.042, 4409.235, 3465.752, 3883.0156, 4618.3086, 4097.8774, 3837.4473, 4081.5017, 4778.9644, 3508.2583]
2025-05-13 14:53:14,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:53:14,763 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4082.54) for latency ExtremeSparseL4U32
2025-05-13 14:53:14,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 38 minutes, 30 seconds)
2025-05-13 14:56:53,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:57:05,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3685.29224 ± 841.593
2025-05-13 14:57:05,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4529.4116, 4475.4326, 1859.9183, 4829.3936, 3272.5444, 4153.069, 2893.0447, 3394.4858, 3742.1863, 3703.4329]
2025-05-13 14:57:05,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:57:05,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 34 minutes, 39 seconds)
2025-05-13 15:00:44,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:00:56,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3886.69604 ± 487.741
2025-05-13 15:00:56,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4479.9136, 4026.4443, 3880.7224, 4003.2908, 3766.5608, 3782.4768, 3327.2932, 2816.6914, 4376.161, 4407.406]
2025-05-13 15:00:56,855 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:00:56,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 30 minutes, 47 seconds)
2025-05-13 15:04:34,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:04:46,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3738.49023 ± 637.578
2025-05-13 15:04:46,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4299.3984, 4746.276, 3701.4026, 3055.82, 2729.1785, 3654.6433, 3832.1477, 3909.4758, 4503.985, 2952.5742]
2025-05-13 15:04:46,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:04:46,386 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 26 minutes, 54 seconds)
2025-05-13 15:08:19,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:08:31,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3482.66797 ± 551.067
2025-05-13 15:08:31,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3414.8838, 3624.808, 4150.045, 2694.6946, 2794.1794, 3820.7524, 3393.3281, 2829.4587, 4422.257, 3682.2751]
2025-05-13 15:08:31,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:08:31,560 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 22 minutes, 57 seconds)
2025-05-13 15:12:04,584 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:12:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4063.83008 ± 329.210
2025-05-13 15:12:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4777.103, 3935.383, 4080.6611, 4295.7886, 4065.0881, 3921.2817, 3965.4473, 4337.634, 3522.7395, 3737.177]
2025-05-13 15:12:16,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:12:16,920 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 19 minutes, 2 seconds)
2025-05-13 15:15:49,938 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:16:02,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3933.91138 ± 630.472
2025-05-13 15:16:02,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2854.592, 4867.837, 4066.9045, 3804.418, 3697.4294, 3055.879, 4539.353, 3567.125, 4612.321, 4273.255]
2025-05-13 15:16:02,029 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:16:02,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 15 minutes, 8 seconds)
2025-05-13 15:19:35,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:19:47,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3646.42261 ± 1050.874
2025-05-13 15:19:47,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4665.415, 4329.7275, 2592.6191, 3844.0928, 2737.892, 4686.958, 1317.1307, 3552.7422, 4335.0723, 4402.576]
2025-05-13 15:19:47,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:19:47,513 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 11 minutes, 18 seconds)
2025-05-13 15:23:20,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:23:33,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3641.33862 ± 645.008
2025-05-13 15:23:33,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3179.242, 4286.0703, 4038.4155, 4104.9893, 2788.7263, 4015.6714, 2255.616, 3691.9028, 4216.283, 3836.4707]
2025-05-13 15:23:33,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:23:33,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 7 minutes, 30 seconds)
2025-05-13 15:27:06,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:27:18,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3939.38745 ± 562.831
2025-05-13 15:27:18,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4625.346, 3700.6682, 3898.5042, 2894.7524, 4481.084, 4194.883, 3363.1746, 4228.0547, 3385.238, 4622.167]
2025-05-13 15:27:18,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:27:18,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 3 minutes, 45 seconds)
2025-05-13 15:31:10,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:31:23,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4045.54736 ± 397.645
2025-05-13 15:31:23,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4262.9663, 3764.779, 4469.099, 4875.913, 3653.5693, 3716.075, 3591.1472, 4320.077, 3913.8083, 3888.0398]
2025-05-13 15:31:23,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:31:23,561 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1251 [DEBUG]: Training session finished
