2025-05-13 09:06:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mda-mem32
2025-05-13 09:06:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-bpql-mda-mem32
2025-05-13 09:06:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x149fcde719d0>}
2025-05-13 09:06:32,178 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:32,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-13 09:06:32,198 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:32,199 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:32,204 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:32,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:32,849 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:38,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:11:00,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -409.95444 ± 3.545
2025-05-13 09:11:00,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-415.61673, -406.2605, -413.89032, -413.09692, -410.6741, -406.92072, -407.48248, -408.11865, -404.71255, -412.77112]
2025-05-13 09:11:00,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:11:00,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-409.95) for latency ExtremeSparseL4U32
2025-05-13 09:11:00,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 7 hours, 22 minutes, 2 seconds)
2025-05-13 09:15:11,289 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:15:33,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -242.66092 ± 12.459
2025-05-13 09:15:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-237.54059, -252.98198, -248.58594, -221.03131, -262.57187, -243.90091, -255.00896, -240.41554, -240.82367, -223.74821]
2025-05-13 09:15:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:15:33,295 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-242.66) for latency ExtremeSparseL4U32
2025-05-13 09:15:33,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 7 hours, 21 minutes, 22 seconds)
2025-05-13 09:19:43,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:20:05,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 336.85098 ± 86.333
2025-05-13 09:20:05,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [281.01245, 371.35623, 186.97949, 334.0114, 379.6915, 255.34563, 271.5027, 380.46106, 507.0957, 401.05365]
2025-05-13 09:20:05,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:20:05,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (336.85) for latency ExtremeSparseL4U32
2025-05-13 09:20:05,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 7 hours, 18 minutes, 2 seconds)
2025-05-13 09:24:16,021 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:24:37,819 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 843.27686 ± 409.906
2025-05-13 09:24:37,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [470.48984, 518.025, 446.7504, 1049.7676, 723.5217, 1858.7556, 935.9889, 926.67163, 1032.1333, 470.66492]
2025-05-13 09:24:37,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:24:37,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (843.28) for latency ExtremeSparseL4U32
2025-05-13 09:24:37,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 7 hours, 13 minutes, 59 seconds)
2025-05-13 09:28:48,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:29:10,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1880.82947 ± 265.322
2025-05-13 09:29:10,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1996.333, 1730.5199, 1738.8014, 2072.1892, 2070.8274, 1169.0342, 2040.0974, 2000.3242, 2032.0103, 1958.1584]
2025-05-13 09:29:10,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:29:10,562 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1880.83) for latency ExtremeSparseL4U32
2025-05-13 09:29:10,569 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 7 hours, 9 minutes, 56 seconds)
2025-05-13 09:33:21,389 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:33:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2406.69385 ± 340.579
2025-05-13 09:33:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2614.7314, 2550.743, 2610.9194, 1822.9805, 2521.7856, 2399.315, 2687.2905, 1666.8419, 2629.8188, 2562.512]
2025-05-13 09:33:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:33:43,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2406.69) for latency ExtremeSparseL4U32
2025-05-13 09:33:43,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 7 hours, 7 minutes, 5 seconds)
2025-05-13 09:37:54,793 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:38:16,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1959.62378 ± 752.737
2025-05-13 09:38:16,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [567.71875, 2734.7007, 2656.4724, 1520.5973, 2436.2842, 1754.591, 2662.2864, 992.12024, 2722.4937, 1548.9735]
2025-05-13 09:38:16,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:38:16,817 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 7 hours, 2 minutes, 41 seconds)
2025-05-13 09:42:27,430 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:42:49,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1750.06873 ± 793.682
2025-05-13 09:42:49,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2558.7837, 1320.7155, 902.6346, 625.7749, 2647.8145, 636.78894, 2350.0708, 1591.1527, 2194.6484, 2672.3044]
2025-05-13 09:42:49,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:42:49,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 58 minutes, 11 seconds)
2025-05-13 09:47:00,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:47:22,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1821.18726 ± 789.795
2025-05-13 09:47:22,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2791.5227, 2550.245, 2781.0652, 1244.7378, 941.78796, 1953.4012, 1226.0415, 1037.4066, 2778.4473, 907.2167]
2025-05-13 09:47:22,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:47:22,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 53 minutes, 53 seconds)
2025-05-13 09:51:32,768 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:51:54,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1505.01880 ± 635.560
2025-05-13 09:51:54,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2275.33, 1163.7527, 956.39435, 816.5721, 2234.7974, 2773.7932, 1216.7322, 1088.4407, 1125.0095, 1399.3647]
2025-05-13 09:51:54,515 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:51:54,523 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 49 minutes, 11 seconds)
2025-05-13 09:56:03,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 09:56:25,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2675.02930 ± 411.702
2025-05-13 09:56:25,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2627.8992, 2922.1702, 2825.2717, 2071.104, 3005.6113, 1724.3038, 2888.2056, 2732.986, 2956.2385, 2996.503]
2025-05-13 09:56:25,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:56:25,163 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2675.03) for latency ExtremeSparseL4U32
2025-05-13 09:56:25,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 43 minutes, 52 seconds)
2025-05-13 10:00:33,394 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:00:55,073 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1750.53711 ± 734.312
2025-05-13 10:00:55,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1976.6149, 1373.9789, 2421.6055, 711.22656, 1881.4473, 1467.0955, 2847.1746, 839.03406, 1151.0839, 2836.1094]
2025-05-13 10:00:55,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:00:55,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 38 minutes, 25 seconds)
2025-05-13 10:05:03,059 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:05:24,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1772.98499 ± 819.009
2025-05-13 10:05:24,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [745.75323, 1554.733, 1117.7533, 2907.958, 2164.1445, 780.242, 1881.1685, 3005.3577, 2566.7803, 1005.95844]
2025-05-13 10:05:24,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:05:24,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 33 minutes)
2025-05-13 10:09:32,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:09:54,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2227.40918 ± 707.422
2025-05-13 10:09:54,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1336.0948, 1962.979, 3078.3389, 1795.2944, 974.0121, 2254.0576, 2167.4263, 3099.7024, 3142.398, 2463.7866]
2025-05-13 10:09:54,114 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:09:54,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 27 minutes, 30 seconds)
2025-05-13 10:14:01,802 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:14:23,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2736.46143 ± 715.572
2025-05-13 10:14:23,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3100.4504, 2864.8347, 3050.4167, 3062.7097, 1743.0421, 3194.5093, 3141.4895, 3076.7856, 972.3283, 3158.047]
2025-05-13 10:14:23,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:14:23,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2736.46) for latency ExtremeSparseL4U32
2025-05-13 10:14:23,580 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 6 hours, 22 minutes, 13 seconds)
2025-05-13 10:18:31,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:18:53,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3117.79565 ± 62.067
2025-05-13 10:18:53,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3068.6719, 3030.1357, 3185.0066, 3129.1885, 3220.0908, 3171.8066, 3084.8572, 3124.8125, 3139.3843, 3024.002]
2025-05-13 10:18:53,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:18:53,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3117.80) for latency ExtremeSparseL4U32
2025-05-13 10:18:53,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 6 hours, 17 minutes, 28 seconds)
2025-05-13 10:23:01,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:23:22,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2763.12402 ± 597.683
2025-05-13 10:23:22,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2941.678, 1738.7333, 3000.87, 3109.6328, 2203.077, 3233.5698, 3317.8186, 3213.9565, 3162.041, 1709.8628]
2025-05-13 10:23:22,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:23:22,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 6 hours, 12 minutes, 49 seconds)
2025-05-13 10:27:29,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:27:51,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3111.04224 ± 84.694
2025-05-13 10:27:51,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2992.512, 2995.9373, 3161.157, 3045.3098, 3234.5295, 3058.6257, 3106.2979, 3211.1455, 3207.8164, 3097.09]
2025-05-13 10:27:51,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:27:51,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 6 hours, 8 minutes, 3 seconds)
2025-05-13 10:31:58,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:32:19,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2921.63867 ± 714.941
2025-05-13 10:32:19,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3151.0745, 3245.883, 786.57434, 3082.2188, 3119.944, 3081.4922, 3249.7046, 3067.0466, 3248.8506, 3183.5974]
2025-05-13 10:32:19,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:32:19,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 6 hours, 3 minutes, 14 seconds)
2025-05-13 10:36:27,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:36:48,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3227.15454 ± 87.857
2025-05-13 10:36:48,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3163.4268, 3208.7517, 3215.6892, 3253.5437, 3224.5142, 3394.6033, 3271.0186, 3297.939, 3036.0122, 3206.0469]
2025-05-13 10:36:48,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:36:48,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3227.15) for latency ExtremeSparseL4U32
2025-05-13 10:36:48,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 58 minutes, 39 seconds)
2025-05-13 10:40:56,129 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:41:17,782 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3204.17236 ± 85.449
2025-05-13 10:41:17,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3205.1216, 3308.7407, 3290.9473, 3123.0232, 3238.1265, 3051.2944, 3157.0164, 3319.5881, 3118.8574, 3229.0068]
2025-05-13 10:41:17,783 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:41:17,791 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 54 minutes, 3 seconds)
2025-05-13 10:45:25,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:45:47,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3134.24951 ± 153.456
2025-05-13 10:45:47,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3159.9158, 2758.1921, 3177.1167, 3171.0747, 3291.9832, 2933.614, 3211.1135, 3210.01, 3230.1677, 3199.3086]
2025-05-13 10:45:47,011 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:45:47,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 49 minutes, 32 seconds)
2025-05-13 10:49:54,478 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:50:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2992.45972 ± 530.798
2025-05-13 10:50:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3258.0696, 3163.8037, 3312.59, 3190.0093, 3236.7783, 3138.1682, 3085.904, 1418.6394, 3048.6096, 3072.0247]
2025-05-13 10:50:15,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:50:15,923 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 45 minutes, 9 seconds)
2025-05-13 10:54:23,325 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:54:45,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3196.58325 ± 74.272
2025-05-13 10:54:45,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3034.189, 3187.32, 3290.6147, 3311.349, 3148.6396, 3164.8235, 3189.0413, 3172.8674, 3228.9756, 3238.011]
2025-05-13 10:54:45,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:54:45,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 40 minutes, 52 seconds)
2025-05-13 10:58:52,238 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 10:59:13,787 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2784.07446 ± 813.492
2025-05-13 10:59:13,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3067.3706, 821.95447, 3243.6929, 3232.8506, 1577.9749, 3198.953, 3063.8635, 3331.3345, 3135.9631, 3166.7847]
2025-05-13 10:59:13,788 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:59:13,796 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 36 minutes, 18 seconds)
2025-05-13 11:03:21,144 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:03:42,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2931.52466 ± 728.747
2025-05-13 11:03:42,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3142.641, 750.5037, 3130.7153, 3149.3281, 3172.3057, 3201.3652, 3260.1936, 3137.5833, 3263.087, 3107.5234]
2025-05-13 11:03:42,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:03:42,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 31 minutes, 43 seconds)
2025-05-13 11:07:50,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:08:11,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2934.27222 ± 661.706
2025-05-13 11:08:11,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [960.8164, 3136.8108, 3278.0562, 3250.3298, 3005.2122, 3182.7273, 3123.3406, 3132.6343, 3110.3352, 3162.461]
2025-05-13 11:08:11,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:08:11,456 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 27 minutes, 8 seconds)
2025-05-13 11:12:18,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:12:40,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2887.01123 ± 745.315
2025-05-13 11:12:40,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3319.3481, 3187.523, 3185.6248, 2216.7612, 3267.4104, 2996.8572, 857.9586, 3246.9978, 3324.615, 3267.0186]
2025-05-13 11:12:40,865 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:12:40,874 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 22 minutes, 47 seconds)
2025-05-13 11:16:48,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:17:09,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3196.28125 ± 81.996
2025-05-13 11:17:09,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3155.2817, 3139.7424, 3097.5234, 3261.7947, 3316.4736, 3215.9893, 3253.578, 3062.5466, 3159.0774, 3300.8037]
2025-05-13 11:17:09,703 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:17:09,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 5 hours, 18 minutes, 14 seconds)
2025-05-13 11:21:17,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:21:38,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2785.66797 ± 819.364
2025-05-13 11:21:38,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3052.2988, 3349.0408, 931.83154, 2819.204, 3346.4705, 3288.4712, 1449.5718, 3248.304, 3167.8015, 3203.6868]
2025-05-13 11:21:38,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:21:38,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 5 hours, 13 minutes, 47 seconds)
2025-05-13 11:25:45,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:26:07,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3158.37891 ± 125.385
2025-05-13 11:26:07,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3260.9272, 2982.2207, 3229.6921, 3292.5693, 3206.497, 3271.0652, 3127.234, 3131.3374, 2885.6218, 3196.6245]
2025-05-13 11:26:07,222 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:26:07,229 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 5 hours, 9 minutes, 15 seconds)
2025-05-13 11:30:14,608 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:30:36,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2533.13184 ± 1076.973
2025-05-13 11:30:36,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [844.50494, 881.07153, 3073.29, 3315.8765, 3199.2686, 3344.0557, 952.62823, 3334.7178, 3171.853, 3214.0503]
2025-05-13 11:30:36,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:30:36,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 5 hours, 4 minutes, 46 seconds)
2025-05-13 11:34:43,583 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:35:05,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3015.15479 ± 700.717
2025-05-13 11:35:05,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3247.8188, 3273.9255, 3126.1914, 3190.6099, 3222.4463, 3263.473, 3354.9187, 3329.2134, 3221.711, 921.24176]
2025-05-13 11:35:05,360 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:35:05,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 5 hours, 16 seconds)
2025-05-13 11:39:12,997 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:39:34,351 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3250.03857 ± 67.983
2025-05-13 11:39:34,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3324.8728, 3316.7412, 3128.063, 3329.421, 3307.8835, 3257.4915, 3207.874, 3213.9636, 3256.074, 3158.0005]
2025-05-13 11:39:34,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:39:34,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3250.04) for latency ExtremeSparseL4U32
2025-05-13 11:39:34,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 55 minutes, 49 seconds)
2025-05-13 11:43:41,588 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:44:02,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2199.23975 ± 1106.101
2025-05-13 11:44:02,939 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [866.87616, 3304.4468, 3040.3083, 880.23804, 3175.8667, 859.12646, 827.71014, 3057.0483, 2696.8894, 3283.8892]
2025-05-13 11:44:02,940 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:44:02,947 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 51 minutes, 15 seconds)
2025-05-13 11:48:10,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:48:31,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3097.07886 ± 246.928
2025-05-13 11:48:31,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3298.983, 3096.368, 2407.2427, 3068.6157, 3190.0427, 3120.9902, 3174.166, 3313.8293, 3260.9236, 3039.6265]
2025-05-13 11:48:31,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:48:31,637 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 46 minutes, 48 seconds)
2025-05-13 11:52:39,388 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:53:00,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3203.66211 ± 128.769
2025-05-13 11:53:00,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3345.8982, 3150.0728, 3161.1167, 2900.8772, 3301.71, 3278.8896, 3377.0793, 3146.5742, 3181.092, 3193.3105]
2025-05-13 11:53:00,660 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:53:00,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 42 minutes, 21 seconds)
2025-05-13 11:57:08,046 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 11:57:29,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3162.34180 ± 289.211
2025-05-13 11:57:29,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2353.2224, 3398.394, 3139.6167, 3201.5334, 3062.8206, 3287.692, 3228.2498, 3201.6125, 3406.072, 3344.2046]
2025-05-13 11:57:29,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:57:29,694 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 37 minutes, 49 seconds)
2025-05-13 12:01:37,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:01:58,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3240.43115 ± 107.148
2025-05-13 12:01:58,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3336.4524, 3114.0845, 3353.574, 2998.9956, 3177.5156, 3331.49, 3245.0388, 3263.7473, 3306.8381, 3276.577]
2025-05-13 12:01:58,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:01:58,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 33 minutes, 16 seconds)
2025-05-13 12:06:05,482 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:06:27,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2952.63354 ± 708.198
2025-05-13 12:06:27,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3201.1113, 2786.8948, 3277.7146, 3084.749, 3279.7427, 3258.7205, 3318.4685, 3296.1228, 3146.9165, 875.8957]
2025-05-13 12:06:27,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:06:27,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 28 minutes, 51 seconds)
2025-05-13 12:10:34,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:10:56,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2923.51953 ± 731.288
2025-05-13 12:10:56,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [981.73566, 3284.3965, 2139.1272, 3274.8145, 3278.984, 3366.5386, 3127.9653, 3245.4734, 3305.9722, 3230.1873]
2025-05-13 12:10:56,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:10:56,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 24 minutes, 26 seconds)
2025-05-13 12:15:03,627 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:15:24,989 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2831.42944 ± 764.505
2025-05-13 12:15:24,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1657.5392, 3102.5774, 3354.7322, 3284.6184, 3230.0461, 3211.2253, 3108.9626, 3104.7551, 3242.306, 1017.5309]
2025-05-13 12:15:24,990 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:15:24,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 19 minutes, 54 seconds)
2025-05-13 12:19:32,589 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:19:54,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3272.46167 ± 47.330
2025-05-13 12:19:54,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3287.4124, 3273.884, 3229.2056, 3173.6772, 3293.411, 3340.6643, 3239.6501, 3337.6907, 3282.1191, 3266.9033]
2025-05-13 12:19:54,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:19:54,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3272.46) for latency ExtremeSparseL4U32
2025-05-13 12:19:54,342 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 15 minutes, 28 seconds)
2025-05-13 12:24:01,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:24:23,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3251.35327 ± 41.301
2025-05-13 12:24:23,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3284.8672, 3246.8625, 3257.4534, 3250.5225, 3219.9055, 3273.6934, 3272.3372, 3325.3706, 3216.2354, 3166.2874]
2025-05-13 12:24:23,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:24:23,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 4 hours, 11 minutes)
2025-05-13 12:28:30,368 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:28:51,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2782.96436 ± 930.728
2025-05-13 12:28:51,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3348.7786, 3078.788, 3190.063, 857.4089, 3282.681, 3347.5527, 3246.6804, 3241.8535, 3236.567, 999.27435]
2025-05-13 12:28:51,680 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:28:51,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 4 hours, 6 minutes, 28 seconds)
2025-05-13 12:32:58,916 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:33:20,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3014.20850 ± 682.273
2025-05-13 12:33:20,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [987.501, 3386.3418, 3354.2957, 3144.2642, 3290.7205, 3194.1326, 3113.733, 3146.6748, 3354.6082, 3169.8142]
2025-05-13 12:33:20,242 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:33:20,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 4 hours, 1 minute, 54 seconds)
2025-05-13 12:37:27,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:37:49,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3000.42480 ± 748.610
2025-05-13 12:37:49,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [772.6453, 3095.1602, 3271.144, 3347.5837, 3267.3523, 3409.3281, 3313.3125, 3163.2537, 3250.829, 3113.6392]
2025-05-13 12:37:49,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:37:49,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 57 minutes, 32 seconds)
2025-05-13 12:41:56,828 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:42:18,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2990.38721 ± 644.283
2025-05-13 12:42:18,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3307.2417, 1063.339, 3210.9185, 3177.451, 3205.6003, 3105.7083, 3191.7854, 3251.1157, 3220.6843, 3170.027]
2025-05-13 12:42:18,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:42:18,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 52 minutes, 58 seconds)
2025-05-13 12:46:25,683 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:46:47,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3002.14185 ± 683.486
2025-05-13 12:46:47,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [981.36334, 3321.5132, 3308.3809, 3184.8103, 3238.947, 2918.503, 3231.6885, 3288.7754, 3191.8315, 3355.6038]
2025-05-13 12:46:47,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:46:47,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 48 minutes, 34 seconds)
2025-05-13 12:50:54,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:51:16,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3240.06104 ± 63.887
2025-05-13 12:51:16,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3299.9397, 3178.51, 3322.6443, 3266.047, 3325.6555, 3207.6091, 3213.5476, 3236.301, 3240.1016, 3110.2556]
2025-05-13 12:51:16,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:51:16,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 44 minutes, 6 seconds)
2025-05-13 12:55:23,536 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 12:55:45,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3050.05542 ± 557.785
2025-05-13 12:55:45,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3195.8618, 3260.6216, 3216.8, 1390.8567, 3324.636, 3070.3145, 3314.659, 3169.512, 3290.83, 3266.4634]
2025-05-13 12:55:45,123 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:55:45,132 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 39 minutes, 39 seconds)
2025-05-13 12:59:52,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:00:13,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3056.37720 ± 310.533
2025-05-13 13:00:13,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3236.534, 2482.3896, 3236.8833, 3265.2302, 3202.613, 3025.6694, 3107.2441, 2436.8127, 3205.521, 3364.875]
2025-05-13 13:00:13,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:00:13,612 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 35 minutes, 3 seconds)
2025-05-13 13:04:21,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:04:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2821.83398 ± 903.735
2025-05-13 13:04:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2760.1187, 1023.1802, 1080.9146, 3360.8313, 3191.6597, 3254.9705, 3373.0437, 3342.225, 3378.8354, 3452.5608]
2025-05-13 13:04:42,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:04:42,760 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 30 minutes, 36 seconds)
2025-05-13 13:08:50,217 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:09:11,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3261.93579 ± 62.784
2025-05-13 13:09:11,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3305.2454, 3235.9785, 3253.771, 3295.8374, 3288.1758, 3309.3745, 3146.8337, 3361.567, 3164.339, 3258.2344]
2025-05-13 13:09:11,725 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:09:11,734 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 26 minutes, 5 seconds)
2025-05-13 13:13:19,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:13:40,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3091.74878 ± 486.436
2025-05-13 13:13:40,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3243.419, 1650.936, 3269.5532, 3306.573, 3225.075, 3363.128, 3290.2478, 3247.3308, 3272.3691, 3048.8564]
2025-05-13 13:13:40,494 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:13:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 21 minutes, 37 seconds)
2025-05-13 13:17:48,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:18:10,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3054.23096 ± 674.127
2025-05-13 13:18:10,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3180.3918, 3325.7031, 3203.395, 3303.4678, 3144.3457, 3336.3904, 1043.5769, 3378.2188, 3294.6238, 3332.2]
2025-05-13 13:18:10,072 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:18:10,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 17 minutes, 15 seconds)
2025-05-13 13:22:17,344 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:22:39,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3254.55566 ± 79.346
2025-05-13 13:22:39,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3293.6982, 3047.2678, 3280.7563, 3277.5854, 3246.4197, 3195.625, 3265.9124, 3351.2798, 3269.193, 3317.8184]
2025-05-13 13:22:39,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:22:39,221 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 12 minutes, 52 seconds)
2025-05-13 13:26:46,354 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:27:07,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3022.26733 ± 661.554
2025-05-13 13:27:07,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3250.3064, 1076.0449, 2881.8088, 3225.3176, 3297.3135, 3289.4958, 3411.7627, 3225.5654, 3271.7554, 3293.3]
2025-05-13 13:27:07,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:27:07,875 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 3 hours, 8 minutes, 18 seconds)
2025-05-13 13:31:14,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:31:35,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3085.25830 ± 536.081
2025-05-13 13:31:35,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3192.5557, 3248.2205, 3243.8123, 3284.3252, 3277.4026, 3410.79, 3229.789, 3287.4075, 1486.8027, 3191.4785]
2025-05-13 13:31:35,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:31:35,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 3 hours, 3 minutes, 42 seconds)
2025-05-13 13:35:43,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:36:04,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3239.56030 ± 87.328
2025-05-13 13:36:04,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3122.4065, 3325.2222, 3272.1587, 3262.478, 3346.624, 3239.6213, 3178.3018, 3079.5776, 3218.487, 3350.7263]
2025-05-13 13:36:04,724 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:36:04,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 59 minutes, 13 seconds)
2025-05-13 13:40:11,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:40:33,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2775.45923 ± 800.403
2025-05-13 13:40:33,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3301.4402, 3246.7124, 2801.284, 3178.5068, 3135.2458, 3204.5688, 1104.1177, 3277.84, 3211.1228, 1293.7542]
2025-05-13 13:40:33,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:40:33,472 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 54 minutes, 38 seconds)
2025-05-13 13:44:40,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:45:02,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3255.76221 ± 114.192
2025-05-13 13:45:02,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2959.7234, 3258.7935, 3419.4282, 3345.927, 3246.2542, 3275.8657, 3296.799, 3297.4065, 3263.758, 3193.663]
2025-05-13 13:45:02,266 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:45:02,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 50 minutes, 7 seconds)
2025-05-13 13:49:09,631 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:49:31,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3203.15552 ± 204.031
2025-05-13 13:49:31,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3282.8594, 3134.5308, 3193.2917, 3295.4072, 3298.7847, 3285.8354, 3347.101, 3234.8167, 2619.7146, 3339.2112]
2025-05-13 13:49:31,224 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:49:31,231 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 45 minutes, 40 seconds)
2025-05-13 13:53:38,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:54:00,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2617.81982 ± 904.073
2025-05-13 13:54:00,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3356.0203, 1142.6921, 3099.777, 3238.8792, 3358.6274, 3315.5098, 2787.6362, 1559.0051, 1104.1132, 3215.9373]
2025-05-13 13:54:00,308 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:54:00,317 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 41 minutes, 19 seconds)
2025-05-13 13:58:07,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 13:58:29,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3071.70630 ± 530.086
2025-05-13 13:58:29,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3070.1582, 3353.3733, 3179.9062, 3288.0757, 3080.2378, 1519.8889, 3343.865, 3459.6147, 3226.8142, 3195.1272]
2025-05-13 13:58:29,252 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:58:29,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 36 minutes, 51 seconds)
2025-05-13 14:02:36,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:02:58,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3261.12939 ± 82.751
2025-05-13 14:02:58,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3280.65, 3236.4463, 3301.742, 3258.0173, 3366.0996, 3317.3208, 3320.7024, 3214.849, 3268.5903, 3046.8762]
2025-05-13 14:02:58,180 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:02:58,190 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 32 minutes, 24 seconds)
2025-05-13 14:07:05,808 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:07:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3102.00146 ± 651.477
2025-05-13 14:07:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3313.8057, 3332.14, 3372.6096, 3134.7827, 1160.0392, 3317.4429, 3429.4048, 3286.2334, 3376.0667, 3297.4917]
2025-05-13 14:07:27,153 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:07:27,161 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 27 minutes, 56 seconds)
2025-05-13 14:11:35,330 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:11:56,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3114.06665 ± 426.212
2025-05-13 14:11:56,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3193.7395, 3205.7854, 3222.591, 1846.3271, 3207.8005, 3391.93, 3239.035, 3300.9614, 3264.4805, 3268.0164]
2025-05-13 14:11:56,718 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:11:56,726 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 23 minutes, 31 seconds)
2025-05-13 14:16:04,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:16:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2529.67432 ± 908.196
2025-05-13 14:16:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3204.0293, 3294.7214, 3264.1577, 3207.0928, 1245.917, 3182.3713, 1182.1149, 2235.4114, 1224.1096, 3256.8167]
2025-05-13 14:16:25,652 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:16:25,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 19 minutes, 1 second)
2025-05-13 14:20:33,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:20:54,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3198.67676 ± 123.990
2025-05-13 14:20:54,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3099.5444, 3339.5999, 2961.756, 3171.5098, 3108.9685, 3293.0679, 3360.0598, 3266.6436, 3094.096, 3291.5227]
2025-05-13 14:20:54,921 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:20:54,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 14 minutes, 34 seconds)
2025-05-13 14:25:02,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:25:23,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3274.80176 ± 51.266
2025-05-13 14:25:23,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3262.554, 3290.67, 3232.3872, 3285.204, 3355.4102, 3248.937, 3162.5598, 3306.5767, 3331.303, 3272.4163]
2025-05-13 14:25:23,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:25:23,935 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3274.80) for latency ExtremeSparseL4U32
2025-05-13 14:25:23,946 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 10 minutes, 5 seconds)
2025-05-13 14:29:31,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:29:52,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3266.56323 ± 58.945
2025-05-13 14:29:52,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3322.178, 3323.9414, 3119.608, 3235.686, 3299.1477, 3266.224, 3307.2937, 3255.363, 3229.6292, 3306.5623]
2025-05-13 14:29:52,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:29:52,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 2 hours, 5 minutes, 36 seconds)
2025-05-13 14:34:00,271 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:34:21,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3065.41260 ± 644.263
2025-05-13 14:34:21,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3191.357, 1142.3937, 3420.8655, 3266.6055, 3258.4617, 3329.239, 3300.2173, 3292.828, 3274.5957, 3177.5632]
2025-05-13 14:34:21,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:34:21,868 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 2 hours, 1 minute, 3 seconds)
2025-05-13 14:38:29,885 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:38:51,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3167.24341 ± 383.654
2025-05-13 14:38:51,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3319.218, 3341.0042, 3195.5383, 2024.599, 3307.8726, 3317.1826, 3337.2747, 3257.7217, 3237.9934, 3334.03]
2025-05-13 14:38:51,267 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:38:51,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 56 minutes, 37 seconds)
2025-05-13 14:42:59,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:43:20,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3279.69849 ± 44.687
2025-05-13 14:43:20,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3284.7375, 3213.0002, 3287.145, 3320.2888, 3209.2566, 3330.1357, 3329.656, 3288.0303, 3226.0344, 3308.698]
2025-05-13 14:43:20,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:43:20,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3279.70) for latency ExtremeSparseL4U32
2025-05-13 14:43:20,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 52 minutes, 8 seconds)
2025-05-13 14:47:28,520 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:47:49,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3236.67920 ± 58.848
2025-05-13 14:47:49,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3202.6238, 3270.6682, 3220.5984, 3251.676, 3141.6902, 3252.3137, 3316.3943, 3326.6628, 3146.6104, 3237.5535]
2025-05-13 14:47:49,957 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:47:49,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 47 minutes, 40 seconds)
2025-05-13 14:51:57,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:52:18,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2975.11475 ± 635.519
2025-05-13 14:52:18,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1302.223, 3238.342, 3266.3843, 3272.6892, 3291.0986, 3278.7268, 2254.0762, 3255.147, 3318.9265, 3273.5354]
2025-05-13 14:52:18,654 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:52:18,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 43 minutes, 10 seconds)
2025-05-13 14:56:26,031 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 14:56:47,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2566.90381 ± 814.437
2025-05-13 14:56:47,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3255.7424, 2514.7559, 2294.1492, 3322.2222, 3343.2437, 1624.4993, 3232.9893, 3360.8472, 1491.2739, 1229.3165]
2025-05-13 14:56:47,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:56:47,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 38 minutes, 40 seconds)
2025-05-13 15:00:55,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:01:16,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3078.56519 ± 662.488
2025-05-13 15:01:16,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3300.5603, 1098.1042, 3398.724, 3311.9092, 3393.8264, 3236.644, 3298.0176, 3234.2766, 3248.8462, 3264.7444]
2025-05-13 15:01:16,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:01:16,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 34 minutes, 11 seconds)
2025-05-13 15:05:23,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:05:45,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3011.96411 ± 620.428
2025-05-13 15:05:45,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3346.7231, 3079.22, 3244.1145, 3223.8828, 3083.8086, 3259.4111, 3364.3352, 3268.2795, 3074.6768, 1175.1906]
2025-05-13 15:05:45,433 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:05:45,442 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 29 minutes, 38 seconds)
2025-05-13 15:09:52,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:10:14,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2727.21216 ± 935.132
2025-05-13 15:10:14,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3297.1892, 3426.9546, 3268.87, 1236.7206, 1567.6333, 3362.59, 3427.6482, 1125.4886, 3260.5525, 3298.4744]
2025-05-13 15:10:14,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:10:14,512 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 25 minutes, 9 seconds)
2025-05-13 15:14:21,586 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:14:43,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3308.13623 ± 55.940
2025-05-13 15:14:43,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3275.53, 3299.393, 3172.2917, 3384.3765, 3348.7336, 3296.3904, 3295.3987, 3368.397, 3318.7559, 3322.0964]
2025-05-13 15:14:43,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:14:43,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3308.14) for latency ExtremeSparseL4U32
2025-05-13 15:14:43,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 20 minutes, 40 seconds)
2025-05-13 15:18:50,176 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:19:11,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2960.17578 ± 769.069
2025-05-13 15:19:11,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1714.6273, 3358.8604, 3323.629, 3348.8613, 1171.8234, 3313.009, 3401.5974, 3383.494, 3254.4695, 3331.3867]
2025-05-13 15:19:11,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:19:11,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 16 minutes, 9 seconds)
2025-05-13 15:23:18,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:23:39,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3267.06470 ± 61.275
2025-05-13 15:23:39,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3174.6484, 3183.5652, 3197.5615, 3324.7974, 3336.6345, 3253.2173, 3245.3672, 3301.557, 3320.447, 3332.8518]
2025-05-13 15:23:39,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:23:39,666 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 11 minutes, 37 seconds)
2025-05-13 15:27:46,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:28:08,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3295.32495 ± 91.807
2025-05-13 15:28:08,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3328.9219, 3141.0183, 3320.4753, 3157.262, 3385.2456, 3379.5203, 3377.407, 3372.5535, 3303.713, 3187.1313]
2025-05-13 15:28:08,350 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:28:08,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 7 minutes, 8 seconds)
2025-05-13 15:32:16,440 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:32:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3069.99854 ± 577.914
2025-05-13 15:32:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3177.2002, 3248.3606, 3347.8557, 3195.266, 3331.0676, 1364.0955, 3041.7883, 3423.955, 3343.0066, 3227.3904]
2025-05-13 15:32:37,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:32:37,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 1 hour, 2 minutes, 41 seconds)
2025-05-13 15:36:46,649 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:37:08,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3251.10278 ± 715.914
2025-05-13 15:37:08,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3491.0044, 3331.8667, 3457.3335, 1113.785, 3494.0652, 3473.7383, 3543.4236, 3539.4285, 3448.7166, 3617.6648]
2025-05-13 15:37:08,041 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:37:08,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 58 minutes, 16 seconds)
2025-05-13 15:41:16,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:41:38,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2840.37549 ± 960.896
2025-05-13 15:41:38,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3760.5952, 3767.5334, 3837.932, 3195.627, 2725.8813, 2617.7424, 1353.7549, 1367.8508, 1898.52, 3878.3157]
2025-05-13 15:41:38,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:41:38,313 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 53 minutes, 52 seconds)
2025-05-13 15:45:46,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:46:08,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2769.50122 ± 925.239
2025-05-13 15:46:08,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3389.17, 1680.2792, 1140.4558, 1351.0208, 3505.3328, 3423.4382, 3300.152, 2921.2246, 3550.2275, 3433.7117]
2025-05-13 15:46:08,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:46:08,067 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 49 minutes, 26 seconds)
2025-05-13 15:50:15,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:50:37,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3460.06714 ± 440.622
2025-05-13 15:50:37,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3649.088, 3627.7148, 3760.2668, 3592.434, 3743.8381, 3653.6902, 2175.4358, 3463.5293, 3439.1619, 3495.5146]
2025-05-13 15:50:37,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:50:37,268 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3460.07) for latency ExtremeSparseL4U32
2025-05-13 15:50:37,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 44 minutes, 57 seconds)
2025-05-13 15:54:45,973 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:55:07,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3609.45459 ± 328.373
2025-05-13 15:55:07,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3761.3286, 3484.968, 3879.8213, 3910.7039, 3177.8093, 3796.4128, 3769.4592, 3807.4817, 2843.3599, 3663.2034]
2025-05-13 15:55:07,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:55:07,533 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3609.45) for latency ExtremeSparseL4U32
2025-05-13 15:55:07,547 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 40 minutes, 29 seconds)
2025-05-13 15:59:16,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 15:59:37,732 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3030.05273 ± 846.101
2025-05-13 15:59:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3029.8352, 3824.1326, 3696.4946, 2734.8975, 2245.4854, 3831.552, 3824.7764, 1434.1329, 3702.1716, 1977.051]
2025-05-13 15:59:37,733 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:59:37,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 35 minutes, 59 seconds)
2025-05-13 16:03:45,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:04:07,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3063.42993 ± 773.629
2025-05-13 16:04:07,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3707.3574, 3715.5466, 2384.3694, 3563.9277, 3566.544, 1684.3225, 2133.4397, 3708.621, 3790.9, 2379.2688]
2025-05-13 16:04:07,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:04:07,157 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 31 minutes, 28 seconds)
2025-05-13 16:08:15,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:08:37,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2568.98584 ± 942.288
2025-05-13 16:08:37,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3856.8735, 1770.1703, 1455.1144, 3008.843, 3761.2212, 3754.692, 2786.3923, 1880.5247, 1309.8937, 2106.1316]
2025-05-13 16:08:37,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:08:37,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 26 minutes, 59 seconds)
2025-05-13 16:12:45,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:13:06,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2875.88013 ± 722.821
2025-05-13 16:13:06,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2972.135, 1655.3881, 3123.9915, 3837.0027, 3549.614, 2450.2634, 2698.0898, 3824.4365, 1792.5245, 2855.354]
2025-05-13 16:13:06,810 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:13:06,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 22 minutes, 29 seconds)
2025-05-13 16:17:15,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:17:36,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2877.99268 ± 910.843
2025-05-13 16:17:36,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1425.5574, 3209.5518, 2225.4565, 3327.9136, 3326.0535, 3530.05, 4109.6035, 2006.2947, 3935.9175, 1683.5277]
2025-05-13 16:17:36,422 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:17:36,432 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 17 minutes, 59 seconds)
2025-05-13 16:21:44,329 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:22:05,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3347.25464 ± 543.598
2025-05-13 16:22:05,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2024.5243, 3602.4463, 3587.767, 3868.0378, 3672.3906, 3269.5583, 3606.0166, 3413.4316, 3758.5603, 2669.815]
2025-05-13 16:22:05,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:22:05,689 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 13 minutes, 28 seconds)
2025-05-13 16:26:14,172 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:26:35,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3344.35034 ± 889.575
2025-05-13 16:26:35,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3813.9614, 3848.5786, 1230.3975, 3949.2744, 3661.984, 2485.5303, 4018.9482, 3993.3328, 3875.978, 2565.518]
2025-05-13 16:26:35,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:26:35,555 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 59 seconds)
2025-05-13 16:30:44,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:31:05,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2446.97461 ± 1114.621
2025-05-13 16:31:05,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3634.283, 1148.9999, 3987.9993, 3883.6753, 3161.4204, 1448.3217, 1256.8138, 2823.0134, 1201.5884, 1923.6305]
2025-05-13 16:31:05,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:31:05,952 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 29 seconds)
2025-05-13 16:35:14,854 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-13 16:35:36,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2571.65527 ± 784.820
2025-05-13 16:35:36,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1346.6611, 1983.5045, 2486.668, 2581.6804, 3821.434, 3282.3044, 3843.3018, 2167.9336, 2192.3386, 2010.7257]
2025-05-13 16:35:36,487 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:35:36,499 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1251 [DEBUG]: Training session finished
