2025-05-13 09:06:26,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mda-mem24
2025-05-13 09:06:26,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mda-mem24
2025-05-13 09:06:26,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x153a85786350>}
2025-05-13 09:06:26,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:26,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-13 09:06:26,993 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:26,993 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:26,999 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:27,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:27,685 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:21,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:10:40,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -302.60846 ± 5.576
2025-05-13 09:10:40,276 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-305.6218, -305.06873, -294.89975, -292.36853, -295.69217, -306.9754, -306.9845, -304.0021, -307.81427, -306.65726]
2025-05-13 09:10:40,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:10:40,277 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-302.61) for latency ExtremeClogL1U23
2025-05-13 09:10:40,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 56 minutes, 47 seconds)
2025-05-13 09:14:37,866 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:14:56,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 240.79861 ± 255.728
2025-05-13 09:14:56,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [429.23285, -89.12808, 11.250679, -66.68331, -83.335655, 568.8763, 376.66995, 334.6874, 571.521, 354.89502]
2025-05-13 09:14:56,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:14:56,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (240.80) for latency ExtremeClogL1U23
2025-05-13 09:14:56,134 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 55 minutes, 14 seconds)
2025-05-13 09:18:52,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:19:10,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 867.38507 ± 68.754
2025-05-13 09:19:10,752 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [783.20557, 778.1949, 893.3437, 906.45825, 909.0758, 902.1537, 879.0604, 898.9596, 747.93494, 975.46344]
2025-05-13 09:19:10,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:19:10,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (867.39) for latency ExtremeClogL1U23
2025-05-13 09:19:10,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 51 minutes, 12 seconds)
2025-05-13 09:23:06,086 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:23:24,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 965.63684 ± 60.000
2025-05-13 09:23:24,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [882.0808, 1054.7491, 1023.59283, 1017.6734, 993.80975, 900.9406, 950.2733, 884.0547, 936.0046, 1013.1881]
2025-05-13 09:23:24,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:23:24,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (965.64) for latency ExtremeClogL1U23
2025-05-13 09:23:24,025 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 46 minutes, 32 seconds)
2025-05-13 09:27:19,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:27:37,321 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1034.51379 ± 57.734
2025-05-13 09:27:37,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1060.8845, 1131.528, 1064.438, 1073.7457, 1050.6305, 983.17365, 988.7628, 912.0981, 1020.4869, 1059.3883]
2025-05-13 09:27:37,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:27:37,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1034.51) for latency ExtremeClogL1U23
2025-05-13 09:27:37,326 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 42 minutes, 3 seconds)
2025-05-13 09:31:32,375 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:31:50,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1074.63245 ± 285.600
2025-05-13 09:31:50,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1232.9703, 258.80823, 1090.4137, 1210.7234, 1249.7747, 1262.9075, 1099.9525, 1017.4877, 1264.3342, 1058.9517]
2025-05-13 09:31:50,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:31:50,462 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1074.63) for latency ExtremeClogL1U23
2025-05-13 09:31:50,471 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 37 minutes, 59 seconds)
2025-05-13 09:35:45,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:36:03,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1826.66577 ± 324.752
2025-05-13 09:36:03,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2032.1068, 2056.2153, 1486.0415, 2239.5317, 1915.897, 2047.6527, 2079.5605, 1725.2711, 1532.9642, 1151.4153]
2025-05-13 09:36:03,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:36:03,288 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1826.67) for latency ExtremeClogL1U23
2025-05-13 09:36:03,293 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 32 minutes, 49 seconds)
2025-05-13 09:39:58,745 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:40:17,019 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1359.80688 ± 243.845
2025-05-13 09:40:17,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1362.5171, 1287.1724, 1033.8623, 1531.7556, 1440.7991, 1270.0269, 1709.415, 1133.0665, 1054.7084, 1774.7461]
2025-05-13 09:40:17,020 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:40:17,027 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 28 minutes, 19 seconds)
2025-05-13 09:44:11,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:44:29,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1348.02600 ± 268.981
2025-05-13 09:44:29,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1310.7688, 1456.3474, 1076.7032, 1575.5719, 1245.229, 1226.5411, 1094.2565, 2005.1221, 1099.5927, 1390.1267]
2025-05-13 09:44:29,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:44:29,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 23 minutes, 55 seconds)
2025-05-13 09:48:24,130 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:48:41,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2132.32275 ± 708.229
2025-05-13 09:48:41,955 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2976.1482, 1451.2892, 1395.5304, 2718.2166, 2907.0725, 1657.613, 1469.2169, 2688.2004, 1205.7554, 2854.182]
2025-05-13 09:48:41,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:48:41,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2132.32) for latency ExtremeClogL1U23
2025-05-13 09:48:41,961 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 19 minutes, 23 seconds)
2025-05-13 09:52:36,323 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:52:54,278 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1843.30310 ± 572.135
2025-05-13 09:52:54,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1847.0092, 1105.3379, 1684.9438, 2290.1726, 1148.804, 1894.0131, 1442.4044, 2074.311, 1761.6996, 3184.3362]
2025-05-13 09:52:54,279 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:52:54,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 14 minutes, 55 seconds)
2025-05-13 09:56:48,840 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:57:06,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1726.17212 ± 557.512
2025-05-13 09:57:06,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1603.6392, 2023.7188, 1121.0356, 3122.5374, 1593.6432, 1393.022, 2151.8274, 1446.5225, 1616.9685, 1188.8071]
2025-05-13 09:57:06,693 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:57:06,699 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 6 hours, 10 minutes, 35 seconds)
2025-05-13 10:01:01,066 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:01:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2196.45972 ± 544.790
2025-05-13 10:01:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2606.5852, 1930.9023, 2894.1167, 2335.8735, 1739.2672, 3011.0508, 1773.9801, 1292.2103, 2611.1733, 1769.438]
2025-05-13 10:01:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:01:19,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2196.46) for latency ExtremeClogL1U23
2025-05-13 10:01:19,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 6 hours, 6 minutes, 1 second)
2025-05-13 10:05:13,419 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:05:31,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2057.97119 ± 497.447
2025-05-13 10:05:31,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2405.2097, 2450.626, 1582.3915, 1280.3856, 2403.913, 1424.1324, 2948.417, 2070.7075, 1849.9907, 2163.937]
2025-05-13 10:05:31,446 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:05:31,453 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 6 hours, 1 minute, 42 seconds)
2025-05-13 10:09:25,454 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:09:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2062.03857 ± 653.676
2025-05-13 10:09:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1284.5951, 1763.3256, 2819.8906, 2254.6055, 1247.2239, 2977.093, 2119.2751, 3027.051, 1696.5778, 1430.7509]
2025-05-13 10:09:43,816 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:09:43,821 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 57 minutes, 31 seconds)
2025-05-13 10:13:37,510 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:13:55,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1866.32422 ± 351.282
2025-05-13 10:13:55,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1788.9663, 1874.3134, 1400.9967, 1638.5189, 2178.964, 1335.1638, 2099.9663, 2552.3582, 1734.9302, 2059.064]
2025-05-13 10:13:55,328 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:13:55,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 53 minutes, 5 seconds)
2025-05-13 10:17:49,084 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:18:06,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2306.63794 ± 855.556
2025-05-13 10:18:06,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1131.6387, 1687.523, 2060.23, 3318.2476, 2879.9385, 3171.214, 3091.5493, 1467.9438, 1125.7968, 3132.2966]
2025-05-13 10:18:06,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:18:06,942 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2306.64) for latency ExtremeClogL1U23
2025-05-13 10:18:06,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 48 minutes, 40 seconds)
2025-05-13 10:22:00,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:22:18,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2730.46484 ± 701.535
2025-05-13 10:22:18,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1609.9419, 3146.7317, 3203.5867, 2819.9094, 3297.1743, 2779.473, 2660.908, 3308.0415, 3268.1843, 1210.699]
2025-05-13 10:22:18,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:22:18,717 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2730.46) for latency ExtremeClogL1U23
2025-05-13 10:22:18,723 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 44 minutes, 16 seconds)
2025-05-13 10:26:12,713 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:26:30,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1965.82581 ± 535.636
2025-05-13 10:26:30,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2312.4023, 3184.7188, 1571.5347, 1863.0911, 2112.5803, 1618.2076, 1269.7737, 2292.505, 1370.6484, 2062.7964]
2025-05-13 10:26:30,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:26:30,931 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 40 minutes, 3 seconds)
2025-05-13 10:30:25,237 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:30:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2332.01221 ± 656.824
2025-05-13 10:30:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1956.7173, 1396.3956, 2779.0508, 3099.5059, 1315.3651, 2856.8176, 1814.4604, 3053.1384, 2935.5625, 2113.1067]
2025-05-13 10:30:43,248 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:30:43,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 35 minutes, 50 seconds)
2025-05-13 10:34:37,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:34:55,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2385.26001 ± 696.207
2025-05-13 10:34:55,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2629.938, 3312.1062, 3205.6301, 3138.7183, 1482.9169, 1769.6726, 1443.4276, 2164.0828, 1825.6975, 2880.41]
2025-05-13 10:34:55,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:34:55,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 31 minutes, 42 seconds)
2025-05-13 10:38:49,038 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:39:06,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1977.47192 ± 565.653
2025-05-13 10:39:06,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2961.9058, 1947.9805, 2082.5178, 1300.8071, 1574.5216, 1660.4708, 2657.8062, 1184.9645, 1810.5576, 2593.188]
2025-05-13 10:39:06,827 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:39:06,832 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 27 minutes, 34 seconds)
2025-05-13 10:43:00,792 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:43:19,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1527.35156 ± 517.836
2025-05-13 10:43:19,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1774.3484, 1219.3964, 1635.1399, 1105.6488, 2879.1482, 1102.8857, 1483.7596, 1134.8, 1757.2593, 1181.129]
2025-05-13 10:43:19,088 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:43:19,093 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 23 minutes, 29 seconds)
2025-05-13 10:47:13,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:47:31,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2290.12646 ± 771.711
2025-05-13 10:47:31,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3278.284, 1613.022, 1989.7224, 1967.579, 2876.3088, 3045.4883, 3356.2822, 1123.4978, 1334.3156, 2316.7654]
2025-05-13 10:47:31,842 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:47:31,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 19 minutes, 25 seconds)
2025-05-13 10:51:25,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:51:44,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2335.66040 ± 806.949
2025-05-13 10:51:44,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2446.0498, 1420.5554, 3628.6929, 2489.5989, 1289.5916, 3229.6436, 1984.9255, 1157.7742, 2962.9407, 2746.8303]
2025-05-13 10:51:44,054 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:51:44,062 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 15 minutes, 12 seconds)
2025-05-13 10:55:38,727 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:55:56,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1882.54553 ± 655.330
2025-05-13 10:55:56,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2794.7864, 1759.0587, 3060.1301, 2563.4246, 1922.4551, 1545.649, 1418.2881, 1155.0568, 1484.8552, 1121.7496]
2025-05-13 10:55:56,741 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:55:56,747 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 11 minutes, 13 seconds)
2025-05-13 10:59:51,505 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:00:09,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2945.17041 ± 728.877
2025-05-13 11:00:09,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3332.4573, 3294.5112, 3302.4124, 3255.1426, 3276.806, 3390.4607, 2050.7383, 1075.9943, 3386.4817, 3086.6985]
2025-05-13 11:00:09,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:00:09,377 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2945.17) for latency ExtremeClogL1U23
2025-05-13 11:00:09,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 5 hours, 7 minutes, 13 seconds)
2025-05-13 11:04:03,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:04:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3069.82788 ± 284.789
2025-05-13 11:04:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3255.927, 3020.662, 2828.3606, 3305.3745, 2618.481, 2648.6504, 3199.2473, 3433.618, 3416.6753, 2971.2825]
2025-05-13 11:04:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:04:21,786 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3069.83) for latency ExtremeClogL1U23
2025-05-13 11:04:21,794 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 5 hours, 3 minutes, 2 seconds)
2025-05-13 11:08:16,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:08:34,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2562.96094 ± 797.511
2025-05-13 11:08:34,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2309.6062, 3386.9944, 3771.2542, 3099.185, 1393.3108, 2892.4102, 2714.8013, 2968.332, 1281.7374, 1811.9757]
2025-05-13 11:08:34,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:08:34,421 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 58 minutes, 48 seconds)
2025-05-13 11:12:29,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:12:46,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2796.13281 ± 1181.099
2025-05-13 11:12:46,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3912.5093, 3915.6143, 3501.4119, 3735.4878, 3794.2014, 3591.7236, 1643.1256, 1735.1687, 1065.956, 1066.1288]
2025-05-13 11:12:46,878 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:12:46,884 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 54 minutes, 39 seconds)
2025-05-13 11:16:41,572 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:16:59,701 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3060.67529 ± 1025.120
2025-05-13 11:16:59,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3180.9004, 4065.4998, 1759.9531, 1413.9818, 4022.0957, 1825.7064, 4084.0842, 3605.8967, 4081.9407, 2566.6943]
2025-05-13 11:16:59,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:16:59,707 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 50 minutes, 28 seconds)
2025-05-13 11:20:53,936 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:21:12,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3529.78320 ± 1027.699
2025-05-13 11:21:12,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1166.7294, 4081.521, 4160.6094, 4140.8843, 4048.8914, 1860.7891, 4148.374, 3734.2195, 4079.7234, 3876.0884]
2025-05-13 11:21:12,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:21:12,026 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3529.78) for latency ExtremeClogL1U23
2025-05-13 11:21:12,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 46 minutes, 12 seconds)
2025-05-13 11:25:06,424 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:25:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3807.81104 ± 908.755
2025-05-13 11:25:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1082.8485, 4072.7546, 4080.9683, 4088.253, 4099.651, 4112.069, 4093.9663, 4153.721, 4137.5117, 4156.3687]
2025-05-13 11:25:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:25:24,645 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3807.81) for latency ExtremeClogL1U23
2025-05-13 11:25:24,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 42 minutes, 2 seconds)
2025-05-13 11:29:19,045 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:29:37,139 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3877.57422 ± 553.780
2025-05-13 11:29:37,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4111.422, 4086.6528, 4188.6606, 4125.368, 4175.6846, 4233.904, 3824.1716, 2321.701, 3562.8745, 4145.303]
2025-05-13 11:29:37,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:29:37,140 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3877.57) for latency ExtremeClogL1U23
2025-05-13 11:29:37,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 37 minutes, 47 seconds)
2025-05-13 11:33:31,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:33:50,297 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3592.45239 ± 1030.304
2025-05-13 11:33:50,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4200.0107, 4028.9834, 3913.0168, 1419.6768, 4143.0186, 4149.6475, 4133.158, 1662.5112, 4199.037, 4075.4644]
2025-05-13 11:33:50,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:33:50,304 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 33 minutes, 44 seconds)
2025-05-13 11:37:44,862 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:38:02,695 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3426.52344 ± 1104.968
2025-05-13 11:38:02,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4164.7783, 2262.7563, 1259.1761, 4036.9792, 3810.335, 4249.9233, 4189.6826, 1832.7141, 4266.899, 4191.994]
2025-05-13 11:38:02,696 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:38:02,702 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 29 minutes, 26 seconds)
2025-05-13 11:41:56,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:42:14,720 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3613.26685 ± 1136.323
2025-05-13 11:42:14,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4123.871, 4204.5513, 4225.086, 4176.673, 968.1283, 4096.6807, 4227.077, 4287.3657, 1779.3689, 4043.8665]
2025-05-13 11:42:14,721 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:42:14,729 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 25 minutes, 9 seconds)
2025-05-13 11:46:08,757 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:46:27,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4042.77148 ± 574.697
2025-05-13 11:46:27,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2325.689, 4097.3384, 4260.767, 4293.8384, 4213.3438, 4231.8936, 4266.009, 4255.634, 4211.727, 4271.4775]
2025-05-13 11:46:27,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:46:27,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4042.77) for latency ExtremeClogL1U23
2025-05-13 11:46:27,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 20 minutes, 53 seconds)
2025-05-13 11:50:21,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:50:39,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3935.28979 ± 893.909
2025-05-13 11:50:39,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4272.0933, 4162.919, 1275.4324, 4295.7373, 4357.3867, 3924.9138, 4215.15, 4256.895, 4306.019, 4286.3525]
2025-05-13 11:50:39,604 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:50:39,610 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 16 minutes, 42 seconds)
2025-05-13 11:54:34,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:54:52,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3895.89136 ± 772.157
2025-05-13 11:54:52,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4223.592, 3818.4888, 4326.049, 4386.0845, 1749.6135, 4259.361, 4203.924, 4328.5503, 4275.8145, 3387.4373]
2025-05-13 11:54:52,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:54:52,398 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 12 minutes, 25 seconds)
2025-05-13 11:58:47,049 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:59:05,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3904.96826 ± 702.864
2025-05-13 11:59:05,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4135.842, 4124.043, 4217.5513, 4261.334, 4197.93, 1822.7206, 4070.1177, 4226.218, 3846.6396, 4147.2886]
2025-05-13 11:59:05,171 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:59:05,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 4 hours, 8 minutes, 17 seconds)
2025-05-13 12:03:00,077 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:03:18,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4071.21826 ± 325.494
2025-05-13 12:03:18,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4161.1475, 4142.868, 4266.1567, 3203.9373, 4240.607, 4221.7617, 4207.834, 4303.425, 3741.0964, 4223.346]
2025-05-13 12:03:18,254 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:03:18,255 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4071.22) for latency ExtremeClogL1U23
2025-05-13 12:03:18,264 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 4 hours, 4 minutes, 17 seconds)
2025-05-13 12:07:13,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:07:31,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4003.37256 ± 679.550
2025-05-13 12:07:31,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4232.8003, 4240.118, 4298.9834, 4156.63, 4246.7124, 4277.1533, 1971.391, 4103.458, 4240.5405, 4265.9404]
2025-05-13 12:07:31,076 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:07:31,083 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 4 hours, 9 seconds)
2025-05-13 12:11:25,740 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:11:43,628 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3945.17334 ± 417.114
2025-05-13 12:11:43,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4303.763, 4296.155, 4160.3716, 3585.997, 4356.59, 3272.2058, 4287.324, 3658.3997, 4227.3643, 3303.5625]
2025-05-13 12:11:43,629 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:11:43,635 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 55 minutes, 57 seconds)
2025-05-13 12:15:37,678 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:15:55,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3973.04834 ± 972.296
2025-05-13 12:15:55,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4265.585, 4239.3364, 4276.4556, 4290.2705, 4280.116, 4348.585, 4350.814, 4311.287, 1057.8582, 4310.1733]
2025-05-13 12:15:55,754 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:15:55,762 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 51 minutes, 37 seconds)
2025-05-13 12:19:49,743 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:20:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4340.25732 ± 45.517
2025-05-13 12:20:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4350.8735, 4396.848, 4368.2993, 4299.8257, 4352.5234, 4307.476, 4404.189, 4325.2217, 4353.351, 4243.965]
2025-05-13 12:20:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:20:07,524 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4340.26) for latency ExtremeClogL1U23
2025-05-13 12:20:07,529 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 47 minutes, 13 seconds)
2025-05-13 12:24:01,709 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:24:19,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4289.48926 ± 42.532
2025-05-13 12:24:19,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4228.427, 4340.4326, 4338.533, 4281.338, 4233.6353, 4291.1606, 4236.896, 4286.1084, 4318.147, 4340.2134]
2025-05-13 12:24:19,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:24:19,889 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 42 minutes, 53 seconds)
2025-05-13 12:28:13,837 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:28:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4325.60400 ± 64.324
2025-05-13 12:28:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4372.5967, 4316.3193, 4361.95, 4307.498, 4394.738, 4371.164, 4321.974, 4376.108, 4170.174, 4263.5137]
2025-05-13 12:28:31,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:28:31,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 38 minutes, 30 seconds)
2025-05-13 12:32:25,467 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:32:43,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4280.71143 ± 336.046
2025-05-13 12:32:43,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4318.703, 4344.9146, 4435.9375, 4413.1255, 4434.5254, 4356.5396, 4410.255, 3280.2063, 4446.7573, 4366.154]
2025-05-13 12:32:43,540 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:32:43,549 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 34 minutes, 11 seconds)
2025-05-13 12:36:37,926 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:36:55,843 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4393.14990 ± 68.414
2025-05-13 12:36:55,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4359.1094, 4452.45, 4438.154, 4219.707, 4356.891, 4392.9336, 4423.8228, 4479.746, 4407.253, 4401.4336]
2025-05-13 12:36:55,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:36:55,844 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4393.15) for latency ExtremeClogL1U23
2025-05-13 12:36:55,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 30 minutes)
2025-05-13 12:40:50,441 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:41:08,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4413.72803 ± 43.761
2025-05-13 12:41:08,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4415.3906, 4418.3135, 4426.8604, 4434.7354, 4342.7886, 4427.763, 4445.3496, 4429.924, 4474.008, 4322.1475]
2025-05-13 12:41:08,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:41:08,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4413.73) for latency ExtremeClogL1U23
2025-05-13 12:41:08,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 25 minutes, 57 seconds)
2025-05-13 12:45:03,018 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:45:20,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4435.68164 ± 46.455
2025-05-13 12:45:20,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4513.745, 4480.26, 4404.509, 4450.369, 4350.7095, 4424.542, 4415.7085, 4485.908, 4390.6963, 4440.372]
2025-05-13 12:45:20,896 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:45:20,897 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4435.68) for latency ExtremeClogL1U23
2025-05-13 12:45:20,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 21 minutes, 45 seconds)
2025-05-13 12:49:15,380 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:49:33,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4347.03809 ± 431.547
2025-05-13 12:49:33,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4401.8467, 4605.4966, 4154.73, 4544.429, 4321.607, 3124.121, 4597.9834, 4526.471, 4611.4326, 4582.263]
2025-05-13 12:49:33,476 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:49:33,483 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 17 minutes, 40 seconds)
2025-05-13 12:53:28,366 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:53:46,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4510.95703 ± 81.777
2025-05-13 12:53:46,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4579.145, 4493.551, 4498.5024, 4506.192, 4466.0786, 4509.875, 4541.3433, 4328.35, 4516.093, 4670.4395]
2025-05-13 12:53:46,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:53:46,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4510.96) for latency ExtremeClogL1U23
2025-05-13 12:53:46,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 13 minutes, 38 seconds)
2025-05-13 12:57:41,159 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:57:59,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4567.32812 ± 68.706
2025-05-13 12:57:59,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4580.7705, 4613.2285, 4584.789, 4394.5957, 4484.3804, 4611.843, 4629.002, 4578.438, 4591.3613, 4604.8735]
2025-05-13 12:57:59,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:57:59,247 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4567.33) for latency ExtremeClogL1U23
2025-05-13 12:57:59,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 9 minutes, 30 seconds)
2025-05-13 13:01:53,526 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:02:11,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4417.19043 ± 299.764
2025-05-13 13:02:11,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4498.9375, 4513.2207, 3528.0068, 4521.5356, 4644.092, 4506.136, 4489.4434, 4495.92, 4503.1006, 4471.511]
2025-05-13 13:02:11,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:02:11,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 3 hours, 5 minutes, 17 seconds)
2025-05-13 13:06:06,203 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:06:24,282 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4424.82080 ± 416.288
2025-05-13 13:06:24,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4586.0103, 4630.902, 4590.754, 3180.653, 4576.62, 4542.247, 4562.5825, 4552.0186, 4539.72, 4486.6997]
2025-05-13 13:06:24,283 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:06:24,290 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 3 hours, 1 minute, 5 seconds)
2025-05-13 13:10:18,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:10:36,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4355.66455 ± 526.433
2025-05-13 13:10:36,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4519.9233, 4536.061, 2782.0474, 4568.5483, 4620.316, 4554.9346, 4534.244, 4514.515, 4474.718, 4451.3364]
2025-05-13 13:10:36,579 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:10:36,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 56 minutes, 50 seconds)
2025-05-13 13:14:31,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:14:49,436 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4537.28418 ± 115.619
2025-05-13 13:14:49,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4632.779, 4593.6167, 4639.2847, 4409.791, 4530.626, 4249.9927, 4639.44, 4551.0186, 4553.5815, 4572.7065]
2025-05-13 13:14:49,437 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:14:49,445 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 52 minutes, 36 seconds)
2025-05-13 13:18:43,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:19:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4654.68848 ± 49.676
2025-05-13 13:19:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4654.0693, 4579.6274, 4696.12, 4609.745, 4706.7563, 4644.8203, 4710.8716, 4718.031, 4644.4307, 4582.413]
2025-05-13 13:19:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:19:01,767 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4654.69) for latency ExtremeClogL1U23
2025-05-13 13:19:01,776 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 48 minutes, 20 seconds)
2025-05-13 13:22:55,882 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:23:13,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4384.74902 ± 924.630
2025-05-13 13:23:13,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1630.4099, 4763.212, 4762.628, 4793.9023, 4639.629, 4708.0527, 4727.7905, 4749.883, 4683.2334, 4388.748]
2025-05-13 13:23:13,945 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:23:13,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 44 minutes, 3 seconds)
2025-05-13 13:27:08,546 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:27:26,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4740.58203 ± 72.804
2025-05-13 13:27:26,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4677.2256, 4713.4556, 4769.545, 4729.4814, 4735.554, 4887.4053, 4588.5703, 4761.742, 4782.658, 4760.186]
2025-05-13 13:27:26,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:27:26,412 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4740.58) for latency ExtremeClogL1U23
2025-05-13 13:27:26,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 39 minutes, 52 seconds)
2025-05-13 13:31:21,506 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:31:39,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4663.00684 ± 225.175
2025-05-13 13:31:39,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4754.467, 4713.429, 4783.4985, 4740.038, 4756.9585, 4771.95, 4709.6113, 4745.0913, 4659.8945, 3995.1265]
2025-05-13 13:31:39,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:31:39,591 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 35 minutes, 46 seconds)
2025-05-13 13:35:34,755 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:35:52,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4753.08545 ± 97.858
2025-05-13 13:35:52,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4876.971, 4810.7686, 4804.4062, 4831.995, 4764.308, 4829.72, 4547.1167, 4713.5986, 4732.3657, 4619.606]
2025-05-13 13:35:52,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:35:52,605 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4753.09) for latency ExtremeClogL1U23
2025-05-13 13:35:52,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 31 minutes, 34 seconds)
2025-05-13 13:39:47,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:40:05,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4672.68311 ± 187.748
2025-05-13 13:40:05,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4771.0186, 4692.245, 4729.96, 4115.751, 4747.9385, 4788.768, 4699.667, 4730.2847, 4715.5376, 4735.661]
2025-05-13 13:40:05,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:40:05,914 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 27 minutes, 28 seconds)
2025-05-13 13:44:00,655 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:44:18,484 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4563.38721 ± 570.540
2025-05-13 13:44:18,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4808.917, 4729.4316, 4761.716, 2854.059, 4696.776, 4783.2476, 4764.4595, 4754.64, 4725.7036, 4754.9204]
2025-05-13 13:44:18,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:44:18,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 23 minutes, 18 seconds)
2025-05-13 13:48:13,263 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:48:31,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4769.10254 ± 40.902
2025-05-13 13:48:31,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4718.4927, 4806.484, 4691.355, 4801.0933, 4760.553, 4782.8203, 4737.506, 4763.873, 4802.01, 4826.8345]
2025-05-13 13:48:31,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:48:31,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4769.10) for latency ExtremeClogL1U23
2025-05-13 13:48:31,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 19 minutes, 8 seconds)
2025-05-13 13:52:26,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:52:44,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4406.91797 ± 1046.594
2025-05-13 13:52:44,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4879.0347, 4844.7314, 4731.5933, 4879.3315, 4433.9243, 4262.782, 4898.4214, 4876.5806, 4930.393, 1332.3861]
2025-05-13 13:52:44,261 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:52:44,269 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 14 minutes, 53 seconds)
2025-05-13 13:56:39,040 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:56:56,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4460.49121 ± 672.858
2025-05-13 13:56:56,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4816.0317, 4827.2627, 4797.6694, 4777.3145, 4733.346, 4836.876, 4840.6885, 4666.8315, 3499.136, 2809.755]
2025-05-13 13:56:56,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:56:57,007 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 10 minutes, 39 seconds)
2025-05-13 14:00:52,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:01:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4791.59229 ± 48.932
2025-05-13 14:01:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4728.979, 4856.129, 4847.2017, 4760.4844, 4848.4844, 4830.4155, 4779.7583, 4730.4023, 4798.3687, 4735.7026]
2025-05-13 14:01:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:01:10,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4791.59) for latency ExtremeClogL1U23
2025-05-13 14:01:10,117 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 6 minutes, 25 seconds)
2025-05-13 14:05:05,145 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:05:23,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4619.89746 ± 588.731
2025-05-13 14:05:23,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4847.5356, 4919.724, 4759.46, 4741.992, 4806.426, 4783.409, 2861.1226, 4895.093, 4785.7007, 4798.5117]
2025-05-13 14:05:23,244 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:05:23,251 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 2 hours, 2 minutes, 15 seconds)
2025-05-13 14:09:18,092 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:09:36,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4760.67334 ± 151.958
2025-05-13 14:09:36,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4772.943, 4820.6816, 4832.688, 4916.06, 4561.534, 4787.157, 4792.77, 4776.9966, 4938.27, 4407.6377]
2025-05-13 14:09:36,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:09:36,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 58 minutes, 2 seconds)
2025-05-13 14:13:30,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:13:48,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4509.48340 ± 1092.552
2025-05-13 14:13:48,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4992.202, 4931.692, 4890.4995, 1243.9734, 4634.654, 4825.3853, 4895.9775, 4815.2505, 4920.369, 4944.837]
2025-05-13 14:13:48,847 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:13:48,857 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 53 minutes, 48 seconds)
2025-05-13 14:17:43,322 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:18:01,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4204.89258 ± 1156.929
2025-05-13 14:18:01,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4879.05, 1249.6888, 4805.7935, 4684.6567, 4766.67, 4671.1147, 4748.692, 4685.891, 4823.3228, 2734.0469]
2025-05-13 14:18:01,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:18:01,170 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 49 minutes, 33 seconds)
2025-05-13 14:21:55,239 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:22:13,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4269.55859 ± 1213.394
2025-05-13 14:22:13,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4855.0156, 4952.6074, 4834.696, 4885.049, 4927.5176, 4748.981, 1188.1451, 4907.066, 4697.9224, 2698.5862]
2025-05-13 14:22:13,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:22:13,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 45 minutes, 15 seconds)
2025-05-13 14:26:07,735 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:26:25,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4846.46338 ± 57.050
2025-05-13 14:26:25,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4877.42, 4877.2314, 4742.182, 4854.28, 4849.9106, 4774.8833, 4862.5044, 4859.6377, 4807.7036, 4958.879]
2025-05-13 14:26:25,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:26:25,573 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4846.46) for latency ExtremeClogL1U23
2025-05-13 14:26:25,585 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 40 minutes, 59 seconds)
2025-05-13 14:30:19,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:30:37,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4848.18066 ± 65.357
2025-05-13 14:30:37,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4888.511, 4842.279, 4787.3677, 4829.805, 4743.796, 4758.374, 4957.2656, 4876.1426, 4908.7065, 4889.557]
2025-05-13 14:30:37,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:30:37,712 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4848.18) for latency ExtremeClogL1U23
2025-05-13 14:30:37,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 36 minutes, 43 seconds)
2025-05-13 14:34:31,593 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:34:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4750.78271 ± 225.531
2025-05-13 14:34:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4803.8364, 4854.163, 4613.862, 4914.855, 4117.6406, 4841.783, 4781.5366, 4905.6978, 4822.518, 4851.9355]
2025-05-13 14:34:49,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:34:49,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 32 minutes, 26 seconds)
2025-05-13 14:38:44,162 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:39:02,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4084.61597 ± 1323.847
2025-05-13 14:39:02,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1055.7297, 4898.618, 4802.7124, 4842.0415, 4851.684, 4884.0317, 4882.7393, 3445.7473, 4977.1606, 2205.6948]
2025-05-13 14:39:02,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:39:02,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 28 minutes, 15 seconds)
2025-05-13 14:42:56,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:43:14,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4214.81592 ± 1230.550
2025-05-13 14:43:14,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1679.0304, 4903.267, 4921.339, 4504.367, 4779.824, 1852.912, 4858.4375, 4860.8657, 4933.668, 4854.449]
2025-05-13 14:43:14,675 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:43:14,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 24 minutes, 5 seconds)
2025-05-13 14:47:09,075 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:47:26,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4645.60400 ± 544.175
2025-05-13 14:47:26,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4842.478, 3033.7043, 4837.3027, 4879.0728, 4585.202, 4863.3813, 4808.1855, 4857.2275, 4821.222, 4928.261]
2025-05-13 14:47:26,999 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:47:27,009 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 19 minutes, 53 seconds)
2025-05-13 14:51:21,008 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:51:39,069 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4780.80029 ± 135.137
2025-05-13 14:51:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4848.0312, 4857.186, 4861.889, 4876.9536, 4865.5786, 4891.1987, 4653.6655, 4498.434, 4594.303, 4860.7603]
2025-05-13 14:51:39,070 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:51:39,079 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 15 minutes, 40 seconds)
2025-05-13 14:55:32,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:55:51,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4152.11035 ± 1441.550
2025-05-13 14:55:51,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4928.996, 4969.6577, 4970.2188, 4940.6304, 4524.6807, 1267.0841, 4813.114, 4907.0566, 4907.319, 1292.3458]
2025-05-13 14:55:51,179 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:55:51,188 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 11 minutes, 29 seconds)
2025-05-13 14:59:45,948 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:00:04,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4534.57568 ± 1119.945
2025-05-13 15:00:04,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1211.7434, 5029.1035, 4980.9727, 5010.2065, 4954.3184, 5088.427, 4834.2886, 4459.2183, 4855.6396, 4921.8384]
2025-05-13 15:00:04,047 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:00:04,057 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 7 minutes, 18 seconds)
2025-05-13 15:03:58,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:04:16,508 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4785.66113 ± 224.993
2025-05-13 15:04:16,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4953.7017, 4156.1216, 4891.2085, 4725.8965, 4720.324, 4835.5513, 4908.8896, 4982.794, 4843.066, 4839.056]
2025-05-13 15:04:16,509 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:04:16,518 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 3 minutes, 5 seconds)
2025-05-13 15:08:11,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:08:29,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4859.15527 ± 128.286
2025-05-13 15:08:29,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4883.598, 4862.943, 5003.8257, 4862.995, 4944.9204, 4884.392, 4922.642, 4874.4316, 4854.2188, 4497.5806]
2025-05-13 15:08:29,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:08:29,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4859.16) for latency ExtremeClogL1U23
2025-05-13 15:08:29,363 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 58 minutes, 54 seconds)
2025-05-13 15:12:24,450 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:12:42,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4768.97461 ± 580.844
2025-05-13 15:12:42,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4971.2583, 4981.298, 5024.6177, 5007.3335, 4858.5093, 4932.413, 4922.339, 4952.621, 5007.2935, 3032.0625]
2025-05-13 15:12:42,719 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:12:42,728 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 54 minutes, 45 seconds)
2025-05-13 15:16:37,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:16:55,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4710.95068 ± 739.369
2025-05-13 15:16:55,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4967.137, 5025.285, 4851.055, 4867.7144, 5006.3984, 5028.38, 2506.9841, 5022.6577, 5040.3745, 4793.522]
2025-05-13 15:16:55,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:16:55,563 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 50 minutes, 34 seconds)
2025-05-13 15:20:50,196 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:21:08,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4384.08691 ± 1110.695
2025-05-13 15:21:08,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3689.857, 4947.494, 4890.0444, 4500.652, 4927.7437, 4939.66, 1245.4379, 4824.0796, 4988.0044, 4887.894]
2025-05-13 15:21:08,074 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:21:08,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 46 minutes, 20 seconds)
2025-05-13 15:25:02,465 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:25:20,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4926.36182 ± 60.085
2025-05-13 15:25:20,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4962.872, 4858.724, 4997.421, 4841.5254, 4938.7583, 4840.928, 4951.172, 4891.173, 5004.803, 4976.24]
2025-05-13 15:25:20,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:25:20,257 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4926.36) for latency ExtremeClogL1U23
2025-05-13 15:25:20,270 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 42 minutes, 7 seconds)
2025-05-13 15:29:14,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:29:32,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4254.76465 ± 1271.498
2025-05-13 15:29:32,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2063.063, 4785.1416, 4847.142, 4963.748, 4888.503, 4966.3843, 4728.6587, 4980.7534, 4920.146, 1404.1042]
2025-05-13 15:29:32,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:29:32,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 37 minutes, 54 seconds)
2025-05-13 15:33:29,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:33:47,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4520.19824 ± 1100.865
2025-05-13 15:33:47,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4871.9893, 4891.772, 4956.4365, 4892.1035, 4911.254, 1224.5592, 4910.5776, 4903.469, 4953.092, 4686.7305]
2025-05-13 15:33:47,197 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:33:47,209 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 33 minutes, 43 seconds)
2025-05-13 15:37:54,078 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:38:13,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4700.45361 ± 426.269
2025-05-13 15:38:13,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4902.741, 4876.0146, 4978.8887, 4587.735, 4877.3174, 4937.752, 4781.7827, 5019.6665, 3503.3767, 4539.261]
2025-05-13 15:38:13,658 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:38:13,669 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 29 minutes, 49 seconds)
2025-05-13 15:42:10,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:42:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4657.73535 ± 582.510
2025-05-13 15:42:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4860.2417, 4802.803, 5008.131, 4991.419, 4958.528, 4635.308, 4781.546, 2958.3198, 4974.1284, 4606.9272]
2025-05-13 15:42:28,858 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:42:28,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 25 minutes, 36 seconds)
2025-05-13 15:46:11,539 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:46:29,183 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4021.44287 ± 1412.765
2025-05-13 15:46:29,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1817.3036, 5032.8643, 1710.4003, 2101.0676, 4974.7285, 5017.94, 5001.3145, 4578.5938, 4982.3423, 4997.877]
2025-05-13 15:46:29,184 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:46:29,195 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 21 minutes, 8 seconds)
2025-05-13 15:50:12,872 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:50:31,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4600.87061 ± 963.338
2025-05-13 15:50:31,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5002.1245, 4968.7695, 4856.5474, 4992.5493, 4941.6836, 4809.1704, 5023.727, 4955.978, 4735.229, 1722.9275]
2025-05-13 15:50:31,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:50:31,965 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 47 seconds)
2025-05-13 15:54:57,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:55:17,300 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3786.78662 ± 1668.196
2025-05-13 15:55:17,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4985.672, 5014.8667, 1243.5093, 4524.3467, 5041.839, 4977.2944, 4798.732, 1142.7892, 4777.409, 1361.4114]
2025-05-13 15:55:17,301 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:55:17,314 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 54 seconds)
2025-05-13 15:59:43,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:00:01,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4662.95410 ± 552.568
2025-05-13 16:00:01,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4962.378, 3445.6958, 4849.773, 5079.8296, 4791.021, 3704.2512, 4925.134, 4878.286, 4993.561, 4999.614]
2025-05-13 16:00:01,797 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:00:01,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 43 seconds)
2025-05-13 16:03:44,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:04:02,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4297.12012 ± 1247.142
2025-05-13 16:04:02,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5006.6562, 2497.655, 4762.94, 4665.507, 4985.0967, 1251.67, 4929.1123, 4973.257, 4909.873, 4989.433]
2025-05-13 16:04:02,799 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:04:02,812 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 18 seconds)
2025-05-13 16:07:45,365 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 16:08:02,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4704.54492 ± 530.042
2025-05-13 16:08:02,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4933.182, 4939.869, 4936.3765, 5016.308, 4860.5303, 3187.7988, 4920.1855, 4832.5215, 4994.886, 4423.7935]
2025-05-13 16:08:02,949 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 16:08:02,958 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1251 [DEBUG]: Training session finished
