2025-05-13 09:06:38,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mda-mem2
2025-05-13 09:06:38,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-bpql-mda-mem2
2025-05-13 09:06:38,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1455aa376110>}
2025-05-13 09:06:38,281 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1111 [DEBUG]: using device: cuda
2025-05-13 09:06:38,287 baseline-bpql-mda-noisy-halfcheetah:91 [WARNING]: args.assumed_delay != args.horizon: 2 != 24
2025-05-13 09:06:38,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1133 [INFO]: Creating new trainer
2025-05-13 09:06:38,304 baseline-bpql-mda-noisy-halfcheetah:119 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-13 09:06:38,304 baseline-bpql-mda-noisy-halfcheetah:120 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-13 09:06:38,310 baseline-bpql-mda-noisy-halfcheetah:149 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Identity()
  (net_rec): GRU(6, 384, batch_first=True)
)
2025-05-13 09:06:38,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1194 [DEBUG]: Starting training session...
2025-05-13 09:06:38,995 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 1/100
2025-05-13 09:10:25,630 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:10:38,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -411.19449 ± 50.138
2025-05-13 09:10:38,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-370.0723, -398.2757, -388.12125, -470.1932, -365.7175, -389.0151, -414.55234, -409.14822, -534.4339, -372.41525]
2025-05-13 09:10:38,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:10:38,385 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-411.19) for latency ExtremeClogL1U23
2025-05-13 09:10:38,392 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 6 hours, 35 minutes)
2025-05-13 09:14:29,156 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:14:41,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: -296.37933 ± 25.435
2025-05-13 09:14:41,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [-296.7315, -287.9579, -293.20544, -297.61792, -311.69595, -266.3883, -352.25, -297.15338, -308.853, -251.9403]
2025-05-13 09:14:41,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:14:41,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (-296.38) for latency ExtremeClogL1U23
2025-05-13 09:14:41,906 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 6 hours, 34 minutes, 22 seconds)
2025-05-13 09:18:33,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:18:45,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 827.91541 ± 214.225
2025-05-13 09:18:45,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1202.6042, 702.39746, 633.0901, 455.42743, 656.21735, 968.65265, 899.49194, 980.2009, 748.00635, 1033.0654]
2025-05-13 09:18:45,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:18:45,772 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (827.92) for latency ExtremeClogL1U23
2025-05-13 09:18:45,778 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 6 hours, 31 minutes, 39 seconds)
2025-05-13 09:22:37,118 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:22:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 1070.48608 ± 548.423
2025-05-13 09:22:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1550.3231, 483.68774, 1648.8688, 358.2063, 797.66785, 972.5882, 1845.339, 863.2159, 435.63403, 1749.3302]
2025-05-13 09:22:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:22:49,820 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (1070.49) for latency ExtremeClogL1U23
2025-05-13 09:22:49,825 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 6 hours, 28 minutes, 19 seconds)
2025-05-13 09:26:41,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:26:53,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2127.26001 ± 322.793
2025-05-13 09:26:53,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1834.2115, 2370.8025, 1550.1252, 2588.6436, 2169.3718, 2144.9895, 1687.0209, 2436.1519, 2120.2139, 2371.0684]
2025-05-13 09:26:53,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:26:53,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2127.26) for latency ExtremeClogL1U23
2025-05-13 09:26:53,984 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 6 hours, 24 minutes, 44 seconds)
2025-05-13 09:30:45,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:30:57,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2373.56592 ± 150.989
2025-05-13 09:30:57,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2545.2878, 2624.6062, 2278.3318, 2427.4312, 2387.7458, 2287.9146, 2512.9692, 2108.905, 2212.5671, 2349.9011]
2025-05-13 09:30:57,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:30:57,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2373.57) for latency ExtremeClogL1U23
2025-05-13 09:30:57,954 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 6 hours, 22 minutes, 7 seconds)
2025-05-13 09:34:49,420 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:35:01,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2346.08838 ± 410.248
2025-05-13 09:35:01,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [1729.2198, 2819.9941, 2643.374, 2610.3535, 2010.5035, 3084.6262, 2307.7578, 1893.8701, 2159.3743, 2201.8086]
2025-05-13 09:35:01,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:35:01,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 6 hours, 18 minutes, 10 seconds)
2025-05-13 09:38:53,055 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:39:05,647 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2305.30811 ± 677.747
2025-05-13 09:39:05,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2065.5432, 2617.3633, 2624.326, 3020.1438, 1138.3359, 1208.6561, 3003.7388, 3004.4805, 1827.5874, 2542.9067]
2025-05-13 09:39:05,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:39:05,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 6 hours, 14 minutes, 5 seconds)
2025-05-13 09:42:56,824 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:43:09,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2834.61816 ± 280.482
2025-05-13 09:43:09,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2671.465, 3103.3674, 2662.4106, 2584.7007, 2249.787, 2902.9314, 3038.2363, 3169.6094, 3143.119, 2820.5542]
2025-05-13 09:43:09,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:43:09,378 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2834.62) for latency ExtremeClogL1U23
2025-05-13 09:43:09,384 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 6 hours, 9 minutes, 55 seconds)
2025-05-13 09:47:00,908 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:47:13,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2154.38989 ± 926.326
2025-05-13 09:47:13,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2758.6707, 577.1378, 2610.4182, 2817.6982, 2168.4836, 2980.5903, 2353.1304, 2580.879, 149.13477, 2547.7559]
2025-05-13 09:47:13,444 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:47:13,452 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 6 hours, 5 minutes, 50 seconds)
2025-05-13 09:51:05,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:51:17,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2873.48193 ± 371.696
2025-05-13 09:51:17,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2998.7954, 2421.787, 3255.9558, 2377.133, 2883.093, 2771.8213, 3416.783, 3058.4368, 2322.6355, 3228.3809]
2025-05-13 09:51:17,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:51:17,909 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (2873.48) for latency ExtremeClogL1U23
2025-05-13 09:51:17,915 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 6 hours, 1 minute, 55 seconds)
2025-05-13 09:55:09,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:55:22,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2652.51758 ± 596.541
2025-05-13 09:55:22,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3072.9138, 1185.9403, 2639.8696, 3054.88, 2849.737, 3377.979, 2911.2078, 2198.6558, 2279.1013, 2954.8916]
2025-05-13 09:55:22,287 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:55:22,294 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 5 hours, 57 minutes, 59 seconds)
2025-05-13 09:59:13,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 09:59:26,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2724.80835 ± 839.705
2025-05-13 09:59:26,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [2855.404, 3516.0623, 3217.6401, 2538.5337, 2857.2625, 375.44626, 3210.503, 3245.984, 2922.4932, 2508.7566]
2025-05-13 09:59:26,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 09:59:26,781 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 5 hours, 54 minutes, 7 seconds)
2025-05-13 10:03:18,315 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:03:31,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 2791.46045 ± 993.854
2025-05-13 10:03:31,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3441.1724, 2973.9224, 2945.9595, 3471.3433, 3271.8115, 3139.8235, 1357.5815, 412.43942, 3448.7524, 3451.799]
2025-05-13 10:03:31,208 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:03:31,214 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 5 hours, 50 minutes, 15 seconds)
2025-05-13 10:07:22,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:07:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3117.03638 ± 270.079
2025-05-13 10:07:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3653.044, 2790.1702, 3499.1438, 3301.3047, 2937.8494, 3167.8252, 2927.4033, 2868.686, 3036.1907, 2988.7456]
2025-05-13 10:07:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:07:35,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3117.04) for latency ExtremeClogL1U23
2025-05-13 10:07:35,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 5 hours, 46 minutes, 11 seconds)
2025-05-13 10:11:26,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:11:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3398.16479 ± 371.956
2025-05-13 10:11:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3638.959, 3514.7524, 3137.384, 3002.049, 3572.1917, 3737.224, 3035.7456, 3405.4607, 4101.539, 2836.3447]
2025-05-13 10:11:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:11:39,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3398.16) for latency ExtremeClogL1U23
2025-05-13 10:11:39,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 5 hours, 41 minutes, 57 seconds)
2025-05-13 10:15:30,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:15:42,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3553.11719 ± 349.282
2025-05-13 10:15:42,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4048.3315, 3048.9387, 3236.7463, 3072.1956, 3836.0073, 3492.097, 3499.503, 3500.5554, 3723.266, 4073.5317]
2025-05-13 10:15:42,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:15:42,950 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3553.12) for latency ExtremeClogL1U23
2025-05-13 10:15:42,956 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 5 hours, 37 minutes, 42 seconds)
2025-05-13 10:19:34,106 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:19:46,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 3776.73755 ± 375.279
2025-05-13 10:19:46,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4190.069, 3653.6863, 3685.1755, 3795.2952, 3848.0254, 3410.5613, 4222.3496, 4318.663, 3031.7532, 3611.796]
2025-05-13 10:19:46,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:19:46,988 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (3776.74) for latency ExtremeClogL1U23
2025-05-13 10:19:46,994 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 5 hours, 33 minutes, 31 seconds)
2025-05-13 10:23:38,082 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:23:50,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4064.64062 ± 533.588
2025-05-13 10:23:50,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3697.5063, 4649.2285, 4100.182, 4404.0166, 4390.135, 4567.88, 3822.7322, 4519.0474, 3632.3296, 2863.3528]
2025-05-13 10:23:50,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:23:50,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4064.64) for latency ExtremeClogL1U23
2025-05-13 10:23:50,991 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 5 hours, 29 minutes, 20 seconds)
2025-05-13 10:27:42,032 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:27:54,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4297.14600 ± 687.111
2025-05-13 10:27:54,903 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3316.8928, 4457.309, 4752.8276, 4256.9985, 3577.6313, 3608.0007, 5230.879, 5221.0674, 4909.8696, 3639.984]
2025-05-13 10:27:54,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:27:54,904 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4297.15) for latency ExtremeClogL1U23
2025-05-13 10:27:54,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 5 hours, 25 minutes, 13 seconds)
2025-05-13 10:31:45,894 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:31:58,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4065.15894 ± 625.562
2025-05-13 10:31:58,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4558.5327, 4293.797, 3522.5803, 4715.318, 2855.8525, 4345.7563, 3266.7202, 4460.256, 4785.1616, 3847.6184]
2025-05-13 10:31:58,594 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:31:58,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 5 hours, 21 minutes, 6 seconds)
2025-05-13 10:35:49,780 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:36:02,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4290.52295 ± 355.458
2025-05-13 10:36:02,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3893.2583, 4644.1807, 3794.8228, 4597.764, 4528.3135, 4621.7603, 3860.2585, 3886.3457, 4570.8955, 4507.628]
2025-05-13 10:36:02,535 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:36:02,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 5 hours, 17 minutes, 5 seconds)
2025-05-13 10:39:54,158 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:40:06,852 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4687.35156 ± 472.362
2025-05-13 10:40:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4685.2944, 3491.751, 4925.635, 4641.8413, 4902.933, 4633.117, 4413.8984, 5374.4556, 5066.978, 4737.6147]
2025-05-13 10:40:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:40:06,853 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4687.35) for latency ExtremeClogL1U23
2025-05-13 10:40:06,859 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 5 hours, 13 minutes, 5 seconds)
2025-05-13 10:43:57,930 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:44:10,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4786.24951 ± 399.075
2025-05-13 10:44:10,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4726.6055, 5071.6523, 4938.872, 4238.017, 5328.3154, 4750.722, 4479.126, 5125.8633, 4049.016, 5154.3096]
2025-05-13 10:44:10,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:44:10,603 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4786.25) for latency ExtremeClogL1U23
2025-05-13 10:44:10,611 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 5 hours, 8 minutes, 58 seconds)
2025-05-13 10:48:01,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:48:14,094 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4435.32715 ± 470.915
2025-05-13 10:48:14,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4203.6787, 4363.7114, 3780.686, 5183.9526, 3906.8508, 5032.0884, 4359.7295, 4837.2666, 3924.3403, 4760.9644]
2025-05-13 10:48:14,095 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:48:14,101 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 5 hours, 4 minutes, 47 seconds)
2025-05-13 10:52:05,374 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:52:17,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4341.14111 ± 922.107
2025-05-13 10:52:17,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4997.2227, 4429.1245, 2354.0696, 3632.4685, 4788.6255, 5300.0815, 4052.6409, 5743.9604, 4431.502, 3681.715]
2025-05-13 10:52:17,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:52:17,924 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 5 hours, 45 seconds)
2025-05-13 10:56:09,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 10:56:22,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4472.52881 ± 488.805
2025-05-13 10:56:22,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4082.8396, 4643.2354, 5121.341, 4016.6592, 5051.473, 3579.1565, 4640.339, 4882.0273, 4020.6873, 4687.531]
2025-05-13 10:56:22,285 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 10:56:22,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 4 hours, 56 minutes, 48 seconds)
2025-05-13 11:00:14,137 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:00:26,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4191.25977 ± 1294.903
2025-05-13 11:00:26,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4717.607, 3697.3516, 4825.938, 4159.868, 4784.565, 4351.4917, 4764.18, 4606.4663, 5466.1626, 538.9665]
2025-05-13 11:00:26,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:00:26,679 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 4 hours, 52 minutes, 45 seconds)
2025-05-13 11:04:17,831 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:04:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4190.60254 ± 1087.903
2025-05-13 11:04:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4956.7197, 4569.654, 4927.265, 4662.1406, 3779.6445, 1093.0249, 4265.2715, 4733.9697, 4714.8384, 4203.5]
2025-05-13 11:04:30,332 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:04:30,340 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 4 hours, 48 minutes, 40 seconds)
2025-05-13 11:08:21,449 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:08:33,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4759.11621 ± 421.890
2025-05-13 11:08:33,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5139.4116, 5051.021, 5065.511, 3891.1675, 4089.675, 4720.8916, 5104.098, 5024.127, 4931.409, 4573.851]
2025-05-13 11:08:33,962 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:08:33,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 4 hours, 44 minutes, 38 seconds)
2025-05-13 11:12:26,216 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:12:38,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4600.86914 ± 504.930
2025-05-13 11:12:38,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5336.219, 4779.7344, 4218.7837, 4543.509, 4361.8926, 4180.7026, 5113.513, 4772.1357, 3583.969, 5118.2295]
2025-05-13 11:12:38,656 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:12:38,663 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 4 hours, 40 minutes, 46 seconds)
2025-05-13 11:16:31,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:16:44,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4529.36182 ± 593.798
2025-05-13 11:16:44,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4211.1016, 4675.2305, 3251.7046, 5282.2207, 4462.285, 4877.706, 5245.498, 4416.475, 3918.3914, 4953.0024]
2025-05-13 11:16:44,431 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:16:44,439 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 4 hours, 37 minutes, 1 second)
2025-05-13 11:20:36,764 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:20:49,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4273.58545 ± 1324.503
2025-05-13 11:20:49,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4881.5674, 4712.2104, 459.6034, 5163.8677, 4614.0806, 4887.6577, 4684.059, 5112.579, 4444.121, 3776.1086]
2025-05-13 11:20:49,341 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:20:49,349 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 4 hours, 33 minutes, 3 seconds)
2025-05-13 11:24:41,102 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:24:53,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4740.96533 ± 492.201
2025-05-13 11:24:53,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3507.4065, 4524.6636, 4704.765, 4791.8774, 4600.51, 5218.4214, 5111.9897, 4544.8276, 5213.48, 5191.71]
2025-05-13 11:24:53,893 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:24:53,900 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 4 hours, 29 minutes, 10 seconds)
2025-05-13 11:28:44,697 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:28:57,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4725.25879 ± 567.524
2025-05-13 11:28:57,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3332.3027, 5278.9873, 4397.725, 4724.5146, 4752.979, 5279.8022, 5132.231, 4866.9253, 5174.8154, 4312.302]
2025-05-13 11:28:57,152 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:28:57,160 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 4 hours, 25 minutes, 1 second)
2025-05-13 11:32:51,291 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:33:03,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4734.28223 ± 458.440
2025-05-13 11:33:03,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5014.1294, 5408.583, 4362.062, 5042.8394, 5075.9526, 4376.307, 4082.6133, 4459.7476, 5302.8076, 4217.781]
2025-05-13 11:33:03,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:33:10,125 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 4 hours, 22 minutes, 42 seconds)
2025-05-13 11:36:59,001 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:37:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4869.25928 ± 431.232
2025-05-13 11:37:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4436.066, 5271.469, 4028.5483, 5376.8257, 5042.6772, 4571.4004, 4864.7886, 5025.797, 4621.68, 5453.3403]
2025-05-13 11:37:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:37:11,492 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (4869.26) for latency ExtremeClogL1U23
2025-05-13 11:37:11,500 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 4 hours, 17 minutes, 40 seconds)
2025-05-13 11:41:00,571 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:41:12,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4371.59619 ± 954.435
2025-05-13 11:41:12,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5289.0283, 4402.1787, 4751.8887, 1828.6226, 5162.653, 5018.819, 4132.5146, 3811.1326, 4467.5938, 4851.5312]
2025-05-13 11:41:12,813 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:41:12,822 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 4 hours, 12 minutes, 51 seconds)
2025-05-13 11:45:01,182 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:45:13,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4618.24414 ± 724.385
2025-05-13 11:45:13,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [3456.9807, 3411.8184, 5292.8022, 5140.6777, 4219.65, 5520.691, 4572.2446, 4916.8613, 5356.869, 4293.847]
2025-05-13 11:45:13,395 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:45:13,401 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 4 hours, 7 minutes, 57 seconds)
2025-05-13 11:49:01,407 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:49:13,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5024.85010 ± 408.881
2025-05-13 11:49:13,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5447.7876, 5012.9004, 5328.319, 4504.0527, 4610.078, 4437.824, 4650.7036, 5395.672, 5375.215, 5485.9478]
2025-05-13 11:49:13,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:49:13,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (5024.85) for latency ExtremeClogL1U23
2025-05-13 11:49:13,925 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 4 hours, 3 minutes, 21 seconds)
2025-05-13 11:53:01,556 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:53:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5035.37598 ± 459.323
2025-05-13 11:53:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5582.106, 4795.9805, 5323.6514, 5051.6353, 5421.913, 5002.4697, 4112.6626, 4420.0513, 5572.4146, 5070.8755]
2025-05-13 11:53:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:53:13,934 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (5035.38) for latency ExtremeClogL1U23
2025-05-13 11:53:13,941 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 3 hours, 56 minutes, 45 seconds)
2025-05-13 11:57:01,587 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 11:57:13,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4599.73926 ± 1131.800
2025-05-13 11:57:13,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5638.618, 4639.1626, 5556.2266, 1467.2362, 4567.7896, 5425.043, 4513.517, 5128.549, 4637.092, 4424.157]
2025-05-13 11:57:13,959 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 11:57:13,968 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 3 hours, 52 minutes, 28 seconds)
2025-05-13 12:01:01,986 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:01:14,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5215.70654 ± 520.314
2025-05-13 12:01:14,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5060.4634, 4911.3257, 4711.3223, 5939.182, 5313.438, 4476.1733, 4964.0996, 4879.9663, 5960.98, 5940.1187]
2025-05-13 12:01:14,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:01:14,063 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (5215.71) for latency ExtremeClogL1U23
2025-05-13 12:01:14,071 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 3 hours, 48 minutes, 14 seconds)
2025-05-13 12:05:02,434 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:05:14,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5189.91309 ± 598.974
2025-05-13 12:05:14,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5933.976, 4499.064, 4662.7446, 5655.531, 4639.8984, 5604.575, 6265.6484, 5145.7134, 4614.975, 4877.0015]
2025-05-13 12:05:14,667 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:05:14,674 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 3 hours, 44 minutes, 14 seconds)
2025-05-13 12:09:03,039 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:09:15,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4918.87402 ± 559.146
2025-05-13 12:09:15,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4339.6445, 5235.8076, 4098.7075, 4237.869, 5360.6597, 5688.555, 4651.2153, 5228.3506, 4689.761, 5658.1694]
2025-05-13 12:09:15,174 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:09:15,181 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 3 hours, 40 minutes, 13 seconds)
2025-05-13 12:13:03,346 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:13:15,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5296.70459 ± 370.968
2025-05-13 12:13:15,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4511.402, 4902.734, 5480.6553, 5817.0405, 5531.8403, 5137.168, 5420.845, 5526.4175, 5045.87, 5593.074]
2025-05-13 12:13:15,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:13:15,677 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (5296.70) for latency ExtremeClogL1U23
2025-05-13 12:13:15,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 3 hours, 36 minutes, 18 seconds)
2025-05-13 12:17:03,918 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:17:16,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5013.22363 ± 274.710
2025-05-13 12:17:16,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5133.239, 5032.7153, 4515.4414, 5180.965, 5339.2363, 4838.648, 5354.8843, 5256.132, 4777.141, 4703.84]
2025-05-13 12:17:16,128 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:17:16,135 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 3 hours, 32 minutes, 22 seconds)
2025-05-13 12:21:04,110 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:21:16,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4945.72705 ± 522.792
2025-05-13 12:21:16,234 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5101.52, 5120.8726, 4645.2036, 5110.21, 4888.507, 4599.272, 3759.9663, 5580.912, 4913.2764, 5737.5283]
2025-05-13 12:21:16,235 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:21:16,243 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 3 hours, 28 minutes, 22 seconds)
2025-05-13 12:25:04,464 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:25:16,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5240.14453 ± 209.173
2025-05-13 12:25:16,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5014.065, 5013.07, 4906.507, 5418.9863, 5471.6904, 5202.209, 5145.752, 5561.522, 5391.97, 5275.674]
2025-05-13 12:25:16,650 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:25:16,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 3 hours, 24 minutes, 20 seconds)
2025-05-13 12:29:04,516 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:29:16,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5060.22900 ± 403.660
2025-05-13 12:29:16,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4719.1357, 4345.354, 5095.481, 5468.0254, 5806.9204, 4936.4473, 5144.7334, 4795.7476, 5436.835, 4853.608]
2025-05-13 12:29:16,620 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:29:16,633 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 3 hours, 20 minutes, 14 seconds)
2025-05-13 12:33:04,653 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:33:16,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4774.85693 ± 1445.751
2025-05-13 12:33:16,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5105.8047, 4929.222, 4355.9243, 5607.5474, 5775.7124, 665.24915, 4971.9185, 4821.77, 5892.067, 5623.356]
2025-05-13 12:33:16,839 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:33:16,846 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 3 hours, 16 minutes, 11 seconds)
2025-05-13 12:37:05,013 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:37:17,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4900.07812 ± 769.273
2025-05-13 12:37:17,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5991.911, 3840.503, 4062.905, 4012.9304, 4742.76, 5452.7207, 5408.5093, 6008.5356, 4388.909, 5091.099]
2025-05-13 12:37:17,189 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:37:17,198 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 3 hours, 12 minutes, 10 seconds)
2025-05-13 12:41:05,541 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:41:17,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5104.94385 ± 365.622
2025-05-13 12:41:17,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5183.444, 5722.301, 4495.4688, 4548.2617, 5041.4185, 5306.759, 4888.254, 5170.3545, 5180.2114, 5512.9707]
2025-05-13 12:41:17,892 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:41:17,899 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 3 hours, 8 minutes, 15 seconds)
2025-05-13 12:45:06,504 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:45:18,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5271.09082 ± 687.781
2025-05-13 12:45:18,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5869.3413, 4795.2715, 5583.351, 3558.6472, 5741.512, 4802.937, 5674.233, 5523.0444, 5939.9272, 5222.648]
2025-05-13 12:45:18,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:45:18,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 3 hours, 4 minutes, 20 seconds)
2025-05-13 12:49:07,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:49:19,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5185.49463 ± 476.348
2025-05-13 12:49:19,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5508.4683, 4870.3115, 5188.642, 4883.6987, 4136.275, 4940.5537, 5847.0747, 5261.2905, 5682.7744, 5535.8613]
2025-05-13 12:49:19,599 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:49:19,609 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 3 hours, 26 seconds)
2025-05-13 12:53:07,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:53:20,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5226.48975 ± 422.572
2025-05-13 12:53:20,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4752.1123, 4600.075, 5529.5293, 5027.4834, 5410.7817, 5657.7485, 5020.349, 5659.078, 4767.5625, 5840.179]
2025-05-13 12:53:20,335 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:53:20,343 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 2 hours, 56 minutes, 30 seconds)
2025-05-13 12:57:08,621 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 12:57:21,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4697.14600 ± 588.118
2025-05-13 12:57:21,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4951.743, 5487.009, 4354.9663, 4538.227, 4408.853, 4541.6807, 5117.9688, 3304.7063, 5022.895, 5243.4146]
2025-05-13 12:57:21,204 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 12:57:21,212 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 2 hours, 52 minutes, 34 seconds)
2025-05-13 13:01:09,391 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:01:21,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4400.80078 ± 1571.154
2025-05-13 13:01:21,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5677.3306, 5919.59, 5032.2505, 3572.825, 969.1311, 2209.2551, 5490.0, 4710.385, 5704.514, 4722.725]
2025-05-13 13:01:21,880 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:01:21,888 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 2 hours, 48 minutes, 33 seconds)
2025-05-13 13:05:10,104 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:05:22,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4950.15576 ± 1146.531
2025-05-13 13:05:22,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5102.3, 5945.7056, 5583.0254, 5135.439, 4208.3896, 1779.256, 5449.462, 5741.5444, 5261.985, 5294.45]
2025-05-13 13:05:22,485 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:05:22,493 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 2 hours, 44 minutes, 29 seconds)
2025-05-13 13:09:10,574 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:09:22,928 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4657.67871 ± 678.415
2025-05-13 13:09:22,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5073.3936, 5469.4233, 5430.166, 4778.0767, 3033.768, 4457.9844, 4110.389, 4605.3345, 5068.3374, 4549.9146]
2025-05-13 13:09:22,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:09:22,937 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 2 hours, 40 minutes, 26 seconds)
2025-05-13 13:13:10,800 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:13:22,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5091.90186 ± 654.325
2025-05-13 13:13:22,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4391.232, 3923.5098, 4530.14, 5174.9917, 5366.2007, 5790.1523, 4577.5537, 5736.726, 5920.48, 5508.0312]
2025-05-13 13:13:22,967 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:13:22,976 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 2 hours, 36 minutes, 20 seconds)
2025-05-13 13:17:10,953 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:17:23,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4858.07471 ± 556.323
2025-05-13 13:17:23,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4753.1133, 4434.0522, 5277.9053, 3784.309, 5526.011, 5487.2783, 4506.043, 4372.962, 4982.574, 5456.4985]
2025-05-13 13:17:23,142 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:17:23,149 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 2 hours, 32 minutes, 14 seconds)
2025-05-13 13:21:11,416 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:21:23,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5532.18115 ± 421.286
2025-05-13 13:21:23,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5454.223, 5781.0713, 5846.051, 5521.238, 5668.037, 5315.2266, 5339.9634, 5777.0474, 4488.949, 6130.005]
2025-05-13 13:21:23,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:21:23,761 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (5532.18) for latency ExtremeClogL1U23
2025-05-13 13:21:23,771 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 2 hours, 28 minutes, 13 seconds)
2025-05-13 13:25:12,080 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:25:24,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4944.91895 ± 1205.891
2025-05-13 13:25:24,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5388.8345, 1473.0115, 5246.085, 4938.5264, 5249.4136, 5398.065, 5334.311, 5985.244, 4708.519, 5727.1777]
2025-05-13 13:25:24,302 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:25:24,312 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 2 hours, 24 minutes, 13 seconds)
2025-05-13 13:29:12,548 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:29:24,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4893.33496 ± 1043.896
2025-05-13 13:29:24,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4749.306, 5387.5864, 6038.666, 5347.8506, 5819.493, 2458.611, 5557.0303, 4205.021, 5499.4424, 3870.3403]
2025-05-13 13:29:24,803 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:29:24,809 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 2 hours, 20 minutes, 13 seconds)
2025-05-13 13:33:12,951 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:33:25,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4696.79785 ± 1432.257
2025-05-13 13:33:25,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4771.8555, 5543.9346, 4968.7793, 5160.0464, 5431.8965, 5540.6494, 520.4891, 5137.97, 4451.845, 5440.5103]
2025-05-13 13:33:25,292 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:33:25,298 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 2 hours, 16 minutes, 15 seconds)
2025-05-13 13:37:13,311 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:37:25,527 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4964.90039 ± 509.159
2025-05-13 13:37:25,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4891.7603, 4325.3916, 5876.163, 5159.4053, 5116.425, 4951.8735, 4108.7007, 4738.589, 5659.0996, 4821.596]
2025-05-13 13:37:25,528 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:37:25,537 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 2 hours, 12 minutes, 15 seconds)
2025-05-13 13:41:13,565 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:41:25,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5228.90918 ± 679.628
2025-05-13 13:41:25,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4676.2007, 4620.086, 6152.1577, 5339.3945, 5964.8267, 5432.514, 3959.8704, 6032.217, 4756.294, 5355.534]
2025-05-13 13:41:25,766 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:41:25,774 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 2 hours, 8 minutes, 12 seconds)
2025-05-13 13:45:13,657 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:45:25,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4987.97803 ± 834.951
2025-05-13 13:45:25,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5245.923, 4691.829, 4152.969, 5177.54, 5235.407, 6101.5396, 5681.228, 5310.394, 5317.295, 2965.658]
2025-05-13 13:45:25,869 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:45:25,877 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 2 hours, 4 minutes, 9 seconds)
2025-05-13 13:49:14,014 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:49:26,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4612.82471 ± 1200.694
2025-05-13 13:49:26,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5365.9746, 4995.6797, 4683.635, 5381.754, 1889.4154, 2905.0256, 5415.9697, 4280.502, 5760.331, 5449.9604]
2025-05-13 13:49:26,477 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:49:26,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 2 hours, 10 seconds)
2025-05-13 13:53:14,121 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:53:26,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4972.33740 ± 774.085
2025-05-13 13:53:26,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5780.3286, 5136.811, 5106.049, 3823.2302, 5054.039, 5394.8823, 5927.366, 4156.598, 5679.6973, 3664.374]
2025-05-13 13:53:26,249 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:53:26,258 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 1 hour, 56 minutes, 5 seconds)
2025-05-13 13:57:14,623 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 13:57:27,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4994.37744 ± 1184.546
2025-05-13 13:57:27,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5256.203, 6145.38, 5836.2246, 5874.388, 5696.5137, 1776.5173, 5139.14, 4982.4956, 4575.6685, 4661.2417]
2025-05-13 13:57:27,215 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 13:57:27,227 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 1 hour, 52 minutes, 9 seconds)
2025-05-13 14:01:16,085 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:01:28,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4749.03223 ± 1050.703
2025-05-13 14:01:28,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5097.299, 5565.517, 5331.41, 4886.873, 4623.91, 5097.4565, 1881.4172, 4353.8364, 4730.1743, 5922.428]
2025-05-13 14:01:28,226 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:01:28,233 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 1 hour, 48 minutes, 13 seconds)
2025-05-13 14:05:17,448 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:05:29,750 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5445.86572 ± 471.813
2025-05-13 14:05:29,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5866.318, 5233.5728, 6061.059, 5701.934, 5460.1567, 5833.356, 4929.6997, 4479.0903, 5769.81, 5123.659]
2025-05-13 14:05:29,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:05:29,758 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 1 hour, 44 minutes, 20 seconds)
2025-05-13 14:09:18,331 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:09:30,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4741.58838 ± 642.262
2025-05-13 14:09:30,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4756.529, 3864.6282, 5420.359, 4516.9507, 5559.0474, 4929.507, 3953.8472, 5466.776, 5126.6006, 3821.6357]
2025-05-13 14:09:30,567 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:09:30,578 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 1 hour, 40 minutes, 20 seconds)
2025-05-13 14:13:18,383 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:13:30,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5663.64941 ± 572.386
2025-05-13 14:13:30,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5349.4688, 6322.426, 5780.249, 5886.877, 6093.602, 5826.6255, 4448.554, 4859.6333, 5846.5, 6222.558]
2025-05-13 14:13:30,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:13:30,613 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (5663.65) for latency ExtremeClogL1U23
2025-05-13 14:13:30,625 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 1 hour, 36 minutes, 20 seconds)
2025-05-13 14:17:18,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:17:30,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 4945.58301 ± 590.603
2025-05-13 14:17:30,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6148.6606, 3984.0967, 4470.879, 4738.399, 4857.363, 4847.1987, 5209.5127, 5682.277, 5031.0513, 4486.3906]
2025-05-13 14:17:30,722 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:17:30,731 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 1 hour, 32 minutes, 16 seconds)
2025-05-13 14:21:18,833 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:21:31,361 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5487.21191 ± 471.806
2025-05-13 14:21:31,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5567.556, 4773.707, 5497.6216, 5617.17, 4851.5903, 5933.153, 4960.694, 6309.1616, 5858.032, 5503.435]
2025-05-13 14:21:31,362 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:21:31,372 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 1 hour, 28 minutes, 13 seconds)
2025-05-13 14:25:19,753 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:25:32,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5478.82129 ± 461.614
2025-05-13 14:25:32,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4835.538, 5718.375, 4963.14, 4763.038, 5326.068, 6090.4404, 5902.5327, 5990.734, 5681.145, 5517.205]
2025-05-13 14:25:32,207 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:25:32,218 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 1 hour, 24 minutes, 10 seconds)
2025-05-13 14:29:20,333 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:29:32,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5031.82275 ± 331.312
2025-05-13 14:29:32,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4834.537, 5695.9883, 5182.4526, 4517.8643, 5186.191, 4587.9985, 4971.782, 4919.955, 5332.896, 5088.5645]
2025-05-13 14:29:32,895 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:29:32,905 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 1 hour, 20 minutes, 9 seconds)
2025-05-13 14:33:21,058 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:33:33,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5387.68652 ± 517.378
2025-05-13 14:33:33,417 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5125.0605, 5343.363, 4782.6104, 4763.143, 4830.209, 6390.0083, 5739.6357, 5884.0977, 5768.555, 5250.185]
2025-05-13 14:33:33,418 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:33:33,427 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 1 hour, 16 minutes, 10 seconds)
2025-05-13 14:37:22,065 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:37:34,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5360.81104 ± 647.168
2025-05-13 14:37:34,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4991.628, 4987.6304, 5950.8726, 5544.569, 4889.106, 4090.926, 5857.1343, 6367.468, 5009.1416, 5919.638]
2025-05-13 14:37:34,488 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:37:34,498 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 1 hour, 12 minutes, 13 seconds)
2025-05-13 14:41:23,265 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:41:35,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5307.08008 ± 496.007
2025-05-13 14:41:35,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5348.503, 5702.0566, 5765.27, 5444.3633, 6178.232, 5343.234, 4912.4927, 5271.108, 4456.4487, 4649.0938]
2025-05-13 14:41:35,642 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:41:35,651 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 1 hour, 8 minutes, 14 seconds)
2025-05-13 14:45:24,352 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:45:36,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5211.93018 ± 391.281
2025-05-13 14:45:36,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4646.416, 5608.87, 5575.0615, 5265.613, 5392.6904, 4453.6777, 4971.856, 5612.5786, 5494.8765, 5097.6597]
2025-05-13 14:45:36,876 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:45:36,887 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 1 hour, 4 minutes, 14 seconds)
2025-05-13 14:49:25,353 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:49:37,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5345.26953 ± 605.891
2025-05-13 14:49:37,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6433.559, 5255.6147, 5077.3486, 5670.486, 5605.3594, 5040.566, 5996.0566, 5362.6787, 4114.1147, 4896.9136]
2025-05-13 14:49:37,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:49:37,590 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 1 hour, 14 seconds)
2025-05-13 14:53:25,686 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:53:37,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5348.05518 ± 660.141
2025-05-13 14:53:37,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4936.947, 5817.293, 6510.76, 6084.791, 4780.1694, 5063.9775, 5111.754, 4982.821, 5918.981, 4273.0586]
2025-05-13 14:53:37,902 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:53:37,911 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 56 minutes, 12 seconds)
2025-05-13 14:57:26,150 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 14:57:38,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5710.20459 ± 566.480
2025-05-13 14:57:38,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6390.8906, 5763.111, 6505.76, 4968.484, 5721.695, 5449.4697, 4616.973, 5866.213, 5599.4185, 6220.0337]
2025-05-13 14:57:38,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 14:57:38,324 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1226 [INFO]: New best (5710.20) for latency ExtremeClogL1U23
2025-05-13 14:57:38,334 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 52 minutes, 9 seconds)
2025-05-13 15:01:26,582 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:01:38,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5468.76855 ± 437.590
2025-05-13 15:01:38,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5728.525, 5423.994, 5884.3237, 5713.9404, 5487.699, 5194.004, 5948.617, 4444.629, 5787.8765, 5074.08]
2025-05-13 15:01:38,871 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:01:38,881 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 48 minutes, 7 seconds)
2025-05-13 15:05:27,355 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:05:39,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5208.21387 ± 546.476
2025-05-13 15:05:39,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5583.78, 4469.056, 5353.7407, 4751.486, 5526.416, 6092.826, 4608.8228, 5617.6284, 5587.531, 4490.85]
2025-05-13 15:05:39,742 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:05:39,751 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 44 minutes, 6 seconds)
2025-05-13 15:09:28,274 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:09:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5491.11182 ± 675.290
2025-05-13 15:09:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5429.4854, 6449.1416, 5098.552, 5597.5786, 5631.3936, 4470.6255, 6190.54, 5832.292, 4237.01, 5974.502]
2025-05-13 15:09:40,489 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:09:40,501 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 40 minutes, 5 seconds)
2025-05-13 15:13:29,096 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:13:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5581.87061 ± 287.107
2025-05-13 15:13:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5008.025, 5328.359, 5805.318, 5554.411, 5411.038, 5383.8535, 5660.049, 5886.046, 5996.036, 5785.572]
2025-05-13 15:13:41,538 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:13:41,552 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 36 minutes, 6 seconds)
2025-05-13 15:17:29,708 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:17:42,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5684.55762 ± 284.310
2025-05-13 15:17:42,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5850.1953, 5313.0996, 5943.991, 6178.525, 5623.4146, 5567.775, 5492.6865, 5404.038, 5424.929, 6046.922]
2025-05-13 15:17:42,048 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:17:42,060 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 32 minutes, 5 seconds)
2025-05-13 15:21:29,851 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:21:42,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5459.36279 ± 615.831
2025-05-13 15:21:42,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [6037.8564, 6027.2417, 5488.5034, 5591.7617, 5828.264, 4751.3306, 4725.106, 5733.236, 6136.878, 4273.449]
2025-05-13 15:21:42,017 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:21:42,030 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 28 minutes, 4 seconds)
2025-05-13 15:25:29,486 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:25:41,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5566.17334 ± 541.911
2025-05-13 15:25:41,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5445.5137, 5503.16, 6177.4727, 6357.686, 4587.867, 5557.3467, 4952.9033, 6241.4907, 5625.807, 5212.4824]
2025-05-13 15:25:41,886 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:25:41,898 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 24 minutes, 2 seconds)
2025-05-13 15:29:29,927 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:29:42,369 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5261.27832 ± 559.411
2025-05-13 15:29:42,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4995.801, 4544.976, 5880.5254, 5051.7676, 4809.927, 4642.172, 5684.534, 6081.9565, 6007.9995, 4913.1226]
2025-05-13 15:29:42,370 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:29:42,429 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 20 minutes, 1 second)
2025-05-13 15:33:32,438 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:33:44,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5465.03613 ± 533.216
2025-05-13 15:33:44,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5538.8843, 5944.9717, 5221.6953, 5019.826, 4459.5645, 5237.494, 5267.657, 5452.074, 6291.9243, 6216.27]
2025-05-13 15:33:44,829 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:33:44,867 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 16 minutes, 2 seconds)
2025-05-13 15:37:35,126 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:37:47,358 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5381.57764 ± 500.676
2025-05-13 15:37:47,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4829.3286, 4794.752, 5864.4194, 5013.3506, 6142.416, 5414.296, 5783.983, 4701.712, 5357.03, 5914.489]
2025-05-13 15:37:47,359 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:37:47,428 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 12 minutes, 3 seconds)
2025-05-13 15:41:37,272 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:41:49,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5547.26074 ± 419.342
2025-05-13 15:41:49,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5562.2007, 5549.8315, 5165.7544, 5632.226, 6068.8936, 4490.8477, 5897.328, 5790.867, 5732.3438, 5582.315]
2025-05-13 15:41:49,648 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:41:49,662 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 8 minutes, 3 seconds)
2025-05-13 15:45:38,598 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:45:50,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5677.44141 ± 493.810
2025-05-13 15:45:50,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [4994.48, 5128.906, 6173.771, 5858.9443, 5857.978, 5422.2993, 6188.328, 6236.345, 6014.95, 4898.4116]
2025-05-13 15:45:50,917 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:45:50,929 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 4 minutes, 1 second)
2025-05-13 15:49:39,576 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-13 15:49:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1221 [DEBUG]: Total Reward: 5562.15137 ± 493.497
2025-05-13 15:49:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1222 [DEBUG]: All rewards: [5736.932, 5232.3765, 5006.399, 5955.707, 4571.848, 5280.9844, 6147.5273, 5649.934, 6059.5767, 5980.225]
2025-05-13 15:49:51,698 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-13 15:49:51,710 latency_env.delayed_mdp:training_loop(baseline-bpql-mda-noisy-halfcheetah):1251 [DEBUG]: Training session finished
