2025-09-11 18:36:06,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc20-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:36:06,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc20-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:36:06,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x1499ff669850>}
2025-09-11 18:36:06,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 18:36:06,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 18:36:06,236 baseline-mbpac-noiseperc20-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-09-11 18:36:06,236 baseline-mbpac-noiseperc20-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-09-11 18:36:06,244 baseline-mbpac-noiseperc20-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-09-11 18:36:07,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 18:36:07,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 18:46:39,481 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:46:39,498 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:51:21,297 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -395.56448 ± 37.979
2025-09-11 18:51:21,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-329.57483, -421.9414, -366.08133, -413.7637, -431.9745, -335.70996, -445.96417, -387.094, -419.762, -403.77887]
2025-09-11 18:51:21,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:51:21,306 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (-395.56) for latency ExtremeClogL1U23
2025-09-11 18:51:21,314 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 25 hours, 8 minutes, 14 seconds)
2025-09-11 19:03:07,314 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:03:07,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:07:43,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: -184.84763 ± 60.554
2025-09-11 19:07:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-120.7874, -226.06982, -296.19806, -110.22514, -101.34718, -154.07486, -239.3868, -172.49854, -222.81403, -205.0745]
2025-09-11 19:07:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:07:43,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (-184.85) for latency ExtremeClogL1U23
2025-09-11 19:07:43,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 25 hours, 48 minutes, 40 seconds)
2025-09-11 19:19:28,618 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:19:28,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:24:05,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 47.17355 ± 169.471
2025-09-11 19:24:05,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-14.326639, -172.57922, -46.643204, 174.12724, 56.12751, 238.4676, -114.32315, 62.4399, 397.5876, -109.14211]
2025-09-11 19:24:05,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:24:05,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (47.17) for latency ExtremeClogL1U23
2025-09-11 19:24:05,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 51 minutes, 18 seconds)
2025-09-11 19:35:51,142 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:35:51,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:40:27,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 416.93719 ± 270.557
2025-09-11 19:40:27,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [378.681, 645.5762, 25.029043, 416.94504, 366.5535, 832.12646, -100.359856, 371.8603, 632.2178, 600.7426]
2025-09-11 19:40:27,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:40:27,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (416.94) for latency ExtremeClogL1U23
2025-09-11 19:40:27,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 44 minutes, 11 seconds)
2025-09-11 19:52:15,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:52:15,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:56:55,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1194.70215 ± 229.807
2025-09-11 19:56:55,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1409.3181, 1405.9465, 1145.0665, 561.8968, 1221.9042, 1168.2014, 1234.5417, 1174.021, 1274.0188, 1352.1063]
2025-09-11 19:56:55,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:56:55,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1194.70) for latency ExtremeClogL1U23
2025-09-11 19:56:55,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 35 minutes, 17 seconds)
2025-09-11 20:08:44,445 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:08:44,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:13:23,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1317.65552 ± 501.051
2025-09-11 20:13:23,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1647.7522, 239.15897, 1590.186, 1310.0916, 1717.5126, 1394.3684, 1800.5979, 1833.7251, 878.86426, 764.29926]
2025-09-11 20:13:23,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:13:23,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (1317.66) for latency ExtremeClogL1U23
2025-09-11 20:13:23,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 25 hours, 42 minutes, 24 seconds)
2025-09-11 20:25:13,311 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:25:13,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:29:52,451 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1284.04431 ± 856.683
2025-09-11 20:29:52,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2008.1022, 2187.496, 363.49075, 561.19354, 2126.453, 1987.6323, 174.44778, 2041.8743, 1374.6595, 15.094494]
2025-09-11 20:29:52,452 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:29:52,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 25 hours, 27 minutes, 57 seconds)
2025-09-11 20:41:39,702 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:41:39,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:46:16,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2359.95557 ± 188.965
2025-09-11 20:46:16,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2111.8894, 2342.2427, 2290.7107, 2344.273, 2694.348, 2463.388, 2465.5032, 1993.3038, 2505.94, 2387.9592]
2025-09-11 20:46:16,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:46:16,415 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2359.96) for latency ExtremeClogL1U23
2025-09-11 20:46:16,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 25 hours, 12 minutes)
2025-09-11 20:57:47,980 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:57:47,984 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:02:18,516 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1526.68042 ± 1093.091
2025-09-11 21:02:18,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [581.9217, 424.08035, -87.03197, 2842.167, 2902.993, 96.87736, 1905.949, 2386.2505, 2181.8113, 2031.7856]
2025-09-11 21:02:18,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:02:18,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 49 minutes, 36 seconds)
2025-09-11 21:13:48,316 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:13:48,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:18:22,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2060.19678 ± 711.828
2025-09-11 21:18:22,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2762.0164, 2232.9507, 2406.083, 2515.1826, 2105.9934, 2328.7358, 2465.8997, 668.9077, 680.8142, 2435.3833]
2025-09-11 21:18:22,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:18:22,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 25 minutes, 59 seconds)
2025-09-11 21:29:52,714 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:29:52,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:34:24,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1614.72827 ± 885.439
2025-09-11 21:34:24,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [699.93134, 2098.994, 2172.6604, 199.8194, 2334.138, 1714.8593, 2611.4907, 2100.1084, 62.551933, 2152.7302]
2025-09-11 21:34:24,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:34:24,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 24 hours, 2 minutes)
2025-09-11 21:45:55,019 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:45:55,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:50:26,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2602.85474 ± 133.121
2025-09-11 21:50:26,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2549.32, 2709.585, 2701.7576, 2293.3154, 2534.1807, 2518.078, 2621.171, 2663.0427, 2633.9172, 2804.1787]
2025-09-11 21:50:26,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:50:26,017 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2602.85) for latency ExtremeClogL1U23
2025-09-11 21:50:26,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 37 minutes, 50 seconds)
2025-09-11 22:01:56,528 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:01:56,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:06:28,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2043.36975 ± 734.093
2025-09-11 22:06:28,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1605.3898, 1237.1879, 2413.676, 2816.1729, 2435.7966, 1276.8666, 2173.1846, 741.22986, 2836.9775, 2897.216]
2025-09-11 22:06:28,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:06:28,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 23 hours, 15 minutes, 28 seconds)
2025-09-11 22:18:05,241 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:18:05,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:22:37,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2468.58838 ± 786.399
2025-09-11 22:22:37,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3040.393, 3115.2354, 2947.8457, 1797.5249, 2928.5767, 2644.502, 2829.3687, 588.9851, 1747.4263, 3046.0251]
2025-09-11 22:22:37,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:22:37,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 23 hours, 1 minute, 25 seconds)
2025-09-11 22:34:09,495 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:34:09,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:38:40,774 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2503.26343 ± 893.779
2025-09-11 22:38:40,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3262.6477, 3269.08, 2775.003, 2879.343, 3017.6328, 3151.8984, 2935.748, 2037.4552, 1034.6119, 669.2149]
2025-09-11 22:38:40,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:38:40,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 45 minutes, 16 seconds)
2025-09-11 22:50:12,177 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:50:12,182 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:54:48,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2193.10864 ± 1064.483
2025-09-11 22:54:48,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1561.5067, 3166.1077, 3163.1843, 3226.1394, 1951.6157, 1371.9816, 1331.6821, 3255.598, -10.26696, 2913.5396]
2025-09-11 22:54:48,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:54:48,622 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 30 minutes, 43 seconds)
2025-09-11 23:06:24,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:06:24,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:10:55,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2272.14307 ± 1008.807
2025-09-11 23:10:55,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1924.1329, 3157.382, 3447.8652, 1932.5803, 1094.4547, 1231.2474, 3151.4846, 528.2988, 3246.0476, 3007.9402]
2025-09-11 23:10:55,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:10:55,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 22 hours, 16 minutes, 4 seconds)
2025-09-11 23:22:29,489 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:22:29,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:27:04,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2133.68701 ± 1265.234
2025-09-11 23:27:04,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3111.6665, 3413.2234, 2145.894, 3494.1582, 782.1897, 2975.9214, 3527.6135, 390.99496, 1067.8663, 427.34283]
2025-09-11 23:27:04,188 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:27:04,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 22 hours, 1 minute, 46 seconds)
2025-09-11 23:38:32,383 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:38:32,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:42:58,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2709.82373 ± 1224.432
2025-09-11 23:42:58,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1689.8031, 3171.7993, 3336.9185, -62.621845, 1281.4507, 3682.4124, 3225.4685, 3815.5793, 3450.1194, 3507.3088]
2025-09-11 23:42:58,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:42:58,585 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2709.82) for latency ExtremeClogL1U23
2025-09-11 23:42:58,621 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 41 minutes, 42 seconds)
2025-09-11 23:54:16,951 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:54:16,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:58:45,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2565.65771 ± 1066.886
2025-09-11 23:58:45,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3377.2747, 3190.5164, 3482.3054, 3441.7327, 3651.2324, 3264.4536, 1644.5007, 1609.5033, 562.8218, 1432.2356]
2025-09-11 23:58:45,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:58:45,495 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 21 minutes, 15 seconds)
2025-09-12 00:10:11,843 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:10:11,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:14:41,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2777.56201 ± 1111.340
2025-09-12 00:14:41,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3820.6152, 767.80963, 3473.07, 3482.594, 1575.4678, 1146.2958, 3535.287, 3748.362, 2668.578, 3557.5398]
2025-09-12 00:14:41,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:14:41,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2777.56) for latency ExtremeClogL1U23
2025-09-12 00:14:41,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 21 hours, 2 minutes, 11 seconds)
2025-09-12 00:26:00,665 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:26:00,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:30:27,073 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 1667.68042 ± 1348.462
2025-09-12 00:30:27,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3776.8716, 2325.135, 298.51328, 3256.877, 225.24203, 3512.0925, 291.1591, 658.47815, 1123.2804, 1209.1572]
2025-09-12 00:30:27,078 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:30:27,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 40 minutes, 41 seconds)
2025-09-12 00:41:45,604 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:41:45,615 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:46:14,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2605.40063 ± 1066.918
2025-09-12 00:46:14,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1313.3468, 992.97437, 2804.364, 2550.084, 3884.8984, 3553.5461, 3874.298, 3252.9202, 2793.8513, 1033.722]
2025-09-12 00:46:14,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:46:14,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 19 minutes, 22 seconds)
2025-09-12 00:58:19,547 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:58:19,552 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:03:02,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2023.85840 ± 1390.509
2025-09-12 01:03:02,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [-75.50357, 3962.0542, 2314.331, 1285.2386, 326.52063, 1556.2231, 2417.1743, 3752.0486, 920.69226, 3779.807]
2025-09-12 01:03:02,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:03:02,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 20 hours, 17 minutes, 3 seconds)
2025-09-12 01:15:09,134 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:15:09,139 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:19:50,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2414.56006 ± 1345.856
2025-09-12 01:19:50,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4043.1443, 757.14594, 1047.7498, 3758.243, 3018.1301, 3546.7068, 650.3331, 3109.7004, 3450.6577, 763.7886]
2025-09-12 01:19:50,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:19:50,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 20 hours, 16 minutes, 20 seconds)
2025-09-12 01:31:56,535 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:31:56,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:36:41,726 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2555.87573 ± 1389.368
2025-09-12 01:36:41,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1294.8639, 3428.1484, 3574.776, 2619.059, 3821.4111, 196.1814, 3700.4746, 3216.6543, 120.47305, 3586.7139]
2025-09-12 01:36:41,727 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:36:41,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 20 hours, 13 minutes, 36 seconds)
2025-09-12 01:48:49,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:48:49,811 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:53:31,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2580.96240 ± 1272.302
2025-09-12 01:53:31,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1564.4276, 3541.0068, 333.13782, 2535.0144, 492.81628, 3375.5286, 3616.6907, 4049.7766, 2758.938, 3542.2876]
2025-09-12 01:53:31,020 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:53:31,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 20 hours, 12 minutes, 45 seconds)
2025-09-12 02:05:31,898 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:05:31,903 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:10:10,863 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2197.77734 ± 995.530
2025-09-12 02:10:10,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1738.1964, 1362.7892, 1838.3322, 2523.0217, 3279.1436, 3900.5947, 994.2457, 2554.8896, 681.4625, 3105.0972]
2025-09-12 02:10:10,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:10:10,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 20 hours, 8 minutes, 36 seconds)
2025-09-12 02:22:13,153 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:22:13,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:26:53,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2805.01489 ± 901.659
2025-09-12 02:26:53,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1937.5287, 3763.673, 3764.906, 2720.3413, 3613.188, 3178.1514, 1688.9974, 2052.3518, 1464.1597, 3866.852]
2025-09-12 02:26:53,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:26:53,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (2805.01) for latency ExtremeClogL1U23
2025-09-12 02:26:53,317 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 50 minutes, 33 seconds)
2025-09-12 02:38:55,419 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:38:55,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:43:40,359 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3171.58325 ± 1292.641
2025-09-12 02:43:40,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3978.4739, 1640.5371, 673.15894, 4132.421, 1415.1816, 3971.938, 4185.305, 3590.384, 4179.463, 3948.9712]
2025-09-12 02:43:40,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:43:40,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (3171.58) for latency ExtremeClogL1U23
2025-09-12 02:43:40,377 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 19 hours, 33 minutes, 33 seconds)
2025-09-12 02:55:44,837 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:55:44,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:00:23,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3576.55078 ± 607.456
2025-09-12 03:00:23,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2521.606, 4068.5066, 3579.9495, 4480.7266, 3507.5664, 3987.789, 3848.8638, 3718.355, 3593.2434, 2458.9043]
2025-09-12 03:00:23,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:00:23,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (3576.55) for latency ExtremeClogL1U23
2025-09-12 03:00:23,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 19 hours, 15 minutes, 1 second)
2025-09-12 03:12:27,035 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:12:27,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:17:05,617 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2380.46240 ± 1383.949
2025-09-12 03:17:05,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3067.0923, 413.66425, 2654.1274, 3659.2737, 1370.8378, 432.17783, 4289.7236, 3124.1958, 3826.6328, 966.89856]
2025-09-12 03:17:05,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:17:05,631 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 56 minutes, 38 seconds)
2025-09-12 03:29:10,079 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:29:10,085 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:33:54,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3213.87354 ± 1419.822
2025-09-12 03:33:54,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3694.6848, -50.76765, 3857.6768, 4072.1465, 3025.4834, 4004.3474, 1042.5566, 4300.648, 4152.9688, 4038.9927]
2025-09-12 03:33:54,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:33:54,768 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 42 minutes)
2025-09-12 03:46:00,715 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:46:00,720 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:50:42,531 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2637.09131 ± 1427.676
2025-09-12 03:50:42,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3830.848, 3967.1812, 1202.9269, 4054.8901, 4237.197, 931.9557, 1767.1984, 1802.4475, 4013.9253, 562.3439]
2025-09-12 03:50:42,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:50:42,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 26 minutes, 26 seconds)
2025-09-12 04:02:50,924 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:02:50,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:07:32,411 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2421.78223 ± 1655.696
2025-09-12 04:07:32,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3077.9182, 4105.9688, 20.695944, 604.6862, 3842.5479, 881.7469, 303.31943, 3233.4578, 3868.4097, 4279.0703]
2025-09-12 04:07:32,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:07:32,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 18 hours, 10 minutes, 16 seconds)
2025-09-12 04:19:39,678 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:19:39,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:24:19,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3190.96338 ± 1274.944
2025-09-12 04:24:19,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4096.728, 4386.14, 592.7511, 2772.2356, 3769.341, 4082.4172, 3974.6965, 1031.9994, 4012.762, 3190.5635]
2025-09-12 04:24:19,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:24:19,877 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 54 minutes, 25 seconds)
2025-09-12 04:36:26,279 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:36:26,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:41:07,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3358.07666 ± 1238.363
2025-09-12 04:41:07,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2221.5242, 4133.6973, 3477.7327, 197.20479, 4338.9404, 3756.9304, 4622.8477, 3822.404, 3982.8728, 3026.6118]
2025-09-12 04:41:07,849 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:41:07,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 38 minutes, 52 seconds)
2025-09-12 04:53:14,905 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:53:14,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:57:58,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2981.58447 ± 1163.780
2025-09-12 04:57:58,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2727.4597, 4220.081, 598.36237, 4213.86, 3886.9983, 1223.7657, 3011.6904, 2805.548, 3312.016, 3816.065]
2025-09-12 04:57:58,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:57:58,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 22 minutes, 19 seconds)
2025-09-12 05:10:04,415 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:10:04,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:14:46,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2701.17017 ± 1462.326
2025-09-12 05:14:46,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3454.2012, 4058.796, -82.78797, 4253.63, 2733.81, 1390.9578, 4126.725, 2218.5715, 3958.954, 898.84503]
2025-09-12 05:14:46,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:14:46,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 17 hours, 5 minutes, 35 seconds)
2025-09-12 05:26:54,120 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:26:54,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:31:40,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3461.45386 ± 1151.375
2025-09-12 05:31:40,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2613.0645, 1640.9921, 4012.5654, 1174.978, 4180.4673, 3782.0554, 4410.5503, 4309.9116, 3911.164, 4578.79]
2025-09-12 05:31:40,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:31:40,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 16 hours, 49 minutes, 33 seconds)
2025-09-12 05:43:49,078 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:43:49,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:48:33,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3372.01221 ± 1358.850
2025-09-12 05:48:33,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3952.7888, 790.859, 4637.955, 4213.082, 4005.5842, 4676.072, 3094.698, 3553.1853, 809.5925, 3986.3037]
2025-09-12 05:48:33,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:48:33,549 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 16 hours, 33 minutes, 53 seconds)
2025-09-12 06:00:40,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:00:40,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:05:22,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2773.01709 ± 1241.724
2025-09-12 06:05:22,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3706.9001, 4014.99, 2025.7906, 2452.158, 4515.559, 2573.5132, 4067.4077, 2126.0957, 192.93265, 2054.8247]
2025-09-12 06:05:22,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:05:22,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 16 hours, 17 minutes, 14 seconds)
2025-09-12 06:17:28,527 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:17:28,534 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:22:09,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2486.77856 ± 1606.723
2025-09-12 06:22:09,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1450.1135, -165.51726, 3193.7983, 3895.5637, 4452.0103, 167.57076, 3745.629, 3414.7632, 1033.818, 3680.0356]
2025-09-12 06:22:09,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:22:09,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 59 minutes, 44 seconds)
2025-09-12 06:34:17,989 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:34:17,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:39:02,936 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3844.65039 ± 682.965
2025-09-12 06:39:02,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4276.726, 3556.2412, 4217.5405, 4053.6667, 1913.9469, 4187.312, 3997.2844, 4252.643, 4251.9536, 3739.19]
2025-09-12 06:39:02,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:39:02,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (3844.65) for latency ExtremeClogL1U23
2025-09-12 06:39:02,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 43 minutes, 52 seconds)
2025-09-12 06:51:11,509 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:51:11,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:55:54,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2908.67627 ± 1191.036
2025-09-12 06:55:54,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4151.708, 3602.3584, 4245.403, 2037.0785, 594.1607, 3585.397, 3097.5654, 1215.4142, 2721.5957, 3836.0818]
2025-09-12 06:55:54,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:55:54,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 26 minutes, 38 seconds)
2025-09-12 07:08:01,747 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:08:01,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:12:43,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2090.02344 ± 1634.479
2025-09-12 07:12:43,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3963.858, 437.2443, 249.64754, 3243.572, 1476.9691, 1114.5667, 1434.7109, 218.42853, 4406.31, 4354.928]
2025-09-12 07:12:43,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:12:43,032 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 15 hours, 8 minutes, 54 seconds)
2025-09-12 07:24:49,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:24:49,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:29:29,600 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3859.25781 ± 665.899
2025-09-12 07:29:29,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4713.8594, 4169.5693, 2668.5737, 3903.6216, 4428.4014, 3759.2705, 4160.5503, 2600.5854, 3908.8042, 4279.3438]
2025-09-12 07:29:29,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:29:29,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (3859.26) for latency ExtremeClogL1U23
2025-09-12 07:29:29,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 51 minutes, 39 seconds)
2025-09-12 07:41:36,249 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:41:36,254 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:46:15,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3079.47510 ± 1400.203
2025-09-12 07:46:15,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4150.94, 4493.6675, 4280.873, 4051.5347, 4786.2036, 1657.2633, 1190.8027, 1114.5331, 1788.9172, 3280.0127]
2025-09-12 07:46:15,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:46:15,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 34 minutes, 39 seconds)
2025-09-12 07:58:23,732 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:58:23,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:03:07,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2378.13037 ± 1703.634
2025-09-12 08:03:07,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1227.5598, 4355.9355, 331.79346, 4069.6519, 4598.488, 287.26007, 231.0847, 4008.535, 2047.3776, 2623.6174]
2025-09-12 08:03:07,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:03:07,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 17 minutes, 37 seconds)
2025-09-12 08:15:16,448 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:15:16,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:20:00,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2879.17236 ± 1576.450
2025-09-12 08:20:00,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [256.6347, 3357.2366, 2023.1996, 4593.576, 4399.057, 4352.5786, 3376.3098, 2206.173, 148.7661, 4078.1912]
2025-09-12 08:20:00,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:20:00,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 14 hours, 1 minute, 2 seconds)
2025-09-12 08:32:09,302 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:32:09,309 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:36:53,985 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3418.02783 ± 1561.472
2025-09-12 08:36:53,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4169.4707, 186.3213, 2071.9646, 4359.8296, 4197.191, 4374.623, 4770.5933, 1143.0981, 4490.784, 4416.4033]
2025-09-12 08:36:53,986 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:36:53,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 44 minutes, 59 seconds)
2025-09-12 08:49:03,019 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:49:03,024 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:53:47,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2961.81128 ± 1540.683
2025-09-12 08:53:47,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2011.3457, 3869.362, 2039.6283, 4126.905, 578.58966, 4600.597, 3850.609, 4425.7344, 204.45752, 3910.8828]
2025-09-12 08:53:47,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:53:47,387 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 29 minutes, 14 seconds)
2025-09-12 09:05:58,792 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:05:58,799 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:10:39,831 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2914.61108 ± 1471.229
2025-09-12 09:10:39,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1752.3318, 88.31646, 4024.2832, 3753.6533, 1812.4047, 4381.837, 4488.652, 3697.3706, 1217.0751, 3930.1895]
2025-09-12 09:10:39,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:10:39,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 13 hours, 13 minutes, 23 seconds)
2025-09-12 09:22:49,454 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:22:49,459 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:27:35,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3082.70337 ± 1453.630
2025-09-12 09:27:35,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1111.1461, 4447.2827, 3796.1707, 3642.3494, 4215.7207, 4516.6914, 4317.055, 207.7522, 2570.9946, 2001.8707]
2025-09-12 09:27:35,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:27:35,044 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 56 minutes, 58 seconds)
2025-09-12 09:39:44,321 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:39:44,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:44:29,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2705.99219 ± 1851.533
2025-09-12 09:44:29,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [40.529808, 4508.909, 4524.251, 767.8295, 711.6989, 357.82214, 4349.952, 4247.762, 3826.5996, 3724.5667]
2025-09-12 09:44:29,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:44:29,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 40 minutes, 14 seconds)
2025-09-12 09:56:38,655 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:56:38,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:01:19,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3686.95459 ± 1061.448
2025-09-12 10:01:19,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3907.0403, 4395.796, 4502.477, 4240.291, 1252.0718, 4638.861, 2579.503, 4595.2983, 3957.3354, 2800.8743]
2025-09-12 10:01:19,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:01:19,216 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 22 minutes, 53 seconds)
2025-09-12 10:13:28,373 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:13:28,379 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:18:10,303 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3147.26221 ± 1186.407
2025-09-12 10:18:10,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3411.6196, 3602.3665, 1356.466, 4263.9087, 4023.8337, 3935.352, 3768.7773, 2550.8032, 3969.1387, 590.3552]
2025-09-12 10:18:10,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:18:10,315 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 5 minutes, 41 seconds)
2025-09-12 10:30:21,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:30:21,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:35:03,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3566.85742 ± 1391.743
2025-09-12 10:35:03,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4784.2305, 1839.7954, 4625.174, 156.5053, 4529.1187, 4046.5598, 3332.114, 4260.295, 4060.888, 4033.8936]
2025-09-12 10:35:03,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:35:03,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 48 minutes, 52 seconds)
2025-09-12 10:47:12,776 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:47:12,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:51:53,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3262.58350 ± 1377.900
2025-09-12 10:51:53,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4145.3447, 4376.7495, 3418.7642, 2399.8606, 529.9787, 2686.5884, 4537.8735, 4416.878, 4693.1274, 1420.6678]
2025-09-12 10:51:53,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:51:53,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 31 minutes, 17 seconds)
2025-09-12 11:04:03,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:04:03,489 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:08:46,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3509.90869 ± 1130.666
2025-09-12 11:08:46,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4188.7153, 4417.995, 2973.924, 2874.7302, 4355.465, 2736.001, 4259.113, 4530.143, 746.68726, 4016.3135]
2025-09-12 11:08:46,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:08:46,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 14 minutes, 16 seconds)
2025-09-12 11:20:57,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:20:57,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:25:39,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3622.58276 ± 1530.274
2025-09-12 11:25:39,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4299.227, 263.75317, 4921.19, 4439.3403, 3999.1292, 4701.0425, 4389.6846, 1013.93396, 3806.4036, 4392.1216]
2025-09-12 11:25:39,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:25:39,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 57 minutes, 51 seconds)
2025-09-12 11:37:50,702 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:37:50,708 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:42:32,519 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3311.75464 ± 1317.455
2025-09-12 11:42:32,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3990.0059, 4668.641, 1033.2058, 3413.0542, 4262.1963, 3497.328, 553.29114, 3665.3252, 4322.458, 3712.0425]
2025-09-12 11:42:32,520 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:42:32,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 41 minutes, 12 seconds)
2025-09-12 11:54:44,313 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:54:44,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:59:27,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3139.44800 ± 1633.381
2025-09-12 11:59:27,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2398.4456, 4642.377, 1886.5139, 4343.258, 4425.001, 219.36717, 572.395, 4298.713, 4190.2686, 4418.14]
2025-09-12 11:59:27,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:59:27,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 24 minutes, 34 seconds)
2025-09-12 12:11:37,689 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:11:37,694 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:16:20,325 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2396.20117 ± 1727.038
2025-09-12 12:16:20,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3941.1355, 4616.518, 1385.2744, 4789.1377, 4267.382, 546.34204, 1530.1821, 99.15115, 760.5628, 2026.3282]
2025-09-12 12:16:20,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:16:20,353 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 8 minutes, 3 seconds)
2025-09-12 12:28:31,196 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:28:31,202 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:33:13,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2922.11426 ± 1594.372
2025-09-12 12:33:13,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [5114.24, 4341.027, 1964.8341, 802.31665, 4213.451, 3955.785, 2025.4563, 2167.3167, 274.73672, 4361.979]
2025-09-12 12:33:13,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:33:13,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 51 minutes, 9 seconds)
2025-09-12 12:45:25,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:45:25,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:50:11,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2778.60791 ± 1566.309
2025-09-12 12:50:11,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [203.90219, 2564.4912, 4241.775, 1917.0834, 334.65067, 4286.034, 2823.4355, 4766.6274, 4368.2417, 2279.8354]
2025-09-12 12:50:11,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:50:11,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 34 minutes, 48 seconds)
2025-09-12 13:02:23,888 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:02:23,893 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:07:04,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2521.78076 ± 1682.133
2025-09-12 13:07:04,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2787.3687, 461.1164, 1740.2848, 4156.2495, 441.31763, 2119.6602, 4045.4634, 4817.3735, 258.40338, 4390.569]
2025-09-12 13:07:04,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:07:04,798 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 17 minutes, 56 seconds)
2025-09-12 13:19:16,629 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:19:16,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:23:59,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2489.18872 ± 1316.890
2025-09-12 13:23:59,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1824.5234, -61.082638, 3595.4055, 1404.6335, 3920.3027, 1331.6913, 2743.725, 3460.5095, 2295.4788, 4376.698]
2025-09-12 13:23:59,383 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:23:59,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 1 minute)
2025-09-12 13:36:11,932 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:36:11,938 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:40:57,970 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3976.42188 ± 1385.951
2025-09-12 13:40:57,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3883.596, -42.586838, 4546.129, 3965.0815, 4935.315, 4034.9277, 4616.3926, 4421.934, 4966.113, 4437.318]
2025-09-12 13:40:57,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:40:57,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (3976.42) for latency ExtremeClogL1U23
2025-09-12 13:40:57,978 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 44 minutes, 41 seconds)
2025-09-12 13:53:11,291 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:53:11,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:57:53,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2864.94458 ± 1591.876
2025-09-12 13:57:53,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4147.547, 1925.4054, 4401.0186, 4723.076, 506.2276, 4546.782, 3812.0396, 1823.807, 2437.2417, 326.30106]
2025-09-12 13:57:53,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:57:53,682 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 28 minutes, 2 seconds)
2025-09-12 14:10:06,205 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:10:06,211 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:14:48,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3505.89917 ± 1464.674
2025-09-12 14:14:48,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [968.9099, 4612.8994, 4871.6777, 1103.5669, 4310.79, 4186.579, 4671.2725, 3982.5715, 4448.0366, 1902.6877]
2025-09-12 14:14:48,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:14:48,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 10 minutes, 48 seconds)
2025-09-12 14:27:00,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:27:00,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:31:42,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2910.82251 ± 1989.267
2025-09-12 14:31:42,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4120.475, 195.56514, 4477.6997, 4496.829, 1606.3536, 4189.5864, 5017.2866, 4650.252, 274.9263, 79.25292]
2025-09-12 14:31:42,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:31:42,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 53 minutes, 53 seconds)
2025-09-12 14:43:54,939 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:43:54,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:48:38,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3099.24072 ± 1317.373
2025-09-12 14:48:38,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2375.7532, 4655.357, 1604.8214, 2082.1626, 2632.662, 4089.7998, 2254.3335, 4958.7485, 1472.0856, 4866.685]
2025-09-12 14:48:38,364 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:48:38,378 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 37 minutes, 6 seconds)
2025-09-12 15:00:51,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:00:51,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:05:35,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3308.55200 ± 1490.642
2025-09-12 15:05:35,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1654.7429, 891.3682, 1051.2019, 4465.2627, 2825.5122, 3831.599, 4793.3916, 4730.2246, 4433.356, 4408.861]
2025-09-12 15:05:35,330 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:05:35,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 20 minutes, 2 seconds)
2025-09-12 15:17:48,006 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:17:48,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:22:30,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2895.14893 ± 1773.303
2025-09-12 15:22:30,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4745.2305, 1042.1553, 4152.7896, 2885.3342, 4497.47, 218.81323, 402.40338, 1812.349, 4816.1777, 4378.7646]
2025-09-12 15:22:30,109 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:22:30,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 3 minutes, 2 seconds)
2025-09-12 15:34:40,705 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:34:40,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:39:24,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2836.05420 ± 1733.648
2025-09-12 15:39:24,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4334.976, 207.16716, 4793.9014, 2766.5596, 3714.3728, 282.3906, 509.68796, 3172.803, 4195.988, 4382.6978]
2025-09-12 15:39:24,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:39:24,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 46 minutes, 1 second)
2025-09-12 15:51:36,479 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:51:36,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:56:19,114 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2698.15259 ± 1616.481
2025-09-12 15:56:19,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4506.3364, 4342.302, 3515.716, -17.105951, 2706.8992, 1118.5419, 4350.122, 2052.3633, 473.28992, 3933.06]
2025-09-12 15:56:19,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:56:19,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 29 minutes, 13 seconds)
2025-09-12 16:08:31,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:08:31,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:13:14,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2936.17041 ± 1459.403
2025-09-12 16:13:14,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1650.6416, 1659.5166, 4511.137, 4141.6807, 2840.6526, 895.8337, 4354.778, 3909.8416, 849.876, 4547.745]
2025-09-12 16:13:14,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:13:14,416 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 12 minutes, 14 seconds)
2025-09-12 16:25:28,417 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:25:28,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:30:11,345 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3502.54956 ± 1569.274
2025-09-12 16:30:11,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4990.593, 3835.4014, 4540.2783, 794.01666, 4529.184, 1562.6918, 1136.6979, 4406.274, 4916.188, 4314.1724]
2025-09-12 16:30:11,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:30:11,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 55 minutes, 19 seconds)
2025-09-12 16:42:23,335 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:42:23,340 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:47:09,894 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3704.21143 ± 917.427
2025-09-12 16:47:09,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [1743.9337, 3195.4695, 2690.627, 4420.251, 3415.6614, 3688.9067, 4758.1343, 4475.84, 4002.1062, 4651.1836]
2025-09-12 16:47:09,895 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:47:09,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 38 minutes, 39 seconds)
2025-09-12 16:59:22,907 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:59:22,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:04:06,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2650.81494 ± 1602.682
2025-09-12 17:04:06,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3208.984, 1916.4797, 4295.0005, 882.63837, 751.6805, 4797.663, 1486.8365, 4435.49, 621.9421, 4111.4355]
2025-09-12 17:04:06,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:04:06,183 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 21 minutes, 51 seconds)
2025-09-12 17:16:19,648 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:16:19,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:21:01,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2764.34424 ± 1935.649
2025-09-12 17:21:01,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4869.5786, 4906.0273, 424.8249, 651.2851, 2014.9415, 1230.4775, 152.86525, 4270.57, 4716.71, 4406.161]
2025-09-12 17:21:01,817 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:21:01,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 4 minutes, 57 seconds)
2025-09-12 17:33:13,092 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:33:13,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:37:58,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3394.43994 ± 1585.177
2025-09-12 17:37:58,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4988.791, 4606.199, 1540.3291, 4752.8745, 3299.6785, 691.5066, 4103.611, 1063.3385, 4872.3345, 4025.7349]
2025-09-12 17:37:58,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:37:58,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 48 minutes, 7 seconds)
2025-09-12 17:50:13,342 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:50:13,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:54:59,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2400.34937 ± 1785.148
2025-09-12 17:54:59,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4446.9165, 204.42307, 505.72458, 1212.669, 4193.4004, 4820.634, 312.63763, 4374.6553, 1691.7762, 2240.6575]
2025-09-12 17:54:59,825 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:54:59,833 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 31 minutes, 23 seconds)
2025-09-12 18:07:13,087 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:07:13,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:11:56,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3835.82959 ± 1025.509
2025-09-12 18:11:56,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3198.0728, 1878.8927, 4995.68, 4254.3843, 4208.169, 4714.6714, 4001.9653, 4727.2407, 4236.391, 2142.8284]
2025-09-12 18:11:56,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:11:56,934 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 14 minutes, 21 seconds)
2025-09-12 18:24:09,369 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:24:09,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:28:51,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3334.30322 ± 1661.793
2025-09-12 18:28:51,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [662.92664, 2510.2424, 1613.231, 769.7235, 4596.7646, 4373.035, 4693.975, 4599.293, 4939.7163, 4584.126]
2025-09-12 18:28:51,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:28:51,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 57 minutes, 18 seconds)
2025-09-12 18:41:05,375 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:41:05,380 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:45:47,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3858.64502 ± 938.339
2025-09-12 18:45:47,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [2451.294, 3065.7888, 4935.5864, 2921.4817, 4859.683, 4723.701, 2587.0881, 4334.839, 4161.532, 4545.4517]
2025-09-12 18:45:47,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:45:47,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 40 minutes, 23 seconds)
2025-09-12 18:57:59,725 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:57:59,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:02:42,305 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2878.99438 ± 1728.206
2025-09-12 19:02:42,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [3606.172, -80.203415, 49.001827, 4618.075, 4211.7065, 4281.196, 4589.5396, 1396.3381, 3614.3496, 2503.7668]
2025-09-12 19:02:42,307 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:02:42,322 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 23 minutes, 20 seconds)
2025-09-12 19:14:56,854 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:14:56,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:19:38,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3525.42310 ± 1634.533
2025-09-12 19:19:38,283 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4246.8843, 3230.9138, 2273.856, 4962.86, 4827.3237, 1607.395, 4522.508, -32.76226, 4524.447, 5090.8037]
2025-09-12 19:19:38,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:19:38,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 6 minutes, 12 seconds)
2025-09-12 19:31:51,420 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:31:51,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:36:32,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2975.13892 ± 1810.389
2025-09-12 19:36:32,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4227.792, 4392.506, 830.60535, 4133.842, 1252.8058, 4886.6846, 4153.0195, 1185.9283, -27.589912, 4715.7954]
2025-09-12 19:36:32,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:36:32,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 49 minutes, 10 seconds)
2025-09-12 19:48:46,349 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:48:46,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:53:33,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3911.25269 ± 1111.381
2025-09-12 19:53:33,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4313.046, 4508.5933, 938.7824, 3179.418, 4618.3794, 3568.1658, 4759.3237, 4062.1018, 4264.7285, 4899.9893]
2025-09-12 19:53:33,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:53:33,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 32 minutes, 27 seconds)
2025-09-12 20:05:46,336 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:05:46,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:10:29,593 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 2167.44019 ± 1903.105
2025-09-12 20:10:29,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [174.06671, 1824.1855, 4060.1033, 965.7852, 838.50494, 4447.7803, 4181.263, 24.83826, 217.82935, 4940.046]
2025-09-12 20:10:29,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:10:29,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 15 minutes, 30 seconds)
2025-09-12 20:22:43,776 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:22:43,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:27:26,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 4446.72949 ± 627.703
2025-09-12 20:27:26,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4419.1304, 4253.712, 2753.9841, 4725.9136, 4762.7363, 5039.898, 4748.2695, 5008.9565, 4164.7363, 4589.9565]
2025-09-12 20:27:26,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:27:26,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1226 [INFO]: New best (4446.73) for latency ExtremeClogL1U23
2025-09-12 20:27:26,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 58 minutes, 37 seconds)
2025-09-12 20:39:39,620 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:39:39,629 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:44:23,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3449.52075 ± 1584.222
2025-09-12 20:44:23,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4499.7573, 551.2797, 3745.384, 463.32227, 2754.7014, 4402.3984, 4283.465, 4113.013, 4531.653, 5150.234]
2025-09-12 20:44:23,195 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:44:23,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 41 minutes, 41 seconds)
2025-09-12 20:56:35,663 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:56:35,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:01:18,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3208.92505 ± 1897.484
2025-09-12 21:01:18,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4292.4146, 447.26758, 696.1957, 4208.0117, 4818.3745, -52.653687, 5136.779, 3915.1804, 4370.714, 4256.966]
2025-09-12 21:01:18,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:01:18,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 24 minutes, 46 seconds)
2025-09-12 21:13:32,745 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:13:32,753 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:18:16,603 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3237.20288 ± 1304.430
2025-09-12 21:18:16,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4605.748, 2899.4949, 877.23206, 3194.2212, 4682.971, 4347.2314, 1695.6622, 3892.826, 4369.763, 1806.8796]
2025-09-12 21:18:16,605 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:18:16,616 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 7 minutes, 46 seconds)
2025-09-12 21:30:30,220 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:30:30,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:35:15,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3470.56641 ± 1401.604
2025-09-12 21:35:15,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [4881.1353, 828.4274, 3937.4414, 4403.6323, 5048.1245, 2627.3357, 1478.239, 4289.7266, 4530.2065, 2681.396]
2025-09-12 21:35:15,130 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:35:15,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 50 minutes, 51 seconds)
2025-09-12 21:47:29,818 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:47:29,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:52:22,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3621.47461 ± 1667.488
2025-09-12 21:52:22,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [291.1128, 5100.7446, 5104.176, 2129.7656, 4444.2383, 5064.376, 1200.723, 4424.949, 4454.6704, 3999.9915]
2025-09-12 21:52:22,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:52:22,663 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 33 minutes, 58 seconds)
2025-09-12 22:04:50,532 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:04:50,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:09:45,678 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3200.06104 ± 1698.574
2025-09-12 22:09:45,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [305.90555, 4437.5605, 4178.3076, 4528.439, 4768.3154, 119.78598, 2648.9287, 4418.343, 2186.6619, 4408.3633]
2025-09-12 22:09:45,679 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:09:45,690 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 4 seconds)
2025-09-12 22:22:13,973 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 22:22:13,979 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 22:27:03,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1221 [DEBUG]: Total Reward: 3280.75537 ± 1381.735
2025-09-12 22:27:03,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1222 [DEBUG]: All rewards: [507.17792, 4305.1606, 4195.6416, 2921.4104, 1855.2377, 4326.417, 4574.1494, 1690.537, 4607.98, 3823.8406]
2025-09-12 22:27:03,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 22:27:03,972 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc20-halfcheetah):1251 [DEBUG]: Training session finished
