2025-09-11 18:18:47,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1108 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noiseperc0-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:18:47,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1109 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noiseperc0-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-09-11 18:18:47,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1110 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x14db78619550>}
2025-09-11 18:18:47,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1111 [DEBUG]: using device: cuda
2025-09-11 18:18:47,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1133 [INFO]: Creating new trainer
2025-09-11 18:18:47,922 baseline-mbpac-noiseperc0-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
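[editor's note] The printed pi network corresponds roughly to the sketch below. `NNGaussianPolicy`/`NNTanhRefit` are this codebase's wrappers; everything else is standard `torch.nn`. The squashing convention implied by the logged `scale: 2, shift: -1` tensors (map a [0, 1] squash onto [-1, 1]) is an assumption, as is returning only the deterministic mean path.

```python
import torch
import torch.nn as nn

class GaussianPolicySketch(nn.Module):
    """Approximate reconstruction of the logged NNGaussianPolicy:
    shared 384 -> 256 -> 256 ReLU trunk, two 6-dim heads (mu, log_std),
    and an affine refit after squashing."""
    def __init__(self, in_features=384, hidden=256, act_dim=6):
        super().__init__()
        self.common_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu_head = nn.Linear(hidden, act_dim)
        self.log_std_head = nn.Linear(hidden, act_dim)
        # Matches the logged scale/shift tensors; the exact NNTanhRefit
        # convention is a guess.
        self.scale, self.shift = 2.0, -1.0

    def forward(self, obs):
        h = self.common_head(obs)
        mu, log_std = self.mu_head(h), self.log_std_head(h)
        squashed = (torch.tanh(mu) + 1.0) / 2.0  # assumed map into [0, 1]
        return squashed * self.scale + self.shift, log_std
```

Note the 384-dim input: the policy conditions on an embedding/belief vector rather than the raw 17-dim HalfCheetah observation, consistent with the GRU hidden size printed further down.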
2025-09-11 18:18:47,922 baseline-mbpac-noiseperc0-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
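[editor's note] The q network is a plain concat-MLP critic. Its 23 input features are consistent with HalfCheetah's 17-dim observation plus the 6-dim action, though which tensors feed the left/right flattens is an assumption. A minimal sketch:

```python
import torch
import torch.nn as nn

class QNetworkSketch(nn.Module):
    """Approximate reconstruction of the logged NNLayerConcat2 Q network:
    flatten both inputs, concatenate (17 + 6 = 23 features), then a
    256-256 ReLU MLP to a scalar, squeezed on the last dim."""
    def __init__(self, state_dim=17, act_dim=6, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        x = torch.cat([state.flatten(1), action.flatten(1)], dim=-1)
        return self.net(x).squeeze(-1)  # mirrors NNLayerSqueeze(dim: -1)
```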
2025-09-11 18:18:47,930 baseline-mbpac-noiseperc0-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
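[editor's note] The model structure reads as a recurrent world model: embed actions, roll a GRU, and emit a Gaussian over the next 17-dim state from the 384-dim hidden. The sketch below is a rough reconstruction under stated assumptions: the state embedder (17 -> 384) is assumed to initialize the GRU hidden, and plain `SiLU` stands in for `NNLayerClipSiLU(lower=-20.0)` (which presumably clips inputs below -20 before the SiLU).

```python
import torch
import torch.nn as nn

class PredictiveRecurrentSketch(nn.Module):
    """Rough sketch of the logged NNPredictiveRecurrent world model.
    The exact wiring of emitter inputs and hidden initialization is
    not visible in the repr and is assumed here."""
    def __init__(self, state_dim=17, act_dim=6, hidden=256, rec=384):
        super().__init__()
        act = nn.SiLU()  # stand-in for NNLayerClipSiLU(lower=-20.0)
        self.embed_state = nn.Sequential(
            nn.Linear(state_dim, hidden), act,
            nn.Linear(hidden, hidden), act,
            nn.Linear(hidden, rec),
        )
        self.embed_action = nn.Sequential(
            nn.Linear(act_dim, hidden), act,
            nn.Linear(hidden, hidden),
        )
        self.gru = nn.GRU(hidden, rec, batch_first=True)
        # Gaussian emitter heads over the next state.
        self.mu = nn.Linear(rec, state_dim)
        self.log_std = nn.Linear(rec, state_dim)

    def forward(self, state, actions):
        # state: (B, 17); actions: (B, T, 6)
        h0 = self.embed_state(state).unsqueeze(0)          # (1, B, 384)
        out, _ = self.gru(self.embed_action(actions), h0)  # (B, T, 384)
        return self.mu(out), self.log_std(out)
```

The 384-dim GRU hidden matching the policy's 384-dim input suggests the policy acts on the model's recurrent belief state under the hidden Markovian delay.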
2025-09-11 18:18:48,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1194 [DEBUG]: Starting training session...
2025-09-11 18:18:48,840 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 1/100
2025-09-11 18:29:14,953 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:29:14,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:33:45,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -359.71008 ± 31.316
2025-09-11 18:33:45,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-379.1601, -328.52792, -359.9151, -373.16614, -342.479, -373.35397, -313.72363, -426.80267, -327.77423, -372.19797]
2025-09-11 18:33:45,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:33:45,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (-359.71) for latency ExtremeClogL1U23
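[editor's note] The "Total Reward: mean ± std" summary is reproducible from the logged per-episode rewards; the spread term matches the population (ddof=0) standard deviation, not the sample one:

```python
import math

# Per-episode rewards logged for iteration 1.
rewards = [-379.1601, -328.52792, -359.9151, -373.16614, -342.479,
           -373.35397, -313.72363, -426.80267, -327.77423, -372.19797]

mean = sum(rewards) / len(rewards)
# Population std (divide by N, as in np.std's default); this matches
# the logged "± 31.316", whereas the sample std (N - 1) would be ~33.0.
std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
print(mean, std)
```

This recovers -359.7101 ± 31.316 (the log's -359.71008 reflects float32 accumulation).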
2025-09-11 18:33:45,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 2/100 (estimated time remaining: 24 hours, 39 minutes)
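[editor's note] The remaining-time estimates are consistent with linearly extrapolating the average wall time of completed iterations over those still to run; the helper below is a hypothetical reconstruction (the actual formula in latency_env is not shown in the log). Iteration 1 took ~897 s (18:18:48 to 18:33:45), and 99 remaining iterations at that rate gives roughly the logged "24 hours, 39 minutes".

```python
import datetime

def estimate_remaining(start, now, done, total):
    """Linear ETA: mean duration per completed iteration times the
    number of iterations left (hypothetical helper)."""
    per_iter = (now - start) / done
    return per_iter * (total - done)

start = datetime.datetime(2025, 9, 11, 18, 18, 48)
now = datetime.datetime(2025, 9, 11, 18, 33, 45)  # end of iteration 1
eta = estimate_remaining(start, now, done=1, total=100)
print(eta)
```

With second-resolution timestamps this lands within a minute of the logged estimate; the shrinking ETA over later iterations reflects the running average absorbing per-iteration variance.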
2025-09-11 18:45:19,074 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 18:45:19,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 18:49:48,685 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: -300.69794 ± 9.898
2025-09-11 18:49:48,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-300.18234, -307.78305, -293.91772, -321.52127, -291.69058, -294.4895, -294.7453, -309.15262, -306.54535, -286.95135]
2025-09-11 18:49:48,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 18:49:48,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (-300.70) for latency ExtremeClogL1U23
2025-09-11 18:49:48,709 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 3/100 (estimated time remaining: 25 hours, 18 minutes, 53 seconds)
2025-09-11 19:01:24,297 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:01:24,319 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:05:55,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1308.67358 ± 844.368
2025-09-11 19:05:55,566 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2365.733, 1028.1365, 313.50067, 1165.1394, 228.92224, 615.0773, 580.033, 2288.1433, 2182.968, 2319.0823]
2025-09-11 19:05:55,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:05:55,567 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (1308.67) for latency ExtremeClogL1U23
2025-09-11 19:05:55,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 4/100 (estimated time remaining: 25 hours, 23 minutes, 17 seconds)
2025-09-11 19:17:30,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:17:30,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:22:02,653 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2070.90576 ± 1122.312
2025-09-11 19:22:02,656 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3046.6863, 433.8122, 3111.0483, 639.3343, 713.5992, 3104.786, 2722.216, 2815.8408, 3046.2087, 1075.5271]
2025-09-11 19:22:02,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:22:02,657 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2070.91) for latency ExtremeClogL1U23
2025-09-11 19:22:02,724 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 5/100 (estimated time remaining: 25 hours, 17 minutes, 33 seconds)
2025-09-11 19:33:38,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:33:38,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:38:09,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2941.09619 ± 1039.317
2025-09-11 19:38:09,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3592.682, 3190.9143, 3315.469, 3605.989, 135.78838, 3066.7856, 3254.5872, 3780.216, 2041.7809, 3426.749]
2025-09-11 19:38:09,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:38:09,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2941.10) for latency ExtremeClogL1U23
2025-09-11 19:38:09,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 6/100 (estimated time remaining: 25 hours, 7 minutes, 40 seconds)
2025-09-11 19:49:47,030 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 19:49:47,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 19:54:23,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2662.95630 ± 1105.213
2025-09-11 19:54:23,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2089.8809, 1412.33, 3161.8562, 3967.3938, 4350.4, 3266.9602, 3227.0437, 780.89636, 2836.4702, 1536.3302]
2025-09-11 19:54:23,036 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 19:54:23,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 7/100 (estimated time remaining: 25 hours, 15 minutes, 51 seconds)
2025-09-11 20:05:59,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:05:59,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:10:34,823 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2985.84180 ± 1722.310
2025-09-11 20:10:34,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4420.6597, 4589.2153, 1909.1998, 4539.2046, 3292.4397, 4325.806, 2852.8943, 4031.0955, 40.5192, -142.61813]
2025-09-11 20:10:34,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:10:34,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (2985.84) for latency ExtremeClogL1U23
2025-09-11 20:10:34,827 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 8/100 (estimated time remaining: 25 hours, 2 minutes, 17 seconds)
2025-09-11 20:22:13,578 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:22:13,582 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:26:49,662 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2905.22266 ± 1629.142
2025-09-11 20:26:49,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2168.634, 989.0487, 4684.3604, 4335.9487, 4550.2563, 4623.8315, 1083.8756, 613.07275, 1804.7594, 4198.439]
2025-09-11 20:26:49,667 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:26:49,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 9/100 (estimated time remaining: 24 hours, 48 minutes, 35 seconds)
2025-09-11 20:38:16,373 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:38:16,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:42:40,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3819.72388 ± 1262.262
2025-09-11 20:42:40,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4882.1665, 4433.099, 4536.024, 4360.912, 4665.7837, 3381.3625, 3951.2021, 4834.3545, 2470.6575, 681.6743]
2025-09-11 20:42:40,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:42:40,832 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (3819.72) for latency ExtremeClogL1U23
2025-09-11 20:42:40,839 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 10/100 (estimated time remaining: 24 hours, 27 minutes, 33 seconds)
2025-09-11 20:54:00,421 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 20:54:00,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 20:58:23,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2651.97827 ± 1764.191
2025-09-11 20:58:23,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4871.3813, 792.0132, 1622.6403, 860.402, 620.9894, 4774.843, 782.59155, 4463.6865, 3625.8203, 4105.4155]
2025-09-11 20:58:23,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 20:58:23,493 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 11/100 (estimated time remaining: 24 hours, 4 minutes, 4 seconds)
2025-09-11 21:09:43,392 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:09:43,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:14:06,747 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2531.09863 ± 1852.923
2025-09-11 21:14:06,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4096.228, 4879.981, 4933.8823, 4698.2026, 840.14465, 2039.622, 864.974, 765.66736, 2276.154, -83.868965]
2025-09-11 21:14:06,752 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:14:06,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 12/100 (estimated time remaining: 23 hours, 39 minutes, 10 seconds)
2025-09-11 21:25:26,918 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:25:26,939 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:29:52,909 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3566.89771 ± 1620.649
2025-09-11 21:29:52,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3137.1836, 1647.6775, 4935.474, 3733.3638, 5003.0874, 5087.0527, 4830.911, 4988.064, 1805.8801, 500.28482]
2025-09-11 21:29:52,913 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:29:52,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 13/100 (estimated time remaining: 23 hours, 15 minutes, 42 seconds)
2025-09-11 21:41:14,538 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:41:14,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 21:45:41,731 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3687.06958 ± 1557.498
2025-09-11 21:45:41,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5239.2744, 2284.5232, 2839.496, 3231.9373, 4839.029, 5028.841, 5300.149, 2775.3394, 395.54593, 4936.56]
2025-09-11 21:45:41,739 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 21:45:41,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 14/100 (estimated time remaining: 22 hours, 52 minutes, 17 seconds)
2025-09-11 21:57:03,572 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 21:57:03,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:01:27,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3719.36206 ± 1467.058
2025-09-11 22:01:27,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [478.27353, 4719.565, 4924.6533, 4685.579, 4472.08, 5267.2476, 3304.9404, 3190.0427, 1774.0917, 4377.15]
2025-09-11 22:01:27,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:01:27,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 15/100 (estimated time remaining: 22 hours, 35 minutes, 4 seconds)
2025-09-11 22:12:52,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:12:52,420 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:17:19,947 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3224.47363 ± 1709.016
2025-09-11 22:17:19,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4162.288, 4626.2393, 4208.929, 3523.9482, 4034.7844, 7.7494373, 4573.4434, 1460.7874, 4950.2456, 696.3226]
2025-09-11 22:17:19,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:17:20,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 16/100 (estimated time remaining: 22 hours, 22 minutes)
2025-09-11 22:28:43,265 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:28:43,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:33:07,769 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3113.44287 ± 1938.988
2025-09-11 22:33:07,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [164.79195, 1044.8346, -86.98911, 4637.4717, 4024.7112, 3879.4705, 5121.562, 4711.16, 2581.6987, 5055.715]
2025-09-11 22:33:07,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:33:07,826 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 17/100 (estimated time remaining: 22 hours, 7 minutes, 29 seconds)
2025-09-11 22:44:28,968 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 22:44:28,975 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 22:48:52,480 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3523.26050 ± 1612.690
2025-09-11 22:48:52,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4437.0796, 345.1215, 4984.9565, 1019.7504, 5052.1953, 4101.5728, 2963.071, 4715.867, 2781.9758, 4831.012]
2025-09-11 22:48:52,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 22:48:52,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 18/100 (estimated time remaining: 21 hours, 51 minutes, 16 seconds)
2025-09-11 23:00:20,803 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:00:20,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:04:46,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3779.90820 ± 2005.881
2025-09-11 23:04:46,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [192.46246, 1501.7078, 4333.777, 5291.274, 5269.14, 694.56824, 5460.4536, 4515.81, 5413.624, 5126.265]
2025-09-11 23:04:46,246 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:04:46,251 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 19/100 (estimated time remaining: 21 hours, 36 minutes, 49 seconds)
2025-09-11 23:16:08,347 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:16:08,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:20:33,204 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3635.93896 ± 1932.813
2025-09-11 23:20:33,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5284.2334, 5138.9805, 1470.7921, 4721.0107, 1604.8483, 109.019356, 5420.4644, 5120.114, 5251.7583, 2238.172]
2025-09-11 23:20:33,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:20:33,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 20/100 (estimated time remaining: 21 hours, 21 minutes, 15 seconds)
2025-09-11 23:32:02,082 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:32:02,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:36:22,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3651.88745 ± 1531.638
2025-09-11 23:36:22,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5379.4917, 2944.1702, 5314.8413, 2742.9973, 3105.248, 1094.8915, 5123.3, 1467.8961, 5185.285, 4160.755]
2025-09-11 23:36:22,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:36:22,584 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 21/100 (estimated time remaining: 21 hours, 4 minutes, 41 seconds)
2025-09-11 23:47:29,110 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-11 23:47:29,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-11 23:51:49,587 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4103.66797 ± 1349.956
2025-09-11 23:51:49,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4509.5176, 3716.3975, 3347.5076, 3432.3933, 5181.0234, 5252.0796, 767.5247, 4169.0327, 4968.437, 5692.7637]
2025-09-11 23:51:49,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-11 23:51:49,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4103.67) for latency ExtremeClogL1U23
2025-09-11 23:51:49,624 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 22/100 (estimated time remaining: 20 hours, 43 minutes, 24 seconds)
2025-09-12 00:02:57,774 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:02:57,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:07:18,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4955.40234 ± 689.488
2025-09-12 00:07:18,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4143.293, 4911.939, 5617.6323, 5589.6035, 5391.8716, 5181.1206, 5176.7803, 5223.8203, 3258.7905, 5059.17]
2025-09-12 00:07:18,238 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:07:18,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (4955.40) for latency ExtremeClogL1U23
2025-09-12 00:07:18,267 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 23/100 (estimated time remaining: 20 hours, 23 minutes, 30 seconds)
2025-09-12 00:18:27,296 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:18:27,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:22:45,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4379.84912 ± 1278.036
2025-09-12 00:22:45,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5370.1694, 3212.942, 2006.016, 5365.7256, 5462.5757, 5220.722, 5355.425, 5508.4, 3498.9375, 2797.5781]
2025-09-12 00:22:45,819 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:22:45,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 24/100 (estimated time remaining: 20 hours, 1 minute, 5 seconds)
2025-09-12 00:33:51,817 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:33:51,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:38:10,580 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3092.16602 ± 1581.834
2025-09-12 00:38:10,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5328.0566, 812.2758, 737.9185, 4917.91, 1939.9989, 3154.897, 2820.9626, 3925.8289, 2368.1992, 4915.6143]
2025-09-12 00:38:10,583 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:38:10,592 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 25/100 (estimated time remaining: 19 hours, 39 minutes, 52 seconds)
2025-09-12 00:49:33,218 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 00:49:33,222 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 00:54:09,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4588.35400 ± 1379.138
2025-09-12 00:54:09,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5551.7256, 5637.7485, 5194.5884, 5386.6455, 4217.634, 2897.5923, 4781.87, 5540.923, 1226.0765, 5448.7324]
2025-09-12 00:54:09,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 00:54:09,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 26/100 (estimated time remaining: 19 hours, 26 minutes, 49 seconds)
2025-09-12 01:06:03,825 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:06:03,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:10:36,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4091.62842 ± 1327.171
2025-09-12 01:10:36,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5397.405, 2416.4058, 4984.9634, 5077.676, 3159.3533, 1785.9456, 4119.1875, 5374.0664, 5607.159, 2994.1238]
2025-09-12 01:10:36,866 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:10:36,874 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 27/100 (estimated time remaining: 19 hours, 26 minutes, 3 seconds)
2025-09-12 01:22:25,811 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:22:25,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:26:59,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2384.34351 ± 2019.420
2025-09-12 01:26:59,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5380.0503, 500.9573, 1037.1775, 3217.874, -78.97882, 5267.9634, 4733.737, 2176.9402, 1439.3766, 168.3387]
2025-09-12 01:26:59,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:26:59,937 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 28/100 (estimated time remaining: 19 hours, 23 minutes, 32 seconds)
2025-09-12 01:38:51,360 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:38:51,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:43:24,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3944.24878 ± 1459.140
2025-09-12 01:43:24,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [298.40637, 4650.824, 4944.4683, 4472.9795, 4923.8057, 5043.3037, 5322.393, 3964.3682, 3003.1729, 2818.7686]
2025-09-12 01:43:24,390 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:43:24,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 29/100 (estimated time remaining: 19 hours, 21 minutes, 15 seconds)
2025-09-12 01:55:16,649 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 01:55:16,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 01:59:51,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3958.37964 ± 2031.437
2025-09-12 01:59:51,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2048.4832, 1815.0068, 5558.526, 5962.8203, 5346.4595, 5233.9395, 738.251, 5656.202, 1443.8945, 5780.2144]
2025-09-12 01:59:51,463 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 01:59:51,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 30/100 (estimated time remaining: 19 hours, 19 minutes, 52 seconds)
2025-09-12 02:11:45,329 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:11:45,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:16:19,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4114.23584 ± 1931.003
2025-09-12 02:16:19,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4694.28, 5273.4683, 5660.256, 5587.2856, 5401.6123, 5382.3315, 5066.8516, 2991.593, 762.9472, 321.73758]
2025-09-12 02:16:19,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:16:19,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 31/100 (estimated time remaining: 19 hours, 10 minutes, 13 seconds)
2025-09-12 02:28:13,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:28:13,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:32:48,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4637.46436 ± 1049.537
2025-09-12 02:32:48,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4858.8774, 2128.5767, 5379.0186, 5704.6333, 4827.0845, 3658.97, 5223.8896, 5218.1665, 5494.9956, 3880.4282]
2025-09-12 02:32:48,341 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:32:48,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 32/100 (estimated time remaining: 18 hours, 54 minutes, 14 seconds)
2025-09-12 02:44:42,892 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 02:44:42,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 02:49:16,563 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2517.46729 ± 2190.955
2025-09-12 02:49:16,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5235.4067, 5757.3403, -61.360134, 2085.5435, 172.16328, 322.6443, 5454.7397, 561.3907, 2421.353, 3225.4487]
2025-09-12 02:49:16,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 02:49:16,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 33/100 (estimated time remaining: 18 hours, 38 minutes, 58 seconds)
2025-09-12 03:01:11,416 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:01:11,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:05:46,810 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4554.64893 ± 1408.411
2025-09-12 03:05:46,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2006.0211, 5881.1255, 5767.295, 5260.603, 5834.31, 3208.8557, 3124.736, 5294.1377, 3248.6094, 5920.794]
2025-09-12 03:05:46,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:05:46,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 34/100 (estimated time remaining: 18 hours, 23 minutes, 48 seconds)
2025-09-12 03:17:42,496 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:17:42,501 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:22:18,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3595.87231 ± 2229.637
2025-09-12 03:22:18,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5772.2407, 5810.6357, 995.75903, 5660.5444, 738.6512, 5942.8926, 1271.5408, 5774.4995, 2227.4104, 1764.5476]
2025-09-12 03:22:18,952 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:22:18,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 35/100 (estimated time remaining: 18 hours, 8 minutes, 26 seconds)
2025-09-12 03:34:16,190 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:34:16,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:38:50,619 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3728.98120 ± 1961.262
2025-09-12 03:38:50,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4681.201, 5127.715, 3373.696, 5570.766, 3916.518, 2834.0142, 5719.1704, 5534.6733, 120.561646, 411.497]
2025-09-12 03:38:50,627 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:38:50,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 36/100 (estimated time remaining: 17 hours, 52 minutes, 46 seconds)
2025-09-12 03:50:48,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 03:50:48,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 03:55:21,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 1842.04102 ± 1746.779
2025-09-12 03:55:21,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [360.61902, 1410.7482, 4821.993, 1510.723, 468.92874, 859.7984, 1106.2015, 654.375, 1605.6012, 5621.4214]
2025-09-12 03:55:21,766 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 03:55:21,816 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 37/100 (estimated time remaining: 17 hours, 36 minutes, 44 seconds)
2025-09-12 04:07:18,640 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:07:18,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:11:53,375 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2020.60425 ± 2002.883
2025-09-12 04:11:53,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2058.4202, 5910.4233, 253.94643, 1749.0311, 1160.7373, -42.06758, 3276.4822, 5112.743, 647.49945, 78.82524]
2025-09-12 04:11:53,382 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:11:53,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 38/100 (estimated time remaining: 17 hours, 20 minutes, 55 seconds)
2025-09-12 04:23:50,614 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:23:50,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:28:25,486 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3579.43945 ± 2250.229
2025-09-12 04:28:25,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [390.93533, 5393.896, 5751.386, 5374.12, 399.45242, 5868.5654, 300.04245, 4694.9404, 2867.059, 4753.9946]
2025-09-12 04:28:25,487 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:28:25,533 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 39/100 (estimated time remaining: 17 hours, 4 minutes, 47 seconds)
2025-09-12 04:40:21,972 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:40:21,977 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 04:44:59,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3636.75049 ± 1920.455
2025-09-12 04:44:59,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2400.3472, 5455.928, 5327.127, 5824.776, 5136.516, 2908.399, 1029.6831, 1475.5533, 1163.6757, 5645.5005]
2025-09-12 04:44:59,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 04:44:59,153 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 40/100 (estimated time remaining: 16 hours, 48 minutes, 34 seconds)
2025-09-12 04:56:55,133 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 04:56:55,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:01:32,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3387.67969 ± 2144.582
2025-09-12 05:01:32,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5140.4346, 5869.921, 1314.54, 23.585049, 1311.3303, 5481.932, 1072.3495, 5833.995, 4437.0767, 3391.6345]
2025-09-12 05:01:32,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:01:32,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 41/100 (estimated time remaining: 16 hours, 32 minutes, 24 seconds)
2025-09-12 05:13:29,000 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:13:29,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:18:04,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5321.07129 ± 851.235
2025-09-12 05:18:04,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3035.3564, 5935.109, 5751.0894, 5415.254, 5752.8916, 5852.459, 5717.0513, 4563.0977, 5819.2764, 5369.128]
2025-09-12 05:18:04,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:18:04,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5321.07) for latency ExtremeClogL1U23
2025-09-12 05:18:04,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 42/100 (estimated time remaining: 16 hours, 15 minutes, 57 seconds)
2025-09-12 05:30:00,990 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:30:00,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:34:37,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2609.75342 ± 1950.586
2025-09-12 05:34:37,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1060.1669, 1323.8661, 2282.5876, 588.8924, 5682.078, -3.9386108, 5959.2944, 2768.1868, 2338.7925, 4097.6094]
2025-09-12 05:34:37,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:34:37,299 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 43/100 (estimated time remaining: 15 hours, 59 minutes, 41 seconds)
2025-09-12 05:46:34,418 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 05:46:34,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 05:51:11,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4231.46143 ± 1776.405
2025-09-12 05:51:11,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1508.9893, 2518.162, 4487.082, 5501.7812, 4037.3586, 1106.1305, 5860.551, 5706.1333, 5718.381, 5870.0425]
2025-09-12 05:51:11,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 05:51:11,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 44/100 (estimated time remaining: 15 hours, 43 minutes, 35 seconds)
2025-09-12 06:03:10,308 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:03:10,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:07:45,329 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5054.35010 ± 1383.363
2025-09-12 06:07:45,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5985.205, 5961.6562, 2403.0745, 5123.2905, 4939.861, 5701.6255, 2395.6296, 6244.178, 5688.351, 6100.6274]
2025-09-12 06:07:45,338 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:07:45,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 45/100 (estimated time remaining: 15 hours, 27 minutes, 1 second)
2025-09-12 06:19:43,317 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:19:43,323 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:24:17,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3847.17847 ± 2178.262
2025-09-12 06:24:17,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1813.9795, 5806.988, 6238.7437, 1188.0835, 5703.2944, 5763.931, 1175.2783, 1214.0433, 3416.7297, 6150.7144]
2025-09-12 06:24:17,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:24:17,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 46/100 (estimated time remaining: 15 hours, 10 minutes, 17 seconds)
2025-09-12 06:36:15,829 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:36:15,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:40:49,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5338.93311 ± 869.059
2025-09-12 06:40:49,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5947.2324, 4623.432, 6075.154, 6086.702, 4351.9272, 5999.0234, 4599.868, 3690.9993, 5964.3154, 6050.68]
2025-09-12 06:40:49,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:40:49,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5338.93) for latency ExtremeClogL1U23
2025-09-12 06:40:49,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 47/100 (estimated time remaining: 14 hours, 53 minutes, 47 seconds)
2025-09-12 06:52:46,411 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 06:52:46,438 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 06:57:23,259 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5379.01514 ± 1326.925
2025-09-12 06:57:23,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5936.299, 2888.241, 5415.541, 5957.002, 6080.473, 6246.055, 6362.113, 6444.7554, 2685.4814, 5774.1943]
2025-09-12 06:57:23,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 06:57:23,271 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5379.02) for latency ExtremeClogL1U23
2025-09-12 06:57:23,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 48/100 (estimated time remaining: 14 hours, 37 minutes, 19 seconds)
2025-09-12 07:09:19,703 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:09:19,711 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:13:55,234 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4925.33691 ± 1788.595
2025-09-12 07:13:55,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6242.266, 3205.2783, 5128.7974, 5930.2485, 5714.9043, 6125.9375, 5114.968, 177.1776, 5959.3276, 5654.459]
2025-09-12 07:13:55,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:13:55,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 49/100 (estimated time remaining: 14 hours, 20 minutes, 20 seconds)
2025-09-12 07:25:52,435 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:25:52,440 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:30:29,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4418.13672 ± 1723.499
2025-09-12 07:30:29,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5450.062, 2202.8962, 6256.091, 5949.2754, 1976.1462, 5852.708, 2041.4207, 4466.9336, 3637.07, 6348.7676]
2025-09-12 07:30:29,649 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:30:29,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 50/100 (estimated time remaining: 14 hours, 3 minutes, 55 seconds)
2025-09-12 07:42:26,990 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:42:26,996 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 07:47:03,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4238.93652 ± 1875.528
2025-09-12 07:47:03,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6357.7886, 658.9778, 5885.778, 1552.7194, 4310.0156, 5988.237, 4535.118, 4319.7905, 5929.509, 2851.4365]
2025-09-12 07:47:03,185 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 07:47:03,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 51/100 (estimated time remaining: 13 hours, 47 minutes, 32 seconds)
2025-09-12 07:59:01,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 07:59:01,652 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:03:38,389 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4341.04932 ± 1818.564
2025-09-12 08:03:38,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4671.7354, 478.67035, 6230.023, 5908.549, 6086.325, 4508.119, 5966.533, 3166.7776, 4270.636, 2123.1226]
2025-09-12 08:03:38,391 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:03:38,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 52/100 (estimated time remaining: 13 hours, 31 minutes, 32 seconds)
2025-09-12 08:15:36,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:15:36,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:20:10,862 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4466.84473 ± 1947.785
2025-09-12 08:20:10,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6052.7397, 4107.422, 6088.472, 611.83875, 6211.119, 1680.694, 6342.4194, 3415.569, 5980.726, 4177.4473]
2025-09-12 08:20:10,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:20:10,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 53/100 (estimated time remaining: 13 hours, 14 minutes, 49 seconds)
2025-09-12 08:32:08,502 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:32:08,508 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:36:45,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4826.77637 ± 1789.948
2025-09-12 08:36:45,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4180.9243, 4641.3047, 4929.1045, 6520.8486, 2682.2383, 5767.0654, 717.6487, 6236.582, 6165.2544, 6426.7925]
2025-09-12 08:36:45,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:36:45,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 54/100 (estimated time remaining: 12 hours, 58 minutes, 43 seconds)
2025-09-12 08:48:44,771 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 08:48:44,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 08:53:20,496 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4639.93164 ± 1726.441
2025-09-12 08:53:20,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2446.5222, 6116.0884, 2374.5085, 6455.475, 2950.9382, 6346.926, 6507.6294, 5474.5913, 2592.5266, 5134.1094]
2025-09-12 08:53:20,499 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 08:53:20,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 55/100 (estimated time remaining: 12 hours, 42 minutes, 11 seconds)
2025-09-12 09:05:18,876 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:05:18,882 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:09:54,464 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4262.25977 ± 2030.651
2025-09-12 09:09:54,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6322.9697, 3013.3032, 5998.659, 5096.79, 5864.4272, 5907.303, 1093.5554, 2088.1006, 5947.747, 1289.7405]
2025-09-12 09:09:54,466 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:09:54,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 56/100 (estimated time remaining: 12 hours, 25 minutes, 41 seconds)
2025-09-12 09:21:54,117 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:21:54,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:26:29,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4642.14746 ± 1792.970
2025-09-12 09:26:29,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5398.1006, 4750.241, 6218.0547, 1270.4404, 6379.052, 2267.935, 6398.848, 2631.4917, 5187.445, 5919.8647]
2025-09-12 09:26:29,637 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:26:29,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 57/100 (estimated time remaining: 12 hours, 9 minutes, 6 seconds)
2025-09-12 09:38:29,892 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:38:29,898 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:43:04,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4364.27588 ± 2364.932
2025-09-12 09:43:04,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [490.4943, 6112.255, 2955.9697, 6486.5757, 3332.8223, 5812.6826, 6016.49, 58.276596, 6569.671, 5807.5195]
2025-09-12 09:43:04,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:43:04,789 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 58/100 (estimated time remaining: 11 hours, 52 minutes, 55 seconds)
2025-09-12 09:55:04,340 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 09:55:04,354 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 09:59:41,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3695.29102 ± 2596.708
2025-09-12 09:59:41,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [197.81065, 6011.9204, 2399.898, 6190.1616, 641.6543, 3180.1345, 6320.1274, -74.11373, 6373.4556, 5711.8613]
2025-09-12 09:59:41,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 09:59:41,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 59/100 (estimated time remaining: 11 hours, 36 minutes, 34 seconds)
2025-09-12 10:11:40,166 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:11:40,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:16:14,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4490.12744 ± 1719.378
2025-09-12 10:16:14,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5586.816, 5246.1987, 6451.4297, 5344.3296, 1905.0823, 6192.9707, 5823.35, 2555.3877, 1665.3009, 4130.4087]
2025-09-12 10:16:14,725 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:16:14,738 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 60/100 (estimated time remaining: 11 hours, 19 minutes, 48 seconds)
2025-09-12 10:28:14,813 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:28:14,820 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:32:50,829 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4503.17090 ± 2442.509
2025-09-12 10:32:50,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [400.53934, 223.50337, 6903.015, 5747.4263, 4336.3433, 6180.6836, 6468.222, 5877.6953, 6521.3896, 2372.8867]
2025-09-12 10:32:50,838 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:32:50,880 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 3 minutes, 31 seconds)
2025-09-12 10:44:50,530 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 10:44:50,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 10:49:26,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3752.67310 ± 2271.612
2025-09-12 10:49:26,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6031.1934, -80.692245, 1581.5376, 3497.6602, 1469.8417, 2815.0679, 6326.4214, 6228.159, 3129.008, 6528.533]
2025-09-12 10:49:26,933 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 10:49:27,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 62/100 (estimated time remaining: 10 hours, 47 minutes, 3 seconds)
2025-09-12 11:01:26,298 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:01:26,312 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:06:05,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4155.32520 ± 1853.758
2025-09-12 11:06:05,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6357.025, 3518.8992, 4324.8965, 6702.4624, 3877.0505, 67.03982, 3043.6658, 3123.9036, 5984.218, 4554.0933]
2025-09-12 11:06:05,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:06:05,454 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 63/100 (estimated time remaining: 10 hours, 30 minutes, 53 seconds)
2025-09-12 11:18:06,224 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:18:06,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:22:42,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4232.61719 ± 1998.443
2025-09-12 11:22:42,835 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [675.037, 5503.0166, 6158.2056, 2631.8687, 3753.3608, 3655.2878, 6107.578, 1513.9973, 5680.1753, 6647.644]
2025-09-12 11:22:42,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:22:42,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 14 minutes, 23 seconds)
2025-09-12 11:34:43,297 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:34:43,311 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:39:20,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4000.28125 ± 2445.172
2025-09-12 11:39:20,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6227.589, 2278.859, 5750.181, 6785.8994, 72.94243, 5396.4736, 1027.061, 4895.0645, 6424.0117, 1144.7343]
2025-09-12 11:39:20,767 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:39:20,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 65/100 (estimated time remaining: 9 hours, 58 minutes, 19 seconds)
2025-09-12 11:51:21,955 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 11:51:21,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 11:55:56,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5140.78027 ± 1492.250
2025-09-12 11:55:56,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5630.1836, 1757.39, 6558.7983, 6464.94, 5083.493, 5633.792, 5830.406, 3998.6052, 6770.472, 3679.7197]
2025-09-12 11:55:56,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 11:55:56,818 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 66/100 (estimated time remaining: 9 hours, 41 minutes, 41 seconds)
2025-09-12 12:07:58,065 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:07:58,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:12:34,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2671.82568 ± 1998.639
2025-09-12 12:12:34,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5371.8516, 1536.4285, 614.404, 2137.5522, 790.19604, 697.6823, 3570.8682, 5826.268, 5124.5684, 1048.4385]
2025-09-12 12:12:34,215 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:12:34,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 25 minutes, 13 seconds)
2025-09-12 12:24:34,470 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:24:34,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:29:12,414 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4587.52295 ± 2078.678
2025-09-12 12:29:12,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5951.351, 4198.967, 6328.784, 3265.938, 6578.9097, 1739.8464, 4532.2935, 437.78207, 5973.327, 6868.028]
2025-09-12 12:29:12,426 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:29:12,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 8 minutes, 34 seconds)
2025-09-12 12:41:12,322 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:41:12,342 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 12:45:48,651 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3922.26245 ± 1994.039
2025-09-12 12:45:48,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [537.2386, 1663.3188, 6728.4316, 4963.3423, 2495.6638, 4394.466, 6690.7935, 2231.6084, 4583.967, 4933.796]
2025-09-12 12:45:48,659 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 12:45:48,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 69/100 (estimated time remaining: 8 hours, 51 minutes, 49 seconds)
2025-09-12 12:57:50,653 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 12:57:50,661 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:02:26,764 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4930.03369 ± 2024.854
2025-09-12 13:02:26,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6517.894, 6868.942, 6178.3564, 6243.225, 1874.6832, 1669.0764, 6751.095, 4595.5317, 2397.1135, 6204.4165]
2025-09-12 13:02:26,780 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:02:26,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 70/100 (estimated time remaining: 8 hours, 35 minutes, 13 seconds)
2025-09-12 13:14:27,874 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:14:27,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:19:03,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4010.42725 ± 2137.937
2025-09-12 13:19:03,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6242.9683, 3775.7092, 5623.94, 5892.79, -42.30176, 6233.6914, 718.925, 3476.3757, 5096.5645, 3085.6077]
2025-09-12 13:19:03,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:19:03,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 18 minutes, 43 seconds)
2025-09-12 13:31:03,892 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:31:03,902 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:35:42,348 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4347.45605 ± 1828.203
2025-09-12 13:35:42,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6649.74, 6612.611, 2344.2898, 1881.038, 3914.278, 2530.0754, 3095.8127, 6777.124, 5689.353, 3980.2393]
2025-09-12 13:35:42,372 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:35:42,413 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 2 minutes, 11 seconds)
2025-09-12 13:47:43,714 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 13:47:43,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 13:52:18,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2924.67456 ± 2029.581
2025-09-12 13:52:18,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1055.4435, 3889.659, 196.59471, 2836.0356, 931.3869, 5442.696, 6746.181, 4301.2515, 1423.6025, 2423.894]
2025-09-12 13:52:18,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 13:52:18,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 73/100 (estimated time remaining: 7 hours, 45 minutes, 24 seconds)
2025-09-12 14:04:18,589 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:04:18,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:08:56,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4829.45605 ± 2231.493
2025-09-12 14:08:56,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2671.7253, 1061.6366, 6460.0854, 4590.6777, 6965.7407, 1074.6022, 6271.5664, 6377.6484, 6621.552, 6199.3228]
2025-09-12 14:08:56,873 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:08:56,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 28 minutes, 56 seconds)
2025-09-12 14:20:57,913 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:20:57,926 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:25:33,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3658.02979 ± 2743.820
2025-09-12 14:25:33,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [452.6285, 165.87051, 176.7789, 5661.567, 4925.25, 645.2012, 5150.059, 6243.717, 6489.4565, 6669.7686]
2025-09-12 14:25:33,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:25:33,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 12 minutes, 10 seconds)
2025-09-12 14:37:34,277 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:37:34,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:42:09,143 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5347.13574 ± 2023.922
2025-09-12 14:42:09,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [645.1992, 6719.238, 6516.9756, 4306.4775, 6917.559, 6218.787, 6540.2583, 6442.095, 6519.1455, 2645.6262]
2025-09-12 14:42:09,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:42:09,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 76/100 (estimated time remaining: 6 hours, 55 minutes, 26 seconds)
2025-09-12 14:54:09,447 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 14:54:09,456 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 14:58:47,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3927.06909 ± 2092.085
2025-09-12 14:58:47,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [2554.3696, 6350.5205, 4961.231, 2087.8118, 2930.799, 1667.9027, 6536.7134, 4315.6597, 6974.2817, 891.40326]
2025-09-12 14:58:47,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 14:58:47,599 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 77/100 (estimated time remaining: 6 hours, 38 minutes, 48 seconds)
2025-09-12 15:10:47,738 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:10:47,749 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:15:23,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4831.22998 ± 2101.230
2025-09-12 15:15:23,089 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1832.7323, 6686.4097, 6291.6724, 6770.03, 5679.4062, 2545.7869, 1967.9584, 2893.468, 6794.1895, 6850.6514]
2025-09-12 15:15:23,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:15:23,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 22 minutes, 7 seconds)
2025-09-12 15:27:23,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:27:23,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:32:01,521 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3501.31104 ± 2036.506
2025-09-12 15:32:01,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3105.4102, 6402.005, 1599.7528, 5867.1895, 4740.8843, 329.69055, 362.95206, 3484.0308, 4557.054, 4564.14]
2025-09-12 15:32:01,523 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:32:01,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 5 minutes, 32 seconds)
2025-09-12 15:44:03,775 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 15:44:03,785 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 15:48:42,626 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4894.78516 ± 2288.020
2025-09-12 15:48:42,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6833.315, 373.79507, 1329.6989, 6384.1797, 4258.0903, 6614.431, 6799.8735, 3592.702, 6135.253, 6626.5117]
2025-09-12 15:48:42,632 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 15:48:42,707 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 80/100 (estimated time remaining: 5 hours, 49 minutes, 14 seconds)
2025-09-12 16:00:43,746 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:00:43,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:05:21,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4275.41699 ± 2762.847
2025-09-12 16:05:21,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6675.719, 1222.4067, 6301.9844, 161.27621, 1612.7952, 6726.121, 6829.496, 872.19495, 5229.5293, 7122.646]
2025-09-12 16:05:21,174 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:05:21,245 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 32 minutes, 48 seconds)
2025-09-12 16:17:23,293 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:17:23,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:21:59,239 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4395.56152 ± 2687.054
2025-09-12 16:21:59,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [232.70581, 6692.068, 7296.9795, 1500.0825, 6895.257, 6379.117, 5959.6445, 1683.1875, 1237.1489, 6079.421]
2025-09-12 16:21:59,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:21:59,258 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 16 minutes, 8 seconds)
2025-09-12 16:34:00,304 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:34:00,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:38:34,783 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4821.22266 ± 2406.856
2025-09-12 16:38:34,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6141.2773, 6144.121, 5762.834, 853.76044, 7029.5845, 3371.55, 5863.343, 6466.8687, -53.03384, 6631.924]
2025-09-12 16:38:34,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:38:34,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 83/100 (estimated time remaining: 4 hours, 59 minutes, 29 seconds)
2025-09-12 16:50:36,060 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 16:50:36,069 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 16:55:11,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 2150.85840 ± 2466.349
2025-09-12 16:55:11,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [-30.866688, 3334.3057, 583.13916, 130.04887, 2624.8113, 141.10526, 33.037212, 6765.0864, 6363.146, 1564.7723]
2025-09-12 16:55:11,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 16:55:11,346 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 84/100 (estimated time remaining: 4 hours, 42 minutes, 45 seconds)
2025-09-12 17:07:12,483 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:07:12,494 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:11:46,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5196.23779 ± 1695.876
2025-09-12 17:11:46,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6750.782, 5754.857, 4822.846, 6137.331, 6939.961, 6400.847, 2657.0825, 1906.3638, 6565.8037, 4026.5034]
2025-09-12 17:11:46,851 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:11:46,925 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 25 minutes, 49 seconds)
2025-09-12 17:23:49,285 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:23:49,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:28:25,961 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5412.31787 ± 2028.044
2025-09-12 17:28:25,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1973.3981, 6739.4175, 5986.183, 1229.5083, 6634.6143, 6719.702, 4515.715, 6602.231, 6642.7407, 7079.6694]
2025-09-12 17:28:25,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:28:25,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (5412.32) for latency ExtremeClogL1U23
2025-09-12 17:28:26,034 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 9 minutes, 14 seconds)
2025-09-12 17:40:28,011 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:40:28,027 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 17:45:06,029 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4050.09253 ± 2169.799
2025-09-12 17:45:06,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5752.9434, 5264.0176, 1906.6227, 2826.7383, 1318.285, 6652.2617, 6560.6465, 2418.2134, 6486.2563, 1314.9431]
2025-09-12 17:45:06,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 17:45:06,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 87/100 (estimated time remaining: 3 hours, 52 minutes, 43 seconds)
2025-09-12 17:57:08,597 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 17:57:08,610 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:01:46,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4366.56934 ± 2580.079
2025-09-12 18:01:46,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6916.3936, 2141.3538, 3597.6396, 307.73288, 4589.4937, 7092.267, 7011.3745, 5712.578, 82.82016, 6214.0347]
2025-09-12 18:01:46,103 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:01:46,176 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 36 minutes, 17 seconds)
2025-09-12 18:13:48,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:13:48,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:18:25,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4671.01855 ± 2485.964
2025-09-12 18:18:25,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [3105.249, 7299.512, 5866.3237, 1974.8182, 6549.911, 7128.3154, 6901.883, 446.29175, 5902.3794, 1535.5049]
2025-09-12 18:18:25,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:18:25,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 19 minutes, 46 seconds)
2025-09-12 18:30:27,054 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:30:27,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:35:03,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3745.26367 ± 2714.370
2025-09-12 18:35:03,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1429.5865, 6232.1006, 2373.7422, -16.968536, 2215.0508, 2409.8604, 1233.6892, 7072.9683, 7202.5806, 7300.0273]
2025-09-12 18:35:03,052 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:35:03,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 3 minutes, 11 seconds)
2025-09-12 18:47:05,531 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 18:47:05,546 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 18:51:42,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3758.97900 ± 2495.667
2025-09-12 18:51:42,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6731.584, 3584.1924, 205.23413, 6000.681, 5942.6, 1825.0255, 5343.7114, 6398.8276, 858.86224, 699.0689]
2025-09-12 18:51:42,233 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 18:51:42,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 46 minutes, 32 seconds)
2025-09-12 19:03:44,654 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:03:44,665 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:08:19,512 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4351.07910 ± 2533.196
2025-09-12 19:08:19,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5083.177, 6563.3833, 169.20346, 6430.607, 1286.1732, 6150.603, 6904.119, 6913.0044, 1682.0885, 2328.4314]
2025-09-12 19:08:19,522 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:08:19,574 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 29 minutes, 48 seconds)
2025-09-12 19:20:20,356 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:20:20,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:24:58,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 6087.66650 ± 1142.664
2025-09-12 19:24:58,762 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6693.167, 6225.1396, 6840.0293, 6333.0527, 6244.248, 7248.4136, 6812.333, 6298.585, 3059.112, 5122.581]
2025-09-12 19:24:58,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:24:58,763 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1226 [INFO]: New best (6087.67) for latency ExtremeClogL1U23
2025-09-12 19:24:58,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 13 minutes, 8 seconds)
2025-09-12 19:37:02,182 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:37:02,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:41:37,043 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5704.88232 ± 1866.874
2025-09-12 19:41:37,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6903.4756, 2100.214, 6871.802, 6811.4395, 5485.82, 7075.1807, 6515.1616, 6735.1787, 2027.5618, 6522.991]
2025-09-12 19:41:37,051 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:41:37,129 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 94/100 (estimated time remaining: 1 hour, 56 minutes, 28 seconds)
2025-09-12 19:53:39,771 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 19:53:39,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 19:58:15,217 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 3836.96484 ± 2052.659
2025-09-12 19:58:15,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [4449.175, 6160.2847, 3425.755, 4412.0005, -53.41467, 6863.7803, 3944.8906, 4614.178, 4019.9512, 533.0461]
2025-09-12 19:58:15,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 19:58:15,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 39 minutes, 50 seconds)
2025-09-12 20:10:17,822 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:10:17,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:14:56,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5738.57910 ± 1717.354
2025-09-12 20:14:56,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6140.507, 7070.7065, 5377.469, 7201.6826, 3280.4675, 6451.9087, 6950.459, 6870.6826, 1775.5226, 6266.384]
2025-09-12 20:14:56,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:14:56,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 23 minutes, 14 seconds)
2025-09-12 20:27:00,262 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:27:00,281 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:31:36,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4538.53223 ± 2464.635
2025-09-12 20:31:36,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6288.26, 52.25103, 6535.199, 4069.3235, 7051.035, 6724.6655, 3351.597, 429.32278, 6521.4185, 4362.2505]
2025-09-12 20:31:36,120 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:31:36,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 6 minutes, 37 seconds)
2025-09-12 20:43:38,894 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 20:43:38,912 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 20:48:13,248 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4966.67090 ± 2513.617
2025-09-12 20:48:13,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [7168.0894, 6966.2793, 6716.5244, 4355.0444, 6950.994, 5421.993, 4266.6655, 544.9193, 233.32256, 7042.88]
2025-09-12 20:48:13,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 20:48:13,334 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 98/100 (estimated time remaining: 49 minutes, 56 seconds)
2025-09-12 21:00:16,328 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:00:16,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:04:53,562 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4870.81543 ± 2262.837
2025-09-12 21:04:53,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [5856.511, 6989.101, 2372.2734, 7257.508, 7034.818, 1941.2861, 1240.4265, 6885.851, 5813.115, 3317.2615]
2025-09-12 21:04:53,571 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:04:53,636 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 99/100 (estimated time remaining: 33 minutes, 18 seconds)
2025-09-12 21:16:58,255 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:16:58,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:21:34,094 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 4722.04395 ± 2381.973
2025-09-12 21:21:34,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [1748.0817, 6643.2476, 7487.306, 4915.8965, 6216.883, 2839.9001, 7359.934, 190.40982, 3649.0618, 6169.7227]
2025-09-12 21:21:34,122 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:21:34,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1199 [INFO]: Iteration 100/100 (estimated time remaining: 16 minutes, 39 seconds)
2025-09-12 21:33:36,875 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-09-12 21:33:36,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1214 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-09-12 21:38:14,053 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1221 [DEBUG]: Total Reward: 5287.80762 ± 2442.567
2025-09-12 21:38:14,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1222 [DEBUG]: All rewards: [6796.758, 5155.382, 5860.528, 6835.2744, 1063.2361, 131.80753, 5740.287, 7145.8584, 7367.253, 6781.6904]
2025-09-12 21:38:14,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1223 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-09-12 21:38:14,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noiseperc0-halfcheetah):1251 [DEBUG]: Training session finished
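
Note on reading these summaries: each "Total Reward: m ± s" line is consistent with the mean and the *population* standard deviation (ddof=0) of the ten values in the following "All rewards" line — e.g. the final evaluation's rewards average to 5287.807 with pstdev 2442.567, matching the logged 5287.80762 ± 2442.567. That is an observation from the log, not from the training code. A minimal sketch of recomputing the summary from an "All rewards" record (the regex and helper names are illustrative, not part of the codebase):

```python
import re
from statistics import fmean, pstdev

def summarize_rewards(log_line: str) -> tuple[float, float]:
    """Parse an 'All rewards: [...]' log record and return (mean, population std)."""
    match = re.search(r"All rewards: \[([^\]]+)\]", log_line)
    if match is None:
        raise ValueError("not an 'All rewards' record")
    rewards = [float(tok) for tok in match.group(1).split(",")]
    # The log's '±' figure matches pstdev (ddof=0), not the sample std (ddof=1).
    return fmean(rewards), pstdev(rewards)

# The final evaluation's record from this log:
line = ("All rewards: [6796.758, 5155.382, 5860.528, 6835.2744, 1063.2361, "
        "131.80753, 5740.287, 7145.8584, 7367.253, 6781.6904]")
mean, std = summarize_rewards(line)
print(f"Total Reward: {mean:.5f} \u00b1 {std:.3f}")  # matches the logged 5287.80762 ± 2442.567
```

This is handy for post-hoc analysis: grepping the log for "All rewards" and recomputing the summaries recovers the full evaluation curve without rerunning any episodes.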
