2025-05-06 08:22:15,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-05-06 08:22:15,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay
2025-05-06 08:22:15,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeSparseL4U32': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7f7d6ece6d40>}
2025-05-06 08:22:15,257 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1009 [DEBUG]: using device: cuda
2025-05-06 08:22:15,261 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1031 [INFO]: Creating new trainer
2025-05-06 08:22:15,292 baseline-mbpac-noisy-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-05-06 08:22:15,292 baseline-mbpac-noisy-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-05-06 08:22:15,303 baseline-mbpac-noisy-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-05-06 08:22:16,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1092 [DEBUG]: Starting training session...
2025-05-06 08:22:16,116 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 1/100
2025-05-06 08:40:16,063 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 08:40:16,066 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 08:49:07,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -188.08965 ± 17.729
2025-05-06 08:49:07,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-174.54861, -189.585, -196.19298, -164.75317, -150.88152, -209.72246, -201.19362, -202.17232, -196.94234, -194.90442]
2025-05-06 08:49:07,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 08:49:07,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (-188.09) for latency ExtremeSparseL4U32
2025-05-06 08:49:07,669 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 08:49:07,678 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 08:49:07,705 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 2/100 (estimated time remaining: 44 hours, 19 minutes, 7 seconds)
2025-05-06 09:06:03,741 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 09:06:03,746 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 09:13:54,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 50.62461 ± 53.609
2025-05-06 09:13:54,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-5.2722187, 71.09881, 72.89436, 46.20045, -52.322437, 22.675892, 54.62449, 163.40802, 60.884888, 72.053894]
2025-05-06 09:13:54,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:13:54,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (50.62) for latency ExtremeSparseL4U32
2025-05-06 09:13:54,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 09:13:54,816 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 09:13:54,843 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 3/100 (estimated time remaining: 42 hours, 10 minutes, 37 seconds)
2025-05-06 09:31:37,774 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 09:31:37,781 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 09:38:28,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 393.20660 ± 186.780
2025-05-06 09:38:28,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [470.63052, 68.59811, 208.49715, 584.22375, 512.66724, 463.76004, 566.8154, 95.376434, 562.8375, 398.65973]
2025-05-06 09:38:28,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 09:38:28,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (393.21) for latency ExtremeSparseL4U32
2025-05-06 09:38:28,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 09:38:28,102 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 09:38:28,134 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 4/100 (estimated time remaining: 41 hours, 3 minutes, 48 seconds)
2025-05-06 09:55:23,786 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 09:55:23,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 10:05:04,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1307.56689 ± 345.215
2025-05-06 10:05:04,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1763.069, 1467.1282, 1375.485, 1524.835, 906.3202, 1408.3871, 690.75256, 835.27454, 1472.8698, 1631.5466]
2025-05-06 10:05:04,814 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:05:04,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (1307.57) for latency ExtremeSparseL4U32
2025-05-06 10:05:04,815 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 10:05:04,827 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 10:05:04,870 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 5/100 (estimated time remaining: 41 hours, 7 minutes, 30 seconds)
2025-05-06 10:23:39,541 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 10:23:39,553 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 10:31:23,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1669.94751 ± 462.505
2025-05-06 10:31:23,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2258.3315, 1135.8955, 2105.2485, 1345.1085, 1937.2152, 654.4449, 1885.5092, 1888.7642, 1696.735, 1792.2212]
2025-05-06 10:31:23,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:31:23,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (1669.95) for latency ExtremeSparseL4U32
2025-05-06 10:31:23,776 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 10:31:23,788 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 10:31:23,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 6/100 (estimated time remaining: 40 hours, 53 minutes, 26 seconds)
2025-05-06 10:49:08,122 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 10:49:08,127 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 10:57:19,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1925.94861 ± 739.393
2025-05-06 10:57:19,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2506.0493, 2280.3516, 616.3624, 2728.1985, 2137.5044, 2264.7432, 498.72305, 1571.9277, 2300.9343, 2354.6912]
2025-05-06 10:57:19,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 10:57:19,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (1925.95) for latency ExtremeSparseL4U32
2025-05-06 10:57:19,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 10:57:19,743 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 10:57:19,771 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 7/100 (estimated time remaining: 40 hours, 10 minutes, 10 seconds)
2025-05-06 11:15:09,087 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 11:15:09,092 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 11:24:17,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2465.14453 ± 486.343
2025-05-06 11:24:17,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2547.4102, 2792.9993, 2927.128, 2750.4453, 1153.5863, 2310.0757, 2414.6206, 2268.2026, 2832.055, 2654.9224]
2025-05-06 11:24:17,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:24:17,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (2465.14) for latency ExtremeSparseL4U32
2025-05-06 11:24:17,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 11:24:17,574 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 11:24:17,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 8/100 (estimated time remaining: 40 hours, 25 minutes, 3 seconds)
2025-05-06 11:39:52,405 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 11:39:52,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 11:47:38,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2376.45581 ± 611.229
2025-05-06 11:47:38,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2076.279, 3008.534, 1776.1643, 2980.5583, 2682.233, 2792.9846, 1377.2983, 1467.334, 2985.7168, 2617.456]
2025-05-06 11:47:38,156 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 11:47:38,159 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 9/100 (estimated time remaining: 39 hours, 36 minutes, 40 seconds)
2025-05-06 12:04:20,141 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 12:04:20,147 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 12:11:37,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2381.69531 ± 785.512
2025-05-06 12:11:37,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [813.54346, 2012.3684, 2475.2598, 3212.2126, 3002.984, 2829.9841, 1060.7466, 2780.1047, 2886.9688, 2742.7812]
2025-05-06 12:11:37,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:11:37,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 10/100 (estimated time remaining: 38 hours, 23 minutes, 14 seconds)
2025-05-06 12:25:46,595 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 12:25:46,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 12:32:24,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2489.32861 ± 1095.299
2025-05-06 12:32:24,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2425.9229, 2977.773, 3034.2722, 3168.6501, 2811.155, 3307.3884, 3007.527, 230.158, 498.72794, 3431.7117]
2025-05-06 12:32:24,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:32:24,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (2489.33) for latency ExtremeSparseL4U32
2025-05-06 12:32:24,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:32:24,174 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 12:32:24,208 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 11/100 (estimated time remaining: 36 hours, 18 minutes, 6 seconds)
2025-05-06 12:46:07,969 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 12:46:07,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 12:52:37,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2801.24219 ± 932.526
2025-05-06 12:52:37,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2126.7021, 2549.3162, 3222.6765, 2979.8403, 3556.3184, 2906.2747, 346.14346, 3545.4019, 3627.1895, 3152.5593]
2025-05-06 12:52:37,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 12:52:37,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (2801.24) for latency ExtremeSparseL4U32
2025-05-06 12:52:37,845 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 12:52:37,890 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 12:52:37,919 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 12/100 (estimated time remaining: 34 hours, 12 minutes, 23 seconds)
2025-05-06 13:06:17,365 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 13:06:17,368 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 13:13:00,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2612.37524 ± 1228.920
2025-05-06 13:13:00,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3563.9712, 1747.8419, 3738.362, 3589.7712, 844.6902, 1218.6737, 3463.835, 3674.1538, 3530.9956, 751.45715]
2025-05-06 13:13:00,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:13:00,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 13/100 (estimated time remaining: 31 hours, 53 minutes, 14 seconds)
2025-05-06 13:25:48,912 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 13:25:48,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 13:31:57,056 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2580.35400 ± 995.148
2025-05-06 13:31:57,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1057.5703, 3475.5413, 3774.1223, 1514.4974, 3076.6099, 1257.368, 3569.9644, 1898.5631, 3451.8274, 2727.4766]
2025-05-06 13:31:57,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:31:57,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 14/100 (estimated time remaining: 30 hours, 15 minutes, 4 seconds)
2025-05-06 13:44:34,531 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 13:44:34,536 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 13:50:35,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3204.35425 ± 931.752
2025-05-06 13:50:35,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3508.0642, 2539.2258, 2892.3816, 758.8718, 3169.247, 3693.567, 4030.2542, 3914.9524, 3819.2505, 3717.7295]
2025-05-06 13:50:35,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 13:50:35,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (3204.35) for latency ExtremeSparseL4U32
2025-05-06 13:50:35,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 13:50:35,676 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 13:50:35,701 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 15/100 (estimated time remaining: 28 hours, 22 minutes, 8 seconds)
2025-05-06 14:02:46,727 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 14:02:46,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 14:08:39,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3630.57349 ± 226.880
2025-05-06 14:08:39,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3402.4692, 3505.8762, 3886.669, 3841.0305, 3802.49, 3648.757, 3097.223, 3761.8193, 3658.6697, 3700.7297]
2025-05-06 14:08:39,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 14:08:39,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (3630.57) for latency ExtremeSparseL4U32
2025-05-06 14:08:39,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 14:08:39,126 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 14:08:39,151 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 16/100 (estimated time remaining: 27 hours, 16 minutes, 14 seconds)
2025-05-06 14:20:46,984 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 14:20:46,988 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 14:26:43,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2754.32690 ± 1382.997
2025-05-06 14:26:43,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4094.2869, 3170.8684, 101.11812, 3336.2898, 3988.4114, 3778.3762, 2409.0217, 3392.1414, 190.29755, 3082.458]
2025-05-06 14:26:43,957 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 14:26:43,959 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 17/100 (estimated time remaining: 26 hours, 20 minutes, 53 seconds)
2025-05-06 14:38:58,745 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 14:38:58,750 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 14:44:57,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3164.24927 ± 734.188
2025-05-06 14:44:57,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3279.1619, 1784.5417, 3308.9148, 3789.3809, 3499.9915, 1680.4354, 3720.0312, 3512.1726, 3385.4272, 3682.434]
2025-05-06 14:44:57,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 14:44:57,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 18/100 (estimated time remaining: 25 hours, 26 minutes, 35 seconds)
2025-05-06 14:58:05,140 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 14:58:05,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:06:16,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3174.02148 ± 1075.955
2025-05-06 15:06:16,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1447.5712, 3879.2344, 3510.787, 3692.2017, 3988.5005, 4053.5156, 3645.7039, 2657.3381, 901.1101, 3964.252]
2025-05-06 15:06:16,037 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 15:06:16,039 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 19/100 (estimated time remaining: 25 hours, 46 minutes, 47 seconds)
2025-05-06 15:24:03,683 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:24:03,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:32:22,572 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2640.99365 ± 1168.147
2025-05-06 15:32:22,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2141.961, 3747.0356, 2707.736, 1251.7963, 3236.1096, 3622.051, 286.0337, 3780.0989, 3822.525, 1814.5889]
2025-05-06 15:32:22,573 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 15:32:22,575 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 20/100 (estimated time remaining: 27 hours, 28 minutes, 51 seconds)
2025-05-06 15:48:38,190 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 15:48:38,199 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 15:55:59,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3242.43677 ± 1318.170
2025-05-06 15:55:59,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3673.1506, 3668.5195, 1202.212, 156.23767, 4041.746, 3574.6636, 3903.909, 3928.7292, 4312.3296, 3962.8699]
2025-05-06 15:55:59,697 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 15:55:59,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 21/100 (estimated time remaining: 28 hours, 37 minutes, 28 seconds)
2025-05-06 16:12:56,185 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:12:56,190 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:20:26,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3434.52661 ± 1085.495
2025-05-06 16:20:26,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4258.0557, 3876.0793, 804.1275, 1898.972, 3719.6033, 4129.108, 4215.8896, 3941.6257, 3776.541, 3725.2622]
2025-05-06 16:20:26,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 16:20:26,436 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 22/100 (estimated time remaining: 29 hours, 56 minutes, 35 seconds)
2025-05-06 16:37:06,760 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 16:37:06,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 16:44:49,715 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3269.77100 ± 988.535
2025-05-06 16:44:49,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1682.0936, 3984.8608, 3946.2224, 3996.4097, 2532.555, 4003.3853, 3582.1611, 3877.2598, 3803.2537, 1289.5089]
2025-05-06 16:44:49,716 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 16:44:49,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 23/100 (estimated time remaining: 31 hours, 9 minutes, 53 seconds)
2025-05-06 17:01:09,116 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:01:09,121 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:09:52,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2800.97729 ± 1458.968
2025-05-06 17:09:52,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3446.3162, 4175.499, 2360.3457, 4319.65, 1772.8185, 3649.0012, 4215.628, 3416.9722, 525.8156, 127.72693]
2025-05-06 17:09:52,421 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 17:09:52,424 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 24/100 (estimated time remaining: 31 hours, 43 minutes, 32 seconds)
2025-05-06 17:27:11,796 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:27:11,801 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:34:55,276 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2406.82031 ± 1323.384
2025-05-06 17:34:55,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4095.1042, 2135.6077, 3995.9473, 3865.1763, 3820.187, 1859.8378, 825.418, 1483.902, 1376.3867, 610.6377]
2025-05-06 17:34:55,277 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 17:34:55,279 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 25/100 (estimated time remaining: 31 hours, 2 minutes, 41 seconds)
2025-05-06 17:50:58,664 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 17:50:58,676 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 17:58:10,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2417.27100 ± 1144.485
2025-05-06 17:58:10,509 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3730.6562, 1970.376, 4226.8926, 453.90604, 1933.9956, 3661.1953, 1035.5295, 2049.0933, 2855.8152, 2255.253]
2025-05-06 17:58:10,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 17:58:10,515 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 26/100 (estimated time remaining: 30 hours, 32 minutes, 42 seconds)
2025-05-06 18:14:11,245 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:14:11,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:21:38,406 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2820.20264 ± 1247.328
2025-05-06 18:21:38,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1668.2338, 2579.502, 3125.0327, 4330.5864, 3532.7517, 3637.981, 4144.8027, 724.37177, 3603.6875, 855.07666]
2025-05-06 18:21:38,407 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 18:21:38,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 27/100 (estimated time remaining: 29 hours, 53 minutes, 45 seconds)
2025-05-06 18:37:34,211 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 18:37:34,218 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 18:45:09,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2928.15234 ± 1434.338
2025-05-06 18:45:09,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3939.2546, 1407.1302, 196.44298, 2849.7947, 4054.6848, 3728.8877, 4154.4624, 4131.7964, 931.6514, 3887.419]
2025-05-06 18:45:09,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 18:45:09,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 28/100 (estimated time remaining: 29 hours, 16 minutes, 42 seconds)
2025-05-06 19:01:36,534 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:01:36,539 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:10:06,409 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2943.19995 ± 1421.100
2025-05-06 19:10:06,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4084.5767, 4347.7334, 3775.3477, 765.1097, 3610.2998, 1059.4749, 4012.7065, 3811.9714, 614.0187, 3350.7625]
2025-05-06 19:10:06,410 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 19:10:06,412 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 29/100 (estimated time remaining: 28 hours, 51 minutes, 21 seconds)
2025-05-06 19:25:00,954 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:25:00,958 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:32:19,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3711.94263 ± 612.631
2025-05-06 19:32:19,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4107.4424, 4067.3655, 3847.654, 1931.0455, 3908.9521, 3728.788, 4110.083, 3641.505, 3799.8848, 3976.7065]
2025-05-06 19:32:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 19:32:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (3711.94) for latency ExtremeSparseL4U32
2025-05-06 19:32:19,125 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 19:32:19,136 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 19:32:19,166 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 30/100 (estimated time remaining: 27 hours, 47 minutes, 3 seconds)
2025-05-06 19:47:34,725 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 19:47:34,733 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 19:55:03,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2706.75439 ± 1304.453
2025-05-06 19:55:03,537 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3590.672, 3944.0928, 4167.108, 2042.2775, 3991.848, 1624.4814, 2286.8652, 435.1596, 1099.9548, 3885.086]
2025-05-06 19:55:03,538 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 19:55:03,540 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 31/100 (estimated time remaining: 27 hours, 16 minutes, 22 seconds)
2025-05-06 20:10:52,727 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:10:52,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:18:21,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3564.60034 ± 1105.911
2025-05-06 20:18:21,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [390.68323, 3860.8948, 4090.6248, 3814.5894, 3964.8115, 3099.238, 4021.7698, 3970.481, 3962.1375, 4470.772]
2025-05-06 20:18:21,321 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 20:18:21,324 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 32/100 (estimated time remaining: 26 hours, 50 minutes, 40 seconds)
2025-05-06 20:34:18,864 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:34:18,869 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 20:41:41,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3026.14795 ± 1442.568
2025-05-06 20:41:41,954 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [26.550674, 4039.9192, 3968.0464, 4518.0396, 4006.3813, 971.857, 2377.2524, 3522.0452, 2543.8145, 4287.5723]
2025-05-06 20:41:41,955 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 20:41:41,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 33/100 (estimated time remaining: 26 hours, 25 minutes, 3 seconds)
2025-05-06 20:56:16,389 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 20:56:16,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:03:20,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3793.43750 ± 377.974
2025-05-06 21:03:20,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4165.785, 3386.53, 3944.3774, 2937.296, 3886.6694, 4006.6382, 3591.627, 4156.6953, 4158.3745, 3700.3845]
2025-05-06 21:03:20,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 21:03:20,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (3793.44) for latency ExtremeSparseL4U32
2025-05-06 21:03:20,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 21:03:20,196 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 21:03:20,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 34/100 (estimated time remaining: 25 hours, 17 minutes, 17 seconds)
2025-05-06 21:18:17,752 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:18:17,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:25:30,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3247.91162 ± 1408.298
2025-05-06 21:25:30,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4389.484, 4265.0083, 3907.4956, 246.27803, 3929.7407, 2026.2634, 4037.0571, 4104.382, 4254.9443, 1318.4624]
2025-05-06 21:25:30,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 21:25:30,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 35/100 (estimated time remaining: 24 hours, 54 minutes, 9 seconds)
2025-05-06 21:41:56,792 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 21:41:56,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 21:49:38,148 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3079.84180 ± 1347.205
2025-05-06 21:49:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4045.794, 755.1219, 4257.8486, 787.7059, 3766.2332, 3011.3389, 1909.2482, 4276.9517, 4325.5684, 3662.607]
2025-05-06 21:49:38,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 21:49:38,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 36/100 (estimated time remaining: 24 hours, 49 minutes, 30 seconds)
2025-05-06 22:04:45,281 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:04:45,290 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:11:23,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2846.87158 ± 1488.863
2025-05-06 22:11:23,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2908.6665, 3917.084, 936.314, 193.96956, 3364.9922, 3947.2307, 859.7016, 3893.7244, 4311.6846, 4135.3467]
2025-05-06 22:11:23,061 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 22:11:23,064 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 37/100 (estimated time remaining: 24 hours, 6 minutes, 46 seconds)
2025-05-06 22:25:47,820 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:25:47,824 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:32:44,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3355.11475 ± 1230.585
2025-05-06 22:32:44,854 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4236.375, 4062.686, 3688.9421, 2987.336, 4137.317, 501.68076, 4094.6724, 4131.637, 4123.2876, 1587.2156]
2025-05-06 22:32:44,855 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 22:32:44,860 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 38/100 (estimated time remaining: 23 hours, 19 minutes, 12 seconds)
2025-05-06 22:50:18,475 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 22:50:18,481 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 22:58:24,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3345.87354 ± 1091.343
2025-05-06 22:58:24,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1369.5023, 3995.5054, 3768.7336, 4299.194, 4082.8193, 2460.198, 1411.2932, 3849.7695, 4120.073, 4101.646]
2025-05-06 22:58:24,350 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 22:58:24,356 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 39/100 (estimated time remaining: 23 hours, 46 minutes, 51 seconds)
2025-05-06 23:15:30,379 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:15:30,385 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:24:18,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3078.61841 ± 1331.611
2025-05-06 23:24:18,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1329.3044, 2604.3682, 3888.1643, 869.5179, 1341.6416, 4226.937, 4363.538, 3839.7188, 4168.6846, 4154.308]
2025-05-06 23:24:18,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 23:24:18,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 40/100 (estimated time remaining: 24 hours, 9 minutes, 13 seconds)
2025-05-06 23:41:04,536 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-06 23:41:04,541 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-06 23:48:20,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4183.27100 ± 238.031
2025-05-06 23:48:20,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4297.6606, 3949.1272, 3835.54, 4238.4067, 4124.6836, 3815.368, 4198.3594, 4537.4805, 4419.9946, 4416.0913]
2025-05-06 23:48:20,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-06 23:48:20,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (4183.27) for latency ExtremeSparseL4U32
2025-05-06 23:48:20,590 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-06 23:48:20,600 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-06 23:48:20,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 41/100 (estimated time remaining: 23 hours, 44 minutes, 29 seconds)
2025-05-07 00:05:18,257 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:05:18,262 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:13:42,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3216.78687 ± 1467.351
2025-05-07 00:13:42,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3832.292, 2851.0483, 284.4419, 4238.2188, 4382.625, 584.9364, 4317.8047, 3350.4397, 3976.0364, 4350.025]
2025-05-07 00:13:42,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 00:13:42,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 42/100 (estimated time remaining: 24 hours, 3 minutes, 20 seconds)
2025-05-07 00:30:10,751 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:30:10,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 00:37:33,717 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3145.09229 ± 1093.638
2025-05-07 00:37:33,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [731.21094, 3163.3787, 4035.394, 4061.0542, 2145.9578, 4143.6777, 3855.4653, 2179.6506, 4164.4478, 2970.6846]
2025-05-07 00:37:33,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 00:37:33,723 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 43/100 (estimated time remaining: 24 hours, 7 minutes, 50 seconds)
2025-05-07 00:53:10,798 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 00:53:10,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:00:35,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3159.52710 ± 1758.391
2025-05-07 01:00:35,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4255.8345, 4421.818, 265.32062, 457.43622, 4190.6216, 725.2327, 4541.666, 4197.516, 4226.3228, 4313.504]
2025-05-07 01:00:35,220 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 01:00:35,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 44/100 (estimated time remaining: 23 hours, 12 minutes, 51 seconds)
2025-05-07 01:15:43,327 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:15:43,339 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:24:02,754 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3307.15820 ± 1268.692
2025-05-07 01:24:02,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4038.2144, 4062.523, 3997.6619, 3802.8174, 4126.1733, 3536.329, 4125.794, 3775.6777, 1043.6875, 562.7048]
2025-05-07 01:24:02,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 01:24:02,760 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 45/100 (estimated time remaining: 22 hours, 21 minutes, 7 seconds)
2025-05-07 01:41:12,048 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 01:41:12,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 01:49:42,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4083.93896 ± 284.404
2025-05-07 01:49:42,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3529.1672, 4448.0127, 4482.409, 4397.9893, 3845.87, 4090.0752, 3998.2937, 3945.7212, 3937.36, 4164.492]
2025-05-07 01:49:42,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 01:49:42,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 46/100 (estimated time remaining: 22 hours, 14 minutes, 55 seconds)
2025-05-07 02:06:57,557 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:06:57,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:14:53,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2525.97217 ± 1367.167
2025-05-07 02:14:53,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-1.9762807, 1411.9617, 2316.9763, 3260.5977, 3706.4565, 4076.6763, 747.3339, 4283.708, 3142.937, 2315.0483]
2025-05-07 02:14:53,008 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 02:14:53,012 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 47/100 (estimated time remaining: 21 hours, 48 minutes, 45 seconds)
2025-05-07 02:32:06,403 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:32:06,408 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 02:39:11,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1985.75745 ± 1680.702
2025-05-07 02:39:11,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3765.5046, 245.10988, 4099.08, 4209.7803, 955.88983, 859.3296, 90.12452, 585.9698, 1078.1576, 3968.628]
2025-05-07 02:39:11,471 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 02:39:11,475 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 48/100 (estimated time remaining: 21 hours, 29 minutes, 16 seconds)
2025-05-07 02:55:30,516 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 02:55:30,548 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:02:45,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3963.89722 ± 799.973
2025-05-07 03:02:45,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4194.35, 4182.407, 4254.38, 3812.9197, 4374.227, 4135.78, 4191.006, 1622.3279, 4352.917, 4518.6577]
2025-05-07 03:02:45,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 03:02:45,888 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 49/100 (estimated time remaining: 21 hours, 10 minutes, 38 seconds)
2025-05-07 03:17:07,817 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:17:07,822 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:23:43,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3380.55933 ± 1389.167
2025-05-07 03:23:43,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4016.434, 2419.4766, 4337.172, 83.21705, 3839.3123, 4586.4854, 4388.646, 1856.8932, 4017.872, 4260.087]
2025-05-07 03:23:43,941 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 03:23:43,944 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 50/100 (estimated time remaining: 20 hours, 20 minutes, 48 seconds)
2025-05-07 03:37:01,837 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:37:01,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 03:43:23,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3496.61011 ± 979.427
2025-05-07 03:43:23,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4263.688, 2275.673, 4007.613, 1524.2052, 3695.2075, 4156.325, 3892.625, 2406.792, 4444.1255, 4299.8467]
2025-05-07 03:43:23,901 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 03:43:23,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 51/100 (estimated time remaining: 18 hours, 56 minutes, 58 seconds)
2025-05-07 03:56:06,092 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 03:56:06,097 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:02:01,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2981.68140 ± 1317.907
2025-05-07 04:02:01,196 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1003.1028, 3913.8728, 3206.6377, 4108.8906, 4282.7734, 2771.2146, 353.50232, 2206.7014, 4036.2046, 3933.9133]
2025-05-07 04:02:01,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 04:02:01,200 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 52/100 (estimated time remaining: 17 hours, 29 minutes, 56 seconds)
2025-05-07 04:14:28,782 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:14:28,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:20:25,191 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3081.21289 ± 1520.718
2025-05-07 04:20:25,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4399.1646, 41.099342, 1686.413, 4121.896, 4106.7275, 4529.025, 1997.7712, 4227.664, 4150.452, 1551.917]
2025-05-07 04:20:25,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 04:20:25,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 53/100 (estimated time remaining: 16 hours, 11 minutes, 47 seconds)
2025-05-07 04:32:49,471 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:32:49,477 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:38:46,966 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2586.77295 ± 1725.061
2025-05-07 04:38:46,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4432.2305, 269.5585, 4099.9893, 4093.351, 1622.759, 520.30396, 4195.6953, 1950.1708, 328.1254, 4355.547]
2025-05-07 04:38:46,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 04:38:46,971 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 54/100 (estimated time remaining: 15 hours, 2 minutes, 34 seconds)
2025-05-07 04:50:52,529 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 04:50:52,532 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 04:56:47,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3333.48682 ± 1407.428
2025-05-07 04:56:47,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4278.3315, 4389.7314, 4080.2427, 1121.2977, 4486.5835, 1361.5061, 3978.6538, 4371.8433, 1107.6316, 4159.043]
2025-05-07 04:56:47,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 04:56:47,618 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 55/100 (estimated time remaining: 14 hours, 16 minutes, 9 seconds)
2025-05-07 05:08:52,024 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:08:52,028 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:14:44,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2731.86279 ± 1576.636
2025-05-07 05:14:44,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [983.3119, 4121.785, 3803.8582, 844.959, 4526.6245, 4714.054, 2294.2344, 460.5657, 1632.923, 3936.312]
2025-05-07 05:14:44,035 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 05:14:44,040 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 56/100 (estimated time remaining: 13 hours, 42 minutes, 1 second)
2025-05-07 05:26:33,416 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:26:33,423 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:32:22,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2279.54541 ± 1347.537
2025-05-07 05:32:22,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4043.2292, 2137.516, 3860.077, 1927.0706, 3488.4468, 3636.113, 146.29912, 661.38806, 923.0051, 1972.3096]
2025-05-07 05:32:22,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 05:32:22,298 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 57/100 (estimated time remaining: 13 hours, 15 minutes, 5 seconds)
2025-05-07 05:44:11,793 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 05:44:11,797 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 05:49:58,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3695.84106 ± 1144.684
2025-05-07 05:49:58,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4397.147, 4567.983, 971.63525, 3748.964, 4598.6904, 4314.298, 2069.6926, 3839.9524, 4267.589, 4182.4575]
2025-05-07 05:49:58,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 05:49:58,167 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 58/100 (estimated time remaining: 12 hours, 50 minutes, 7 seconds)
2025-05-07 06:01:53,663 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:01:53,670 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:07:38,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2827.75659 ± 1679.671
2025-05-07 06:07:38,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3763.7644, 729.9031, 519.64197, 4121.6, 4411.869, 445.17047, 1544.0773, 4360.6025, 4326.2485, 4054.69]
2025-05-07 06:07:38,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 06:07:38,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 59/100 (estimated time remaining: 12 hours, 26 minutes, 21 seconds)
2025-05-07 06:19:26,782 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:19:26,787 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:25:22,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2908.46460 ± 1321.557
2025-05-07 06:25:22,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3529.7583, 4330.7812, 1540.4238, 1048.1727, 1885.2844, 2041.5366, 4693.7954, 3894.417, 1696.4417, 4424.0327]
2025-05-07 06:25:22,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 06:25:22,655 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 60/100 (estimated time remaining: 12 hours, 6 minutes, 23 seconds)
2025-05-07 06:37:10,384 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:37:10,388 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 06:42:53,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2873.08740 ± 1339.409
2025-05-07 06:42:53,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4120.693, 3498.5527, 2893.1487, 3638.2727, 3686.493, 1109.7798, 4007.3574, 548.2242, 4164.9863, 1063.3665]
2025-05-07 06:42:53,968 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 06:42:53,973 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 61/100 (estimated time remaining: 11 hours, 45 minutes, 19 seconds)
2025-05-07 06:54:49,979 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 06:54:49,983 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:00:23,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3517.11914 ± 1559.112
2025-05-07 07:00:23,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4190.1895, 550.46484, 306.82452, 4057.8838, 4089.8274, 4624.836, 4308.7817, 4666.8813, 4349.957, 4025.544]
2025-05-07 07:00:23,892 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 07:00:23,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 62/100 (estimated time remaining: 11 hours, 26 minutes, 36 seconds)
2025-05-07 07:12:14,539 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:12:14,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:17:58,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2540.26050 ± 1533.891
2025-05-07 07:17:58,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3956.0037, 2341.803, 395.68375, 656.7144, 4350.6396, 2201.239, 4530.647, 4295.8, 1392.1906, 1281.8824]
2025-05-07 07:17:58,439 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 07:17:58,443 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 63/100 (estimated time remaining: 11 hours, 8 minutes, 50 seconds)
2025-05-07 07:30:01,766 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:30:01,770 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:35:41,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3610.69971 ± 978.213
2025-05-07 07:35:41,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3208.2405, 4273.0674, 4268.8643, 3041.1628, 2291.6165, 1563.7456, 4351.127, 4412.254, 4375.2847, 4321.6343]
2025-05-07 07:35:41,638 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 07:35:41,642 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 64/100 (estimated time remaining: 10 hours, 51 minutes, 38 seconds)
2025-05-07 07:47:26,878 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 07:47:26,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 07:53:22,135 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3207.76221 ± 1349.872
2025-05-07 07:53:22,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [933.714, 3588.0298, 3587.1384, 3970.8918, 3978.7139, 3873.5818, 4295.34, 179.9531, 3880.0005, 3790.2578]
2025-05-07 07:53:22,136 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 07:53:22,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 65/100 (estimated time remaining: 10 hours, 33 minutes, 32 seconds)
2025-05-07 08:05:08,699 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:05:08,703 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:11:03,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3555.33252 ± 1276.679
2025-05-07 08:11:03,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4427.308, 4184.785, 4589.1143, 4010.7114, 4452.923, 4073.1304, 3812.2434, 1413.7902, 719.5578, 3869.761]
2025-05-07 08:11:03,270 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 08:11:03,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 66/100 (estimated time remaining: 10 hours, 17 minutes, 5 seconds)
2025-05-07 08:22:41,019 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:22:41,023 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:28:20,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3633.42065 ± 1016.412
2025-05-07 08:28:20,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4356.5073, 3271.987, 4408.051, 3975.333, 1689.4479, 1747.806, 3811.4875, 4458.521, 4259.7637, 4355.304]
2025-05-07 08:28:20,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 08:28:20,162 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 67/100 (estimated time remaining: 9 hours, 57 minutes, 58 seconds)
2025-05-07 08:40:01,639 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:40:01,643 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 08:45:42,808 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3183.59912 ± 1412.865
2025-05-07 08:45:42,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3993.0872, 3800.5786, 4643.0967, 4201.3174, 712.31213, 3043.3274, 4226.958, 1308.8522, 4522.2603, 1384.2018]
2025-05-07 08:45:42,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 08:45:42,813 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 68/100 (estimated time remaining: 9 hours, 39 minutes, 4 seconds)
2025-05-07 08:57:27,111 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 08:57:27,117 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:03:08,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3537.67578 ± 1259.501
2025-05-07 09:03:08,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4446.736, 3889.6787, 1059.8052, 3993.5518, 1125.5354, 4190.082, 3916.0076, 3611.941, 4702.693, 4440.7266]
2025-05-07 09:03:08,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 09:03:08,213 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 69/100 (estimated time remaining: 9 hours, 19 minutes, 38 seconds)
2025-05-07 09:15:02,097 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:15:02,101 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:20:44,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3995.51318 ± 314.269
2025-05-07 09:20:44,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3765.3384, 4173.655, 4052.3413, 4379.9507, 4393.0444, 3509.3186, 3810.093, 4282.882, 4076.6829, 3511.8298]
2025-05-07 09:20:44,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 09:20:44,007 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 70/100 (estimated time remaining: 9 hours, 1 minute, 39 seconds)
2025-05-07 09:32:24,347 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:32:24,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:38:10,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3027.06934 ± 1686.320
2025-05-07 09:38:10,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4214.646, 4520.5, 4262.1753, 816.28, 358.76028, 2142.9482, 4369.425, 4251.536, 806.938, 4527.4844]
2025-05-07 09:38:10,119 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 09:38:10,124 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 71/100 (estimated time remaining: 8 hours, 42 minutes, 41 seconds)
2025-05-07 09:50:13,944 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 09:50:13,948 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 09:56:10,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2813.77588 ± 1458.257
2025-05-07 09:56:10,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1562.4176, 4440.027, 4338.362, 506.1254, 4330.22, 681.35266, 3478.9556, 3972.975, 2888.9038, 1938.4178]
2025-05-07 09:56:10,232 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 09:56:10,236 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 72/100 (estimated time remaining: 8 hours, 29 minutes, 26 seconds)
2025-05-07 10:07:56,415 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:07:56,419 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:13:36,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3004.47949 ± 1517.148
2025-05-07 10:13:36,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2582.864, 1827.2917, 685.0904, 3690.6943, 4588.1953, 3973.168, 188.17998, 3852.884, 4305.9585, 4350.469]
2025-05-07 10:13:36,550 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 10:13:36,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 73/100 (estimated time remaining: 8 hours, 12 minutes, 12 seconds)
2025-05-07 10:25:24,183 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:25:24,187 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:31:06,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3084.99512 ± 1786.337
2025-05-07 10:31:06,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4344.4805, 4436.2964, 561.32837, 4163.504, 4091.2637, 342.3609, 4299.751, 4350.511, 189.7831, 4070.6716]
2025-05-07 10:31:06,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 10:31:06,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 74/100 (estimated time remaining: 7 hours, 55 minutes, 4 seconds)
2025-05-07 10:43:11,349 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 10:43:11,355 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 10:49:00,025 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2156.61597 ± 1290.724
2025-05-07 10:49:00,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2118.9019, 980.3037, 2410.0828, 3617.6038, 1293.2261, 3962.1506, 1348.4751, 1234.5859, 4240.2393, 360.5901]
2025-05-07 10:49:00,026 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 10:49:00,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 75/100 (estimated time remaining: 7 hours, 38 minutes, 59 seconds)
2025-05-07 11:00:55,926 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:00:55,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:06:41,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3183.01099 ± 1330.739
2025-05-07 11:06:41,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4120.5146, 4401.622, 4097.7554, 3387.0059, 566.6329, 4276.718, 2020.4928, 3858.5532, 1169.0037, 3931.8103]
2025-05-07 11:06:41,607 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 11:06:41,613 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 76/100 (estimated time remaining: 7 hours, 22 minutes, 37 seconds)
2025-05-07 11:18:28,991 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:18:28,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:24:10,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3273.71655 ± 1107.741
2025-05-07 11:24:10,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2413.4436, 936.30414, 4521.679, 2839.9407, 3700.0273, 4338.9644, 4147.322, 3959.5535, 3820.0034, 2059.9272]
2025-05-07 11:24:10,859 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 11:24:10,864 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 77/100 (estimated time remaining: 7 hours, 2 minutes, 27 seconds)
2025-05-07 11:36:10,917 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:36:10,922 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:42:08,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3010.75146 ± 1579.275
2025-05-07 11:42:08,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4013.871, 4169.5923, 128.08116, 4044.72, 4209.7983, 4475.424, 923.81757, 2476.0413, 4402.4365, 1263.7302]
2025-05-07 11:42:08,192 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 11:42:08,197 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 78/100 (estimated time remaining: 6 hours, 47 minutes, 13 seconds)
2025-05-07 11:54:03,988 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 11:54:03,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 11:59:48,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2896.18823 ± 1705.351
2025-05-07 11:59:48,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [757.313, 664.978, 3899.0361, 2704.6135, 3297.4932, 4366.792, 4286.6626, 4388.8423, -77.43328, 4673.5854]
2025-05-07 11:59:48,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 11:59:48,068 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 79/100 (estimated time remaining: 6 hours, 30 minutes, 13 seconds)
2025-05-07 12:11:37,702 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:11:37,706 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:17:29,365 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3828.16211 ± 856.748
2025-05-07 12:17:29,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3615.2869, 3546.3564, 4681.632, 2892.6665, 4208.2236, 4363.124, 1816.9856, 4488.7495, 4027.9485, 4640.6494]
2025-05-07 12:17:29,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:17:29,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 80/100 (estimated time remaining: 6 hours, 11 minutes, 39 seconds)
2025-05-07 12:29:17,689 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:29:17,693 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:35:00,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3917.49854 ± 968.513
2025-05-07 12:35:00,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3248.6326, 4165.714, 1255.5358, 4029.3552, 4065.4795, 4716.237, 4409.5264, 4658.8237, 4247.6616, 4378.019]
2025-05-07 12:35:00,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:35:00,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 81/100 (estimated time remaining: 5 hours, 53 minutes, 14 seconds)
2025-05-07 12:46:44,961 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 12:46:44,965 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 12:52:37,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2636.80688 ± 1649.912
2025-05-07 12:52:37,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [476.67838, 4670.3857, 770.54486, 3885.8186, 4712.33, 703.502, 1878.0344, 2698.8547, 4684.1187, 1887.7999]
2025-05-07 12:52:37,630 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 12:52:37,635 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 82/100 (estimated time remaining: 5 hours, 36 minutes, 5 seconds)
2025-05-07 13:04:41,357 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:04:41,361 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:10:23,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3352.63354 ± 1323.648
2025-05-07 13:10:23,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3659.7617, 4575.6494, 4410.927, 3243.918, 4544.346, 908.68835, 2966.191, 4199.981, 4107.317, 909.55725]
2025-05-07 13:10:23,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:10:23,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 83/100 (estimated time remaining: 5 hours, 17 minutes, 41 seconds)
2025-05-07 13:22:29,860 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:22:29,865 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:28:14,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2980.08447 ± 1886.547
2025-05-07 13:28:14,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4485.173, 107.047874, 274.13303, 3643.106, 4720.797, 3969.586, 4370.7534, 4180.4175, 12.384389, 4037.4424]
2025-05-07 13:28:14,503 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:28:14,510 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 84/100 (estimated time remaining: 5 hours, 41 seconds)
2025-05-07 13:40:07,775 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:40:07,778 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 13:45:49,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3693.25854 ± 1106.886
2025-05-07 13:45:49,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1689.3488, 4369.0444, 1611.7328, 4324.1084, 2956.4832, 4451.3184, 4496.1094, 4438.1914, 4307.872, 4288.3784]
2025-05-07 13:45:49,756 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 13:45:49,761 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 85/100 (estimated time remaining: 4 hours, 42 minutes, 41 seconds)
2025-05-07 13:57:42,105 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 13:57:42,110 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:03:33,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2871.05591 ± 1624.099
2025-05-07 14:03:33,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [668.9939, 4362.4995, 855.7925, 4287.879, 4358.8555, 2809.952, 48.740223, 3367.265, 3546.8123, 4403.769]
2025-05-07 14:03:33,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:03:33,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 86/100 (estimated time remaining: 4 hours, 25 minutes, 39 seconds)
2025-05-07 14:15:26,163 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:15:26,168 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:21:07,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3376.24097 ± 1175.412
2025-05-07 14:21:07,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4071.245, 1040.6359, 4056.1982, 4355.6904, 2517.909, 4157.564, 4132.3486, 1436.0731, 3970.594, 4024.1533]
2025-05-07 14:21:07,830 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:21:07,836 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 87/100 (estimated time remaining: 4 hours, 7 minutes, 48 seconds)
2025-05-07 14:32:55,543 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:32:55,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:38:48,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3085.08203 ± 1500.028
2025-05-07 14:38:48,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1681.0177, 3114.508, 4624.2686, 4753.5493, 2370.5117, 3730.013, 4406.7324, 118.53627, 4421.3696, 1630.3135]
2025-05-07 14:38:48,118 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:38:48,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 88/100 (estimated time remaining: 3 hours, 49 minutes, 52 seconds)
2025-05-07 14:50:30,073 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 14:50:30,077 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 14:56:11,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3768.25781 ± 1006.131
2025-05-07 14:56:11,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2207.7095, 4141.297, 4059.0781, 3199.6338, 4549.3735, 4490.4204, 4595.741, 1646.9749, 4530.0083, 4262.339]
2025-05-07 14:56:11,828 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 14:56:11,834 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 89/100 (estimated time remaining: 3 hours, 31 minutes, 5 seconds)
2025-05-07 15:08:23,917 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 15:08:23,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:14:14,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4231.15332 ± 333.688
2025-05-07 15:14:14,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4507.1416, 4215.2686, 4481.0347, 4123.431, 4481.9517, 4582.4097, 3463.2837, 4071.046, 3911.421, 4474.548]
2025-05-07 15:14:14,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:14:14,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (4231.15) for latency ExtremeSparseL4U32
2025-05-07 15:14:14,980 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-07 15:14:14,989 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc7/noisy-halfcheetah/ExtremeSparseL4U32-mbpac_memdelay/checkpoints/best_ExtremeSparseL4U32.pkl
2025-05-07 15:14:15,019 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 90/100 (estimated time remaining: 3 hours, 14 minutes, 31 seconds)
2025-05-07 15:26:07,222 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 15:26:07,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:32:14,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3798.66260 ± 1134.334
2025-05-07 15:32:14,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4342.484, 4407.32, 4869.7705, 1109.3572, 2985.1553, 3620.4792, 4529.905, 4697.8877, 2771.5295, 4652.735]
2025-05-07 15:32:14,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:32:14,460 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 91/100 (estimated time remaining: 2 hours, 57 minutes, 22 seconds)
2025-05-07 15:44:16,075 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 15:44:16,079 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 15:50:09,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3791.13037 ± 1314.213
2025-05-07 15:50:09,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4806.255, 510.39505, 4483.053, 3927.948, 4130.6694, 4617.4697, 4075.669, 4763.6787, 2142.8254, 4453.343]
2025-05-07 15:50:09,956 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 15:50:09,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 92/100 (estimated time remaining: 2 hours, 40 minutes, 15 seconds)
2025-05-07 16:02:30,871 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 16:02:30,875 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:08:43,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2929.30298 ± 1751.682
2025-05-07 16:08:43,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1728.4192, 4406.3433, 4653.973, 4437.4365, 4598.1562, 368.66934, 4492.5576, 276.1072, 3119.572, 1211.7938]
2025-05-07 16:08:43,010 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:08:43,016 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 93/100 (estimated time remaining: 2 hours, 23 minutes, 51 seconds)
2025-05-07 16:20:32,305 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 16:20:32,310 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:26:10,206 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3972.95264 ± 1173.610
2025-05-07 16:26:10,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4581.1973, 4109.1973, 4466.0156, 4102.4126, 4390.183, 510.2775, 4493.404, 4710.1777, 4012.8662, 4353.7983]
2025-05-07 16:26:10,207 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:26:10,212 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 94/100 (estimated time remaining: 2 hours, 5 minutes, 57 seconds)
2025-05-07 16:37:57,769 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 16:37:57,773 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 16:43:37,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2954.65308 ± 1437.357
2025-05-07 16:43:37,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2200.1511, 4218.5405, 4140.7188, 957.8699, 3425.301, 4414.3267, 4088.455, 749.19366, 1185.6277, 4166.346]
2025-05-07 16:43:37,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 16:43:37,853 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 95/100 (estimated time remaining: 1 hour, 47 minutes, 15 seconds)
2025-05-07 16:55:12,026 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 16:55:12,030 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:00:54,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4005.56006 ± 671.450
2025-05-07 17:00:54,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2740.0256, 4216.0576, 3943.1685, 4482.588, 4423.9956, 4785.4634, 2799.9001, 4429.563, 3809.72, 4425.1206]
2025-05-07 17:00:54,842 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 17:00:54,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 96/100 (estimated time remaining: 1 hour, 28 minutes, 40 seconds)
2025-05-07 17:12:37,174 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 17:12:37,178 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:18:18,790 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2257.02026 ± 1967.311
2025-05-07 17:18:18,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [546.6022, 4733.4766, 872.3483, 4897.313, 467.4541, 4211.8506, 1634.836, 412.73764, 4622.1914, 171.39195]
2025-05-07 17:18:18,791 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 17:18:18,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 10 minutes, 31 seconds)
2025-05-07 17:30:05,365 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 17:30:05,369 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:35:51,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3215.39062 ± 982.416
2025-05-07 17:35:51,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2654.7195, 3531.513, 4323.08, 4348.521, 1429.0564, 2678.3032, 3611.182, 4101.996, 3698.9517, 1776.5842]
2025-05-07 17:35:51,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 17:35:51,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 98/100 (estimated time remaining: 52 minutes, 17 seconds)
2025-05-07 17:47:30,204 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 17:47:30,210 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 17:53:16,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3069.46533 ± 1424.751
2025-05-07 17:53:16,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1527.4526, 4143.2183, 612.5927, 4335.2397, 4434.9795, 2689.9392, 4285.46, 4468.463, 1149.828, 3047.4775]
2025-05-07 17:53:16,425 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 17:53:16,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 99/100 (estimated time remaining: 34 minutes, 50 seconds)
2025-05-07 18:05:10,557 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 18:05:10,561 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:10:54,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3119.20801 ± 1374.547
2025-05-07 18:10:54,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3312.2722, 4382.089, 4641.448, 2956.1028, 998.02527, 3366.2078, 4179.3945, 496.62006, 2390.1714, 4469.7485]
2025-05-07 18:10:54,076 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 18:10:54,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 100/100 (estimated time remaining: 17 minutes, 27 seconds)
2025-05-07 18:22:27,198 latency_env.training.mbpac:636 [DEBUG]: train() done
2025-05-07 18:22:27,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeSparseL4U32...
2025-05-07 18:28:16,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3235.03101 ± 1359.466
2025-05-07 18:28:16,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2271.8923, 461.84586, 3428.5786, 4325.6807, 3667.8264, 4645.171, 1390.3141, 3221.1082, 4390.7637, 4547.129]
2025-05-07 18:28:16,140 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-07 18:28:16,146 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1149 [DEBUG]: Training session finished
