2025-04-30 08:29:03,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1006 [DEBUG]: logdir: _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-04-30 08:29:03,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1007 [DEBUG]: trainer_prefix: benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay
2025-04-30 08:29:03,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1008 [DEBUG]: args.trainer_eval_latencies: {'ExtremeClogL1U23': <latency_env.delayed_mdp.HiddenMarkovianDelay object at 0x7f02e070fb50>}
2025-04-30 08:29:03,608 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1009 [DEBUG]: using device: cuda
2025-04-30 08:29:03,612 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1031 [INFO]: Creating new trainer
2025-04-30 08:29:03,643 baseline-mbpac-noisy-halfcheetah:110 [DEBUG]: pi network:
NNGaussianPolicy(
  (common_head): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=384, out_features=256, bias=True)
    (2): ReLU()
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): ReLU()
  )
  (mu_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (log_std_head): Sequential(
    (0): Linear(in_features=256, out_features=6, bias=True)
    (1): Unflatten(dim=1, unflattened_size=(6,))
  )
  (tanh_refit): NNTanhRefit(scale: tensor([[2., 2., 2., 2., 2., 2.]]), shift: tensor([[-1., -1., -1., -1., -1., -1.]]))
)
2025-04-30 08:29:03,644 baseline-mbpac-noisy-halfcheetah:111 [DEBUG]: q network:
NNLayerConcat2(
  dim: -1
  (next): Sequential(
    (0): Linear(in_features=23, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=1, bias=True)
    (5): NNLayerSqueeze(dim: -1)
  )
  (init_left): Flatten(start_dim=1, end_dim=-1)
  (init_right): Flatten(start_dim=1, end_dim=-1)
)
2025-04-30 08:29:03,654 baseline-mbpac-noisy-halfcheetah:140 [DEBUG]: Model structure:
NNPredictiveRecurrent(
  (emitter): NNGaussianProbabilisticEmitter(
    (emitter): NNLayerConcat(
      dim: -1
      (next): Sequential(
        (0): Sequential(
          (0): Linear(in_features=384, out_features=256, bias=True)
          (1): NNLayerClipSiLU(lower=-20.0)
          (2): Linear(in_features=256, out_features=256, bias=True)
          (3): NNLayerClipSiLU(lower=-20.0)
          (4): Linear(in_features=256, out_features=256, bias=True)
        )
        (1): NNLayerClipSiLU(lower=-20.0)
        (2): NNLayerHeadSplit(
          (heads): ModuleDict(
            (mu): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
            (log_std): Sequential(
              (0): Linear(in_features=256, out_features=256, bias=True)
              (1): NNLayerClipSiLU(lower=-20.0)
              (2): Linear(in_features=256, out_features=17, bias=True)
            )
          )
        )
      )
      (init_all): Identity()
    )
  )
  (net_embed_state): Sequential(
    (0): Linear(in_features=17, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
    (3): NNLayerClipSiLU(lower=-20.0)
    (4): Linear(in_features=256, out_features=384, bias=True)
  )
  (net_embed_action): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): NNLayerClipSiLU(lower=-20.0)
    (2): Linear(in_features=256, out_features=256, bias=True)
  )
  (net_rec): GRU(256, 384, batch_first=True)
)
2025-04-30 08:29:04,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1092 [DEBUG]: Starting training session...
2025-04-30 08:29:04,442 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 1/100
2025-04-30 08:41:36,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 08:41:36,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 08:49:17,252 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: -196.85901 ± 14.730
2025-04-30 08:49:17,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [-217.84889, -193.29036, -200.69162, -214.63287, -194.08841, -162.55751, -195.09901, -204.23036, -200.67299, -185.47797]
2025-04-30 08:49:17,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 08:49:17,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (-196.86) for latency ExtremeClogL1U23
2025-04-30 08:49:17,253 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 08:49:17,262 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 08:49:17,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 2/100 (estimated time remaining: 33 hours, 21 minutes, 11 seconds)
2025-04-30 09:05:43,844 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 09:05:43,850 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 09:12:50,169 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 152.05362 ± 107.990
2025-04-30 09:12:50,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [224.98943, 178.07301, 181.76595, 115.88313, 111.73343, -102.35928, 144.92616, 214.87656, 342.25317, 108.39459]
2025-04-30 09:12:50,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 09:12:50,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (152.05) for latency ExtremeClogL1U23
2025-04-30 09:12:50,170 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 09:12:50,179 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 09:12:50,209 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 3/100 (estimated time remaining: 35 hours, 44 minutes, 22 seconds)
2025-04-30 09:28:35,846 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 09:28:35,852 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 09:35:39,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 1692.52075 ± 404.095
2025-04-30 09:35:39,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1009.0276, 1968.466, 1773.9026, 2026.2097, 1518.1827, 909.2763, 1848.0184, 2107.4663, 2054.6243, 1710.0331]
2025-04-30 09:35:39,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 09:35:39,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (1692.52) for latency ExtremeClogL1U23
2025-04-30 09:35:39,418 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 09:35:39,426 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 09:35:39,455 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 4/100 (estimated time remaining: 35 hours, 52 minutes, 52 seconds)
2025-04-30 09:52:01,889 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 09:52:01,896 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 09:59:27,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2246.23486 ± 647.970
2025-04-30 09:59:27,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2521.975, 1333.8683, 2599.338, 2634.1875, 2344.595, 2725.2405, 676.7745, 2379.5513, 2695.4917, 2551.3235]
2025-04-30 09:59:27,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 09:59:27,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (2246.23) for latency ExtremeClogL1U23
2025-04-30 09:59:27,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 09:59:27,174 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 09:59:27,203 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 5/100 (estimated time remaining: 36 hours, 9 minutes, 6 seconds)
2025-04-30 10:15:21,944 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 10:15:21,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 10:22:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2660.81519 ± 427.391
2025-04-30 10:22:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3030.2625, 2486.274, 1527.2219, 2875.8484, 2866.1023, 2476.333, 2972.5017, 2633.226, 3052.7312, 2687.653]
2025-04-30 10:22:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 10:22:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (2660.82) for latency ExtremeClogL1U23
2025-04-30 10:22:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 10:22:33,895 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 10:22:33,924 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 6/100 (estimated time remaining: 35 hours, 56 minutes, 20 seconds)
2025-04-30 10:38:23,996 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 10:38:24,003 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 10:45:54,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2771.37402 ± 745.470
2025-04-30 10:45:54,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2932.7175, 3242.8843, 2841.4358, 632.98474, 3175.938, 2773.2349, 3139.5354, 3125.868, 2564.3926, 3284.7493]
2025-04-30 10:45:54,846 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 10:45:54,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (2771.37) for latency ExtremeClogL1U23
2025-04-30 10:45:54,847 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 10:45:54,857 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 10:45:54,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 7/100 (estimated time remaining: 36 hours, 32 minutes, 34 seconds)
2025-04-30 11:02:42,344 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 11:02:42,351 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 11:10:10,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2768.08081 ± 936.743
2025-04-30 11:10:10,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [658.529, 3401.134, 3491.191, 3458.6501, 3234.9243, 3332.4075, 1522.8699, 3344.616, 2138.7217, 3097.7646]
2025-04-30 11:10:10,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 11:10:10,344 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 8/100 (estimated time remaining: 36 hours, 22 minutes, 26 seconds)
2025-04-30 11:25:36,045 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 11:25:36,050 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 11:33:13,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3616.03125 ± 362.048
2025-04-30 11:33:13,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3645.0674, 3744.4143, 3621.6428, 3426.9653, 3976.003, 3938.0889, 2662.744, 3677.9734, 3938.7664, 3528.6475]
2025-04-30 11:33:13,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 11:33:13,505 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (3616.03) for latency ExtremeClogL1U23
2025-04-30 11:33:13,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 11:33:13,517 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 11:33:13,547 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 9/100 (estimated time remaining: 36 hours, 3 minutes, 15 seconds)
2025-04-30 11:48:55,805 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 11:48:55,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 11:56:28,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2912.20825 ± 1212.536
2025-04-30 11:56:28,989 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3750.2224, 840.20544, 3483.3442, 4105.8315, 3856.9895, 3724.1116, 3478.4106, 1275.6736, 1149.3687, 3457.9246]
2025-04-30 11:56:28,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 11:56:28,991 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 10/100 (estimated time remaining: 35 hours, 29 minutes, 56 seconds)
2025-04-30 12:11:54,461 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 12:11:54,468 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 12:19:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3130.59717 ± 1222.911
2025-04-30 12:19:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1387.9211, 4276.6147, 992.99817, 4021.668, 3504.0542, 4110.5845, 3464.8123, 4042.3481, 1545.686, 3959.2854]
2025-04-30 12:19:50,431 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 12:19:50,432 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 11/100 (estimated time remaining: 35 hours, 10 minutes, 57 seconds)
2025-04-30 12:35:28,523 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 12:35:28,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 12:43:08,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3633.21484 ± 985.594
2025-04-30 12:43:08,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2982.4023, 4023.6714, 4436.7305, 4055.964, 4108.092, 3278.2014, 4252.8306, 4164.1084, 966.56055, 4063.588]
2025-04-30 12:43:08,287 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 12:43:08,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (3633.21) for latency ExtremeClogL1U23
2025-04-30 12:43:08,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 12:43:08,297 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 12:43:08,328 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 12/100 (estimated time remaining: 34 hours, 46 minutes, 35 seconds)
2025-04-30 12:58:15,451 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 12:58:15,458 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 13:05:58,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3445.77100 ± 954.202
2025-04-30 13:05:58,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4076.9614, 4221.807, 2247.4614, 1365.1459, 4156.123, 4145.7383, 4356.827, 2864.129, 3222.0608, 3801.4578]
2025-04-30 13:05:58,542 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 13:05:58,544 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 13/100 (estimated time remaining: 33 hours, 58 minutes, 8 seconds)
2025-04-30 13:21:28,395 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 13:21:28,402 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 13:29:05,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3010.60400 ± 1407.107
2025-04-30 13:29:05,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1516.3374, 605.69, 4354.91, 4544.128, 2632.1829, 3791.3955, 4356.856, 1143.648, 4260.1523, 2900.7385]
2025-04-30 13:29:05,646 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 13:29:05,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 14/100 (estimated time remaining: 33 hours, 36 minutes, 6 seconds)
2025-04-30 13:45:13,646 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 13:45:13,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 13:52:23,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3262.80225 ± 1446.440
2025-04-30 13:52:23,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4520.067, 4499.985, 4519.279, 3105.2678, 518.789, 3795.6306, 3834.0232, 612.633, 2978.2979, 4244.0503]
2025-04-30 13:52:23,930 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 13:52:23,931 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 15/100 (estimated time remaining: 33 hours, 13 minutes, 44 seconds)
2025-04-30 14:11:02,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 14:11:02,160 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 14:18:04,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3853.82690 ± 1315.132
2025-04-30 14:18:04,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4528.1836, 4622.2905, 4504.517, 4532.699, 4424.492, 4385.876, 4577.2935, 4464.2217, 1633.2964, 865.3995]
2025-04-30 14:18:04,482 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 14:18:04,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (3853.83) for latency ExtremeClogL1U23
2025-04-30 14:18:04,483 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 14:18:04,495 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 14:18:04,527 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 16/100 (estimated time remaining: 33 hours, 29 minutes, 59 seconds)
2025-04-30 14:34:32,729 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 14:34:32,735 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 14:41:43,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2714.23511 ± 1714.805
2025-04-30 14:41:43,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1772.0735, 4763.3, 262.7446, 1483.502, 4000.5999, 1967.6519, 4759.1255, 4931.2153, 2827.3953, 374.74612]
2025-04-30 14:41:43,640 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 14:41:43,641 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 17/100 (estimated time remaining: 33 hours, 12 minutes, 17 seconds)
2025-04-30 14:57:51,682 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 14:57:51,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 15:05:17,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3081.74146 ± 1741.174
2025-04-30 15:05:17,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [880.12885, 1225.1986, 4919.12, 1943.9436, 4375.6562, 156.34444, 4049.736, 4574.79, 4961.125, 3731.372]
2025-04-30 15:05:17,742 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 15:05:17,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 18/100 (estimated time remaining: 33 hours, 42 seconds)
2025-04-30 15:21:05,520 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 15:21:05,525 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 15:28:29,272 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3774.08667 ± 1328.736
2025-04-30 15:28:29,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4673.5684, 4727.1924, 1197.7981, 4715.6406, 3622.2852, 4539.912, 4848.8076, 3200.7834, 1459.4038, 4755.4766]
2025-04-30 15:28:29,273 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 15:28:29,275 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 19/100 (estimated time remaining: 32 hours, 38 minutes, 3 seconds)
2025-04-30 15:45:14,922 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 15:45:14,929 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 15:53:18,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3825.63354 ± 1524.938
2025-04-30 15:53:18,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2189.4275, 4677.7666, 4641.403, 4796.658, 4830.9453, 3439.8367, 69.66753, 5132.4727, 3558.5613, 4919.5996]
2025-04-30 15:53:18,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 15:53:18,400 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 20/100 (estimated time remaining: 32 hours, 38 minutes, 42 seconds)
2025-04-30 16:09:03,933 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 16:09:03,940 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 16:16:53,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3119.13379 ± 1589.766
2025-04-30 16:16:53,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4794.126, 4607.4487, 4722.4307, 484.2418, 4953.046, 929.7872, 2732.696, 1680.6036, 3569.6926, 2717.2654]
2025-04-30 16:16:53,568 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 16:16:53,570 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 21/100 (estimated time remaining: 31 hours, 41 minutes, 4 seconds)
2025-04-30 16:32:13,559 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 16:32:13,565 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 16:39:41,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4136.61816 ± 1448.150
2025-04-30 16:39:41,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4923.489, 4933.8516, 621.51404, 4838.653, 4687.793, 2008.6804, 4748.038, 5032.652, 4695.054, 4876.4526]
2025-04-30 16:39:41,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 16:39:41,558 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (4136.62) for latency ExtremeClogL1U23
2025-04-30 16:39:41,559 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 16:39:41,569 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 16:39:41,597 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 22/100 (estimated time remaining: 31 hours, 3 minutes, 51 seconds)
2025-04-30 16:55:14,071 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 16:55:14,075 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 17:03:02,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2332.64697 ± 2060.225
2025-04-30 17:03:02,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4762.1226, 4763.6133, 619.1456, 820.2066, 405.6325, 957.60345, 606.73865, 4899.447, 4970.4727, 521.48895]
2025-04-30 17:03:02,171 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 17:03:02,173 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 23/100 (estimated time remaining: 30 hours, 36 minutes, 45 seconds)
2025-04-30 17:17:29,202 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 17:17:29,205 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 17:24:47,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3168.55347 ± 1624.907
2025-04-30 17:24:47,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [921.9262, 490.6071, 5073.4473, 1634.5254, 4953.4214, 4165.1943, 4933.278, 3901.0793, 3040.4248, 2571.6338]
2025-04-30 17:24:47,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 17:24:47,506 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 24/100 (estimated time remaining: 29 hours, 51 minutes, 4 seconds)
2025-04-30 17:40:45,450 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 17:40:45,453 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 17:48:13,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2990.28809 ± 2057.552
2025-04-30 17:48:13,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4715.409, 4847.1304, -18.573803, 2167.8652, 410.7798, 2313.6272, 5098.4956, 5145.2954, 409.80624, 4813.0474]
2025-04-30 17:48:13,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 17:48:13,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 25/100 (estimated time remaining: 29 hours, 6 minutes, 45 seconds)
2025-04-30 18:02:29,893 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 18:02:29,897 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 18:09:28,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3969.56641 ± 1438.408
2025-04-30 18:09:28,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4687.5483, 4853.3774, 1665.4467, 5013.903, 4062.6396, 5146.0645, 1544.7094, 2331.5837, 5325.363, 5065.026]
2025-04-30 18:09:28,755 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 18:09:28,757 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 26/100 (estimated time remaining: 28 hours, 8 minutes, 47 seconds)
2025-04-30 18:23:34,741 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 18:23:34,744 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 18:30:47,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2857.02710 ± 2016.641
2025-04-30 18:30:47,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5087.035, 5102.95, 334.52692, 3629.129, 5161.73, 246.66812, 2470.918, 4674.393, 340.27005, 1522.6519]
2025-04-30 18:30:47,511 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 18:30:47,513 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 27/100 (estimated time remaining: 27 hours, 24 minutes, 15 seconds)
2025-04-30 18:45:26,525 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 18:45:26,528 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 18:52:46,093 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3814.75708 ± 1486.780
2025-04-30 18:52:46,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2175.2124, 5538.5786, 2912.2822, 3585.9038, 802.63745, 4906.191, 5299.179, 5161.331, 3080.495, 4685.761]
2025-04-30 18:52:46,108 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 18:52:46,111 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 28/100 (estimated time remaining: 26 hours, 42 minutes, 5 seconds)
2025-04-30 19:06:18,370 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 19:06:18,373 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 19:13:48,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3975.89404 ± 996.098
2025-04-30 19:13:48,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4698.042, 2804.2292, 2525.1582, 4994.4937, 3111.3433, 5319.897, 4827.505, 3455.1877, 4827.9136, 3195.172]
2025-04-30 19:13:48,620 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 19:13:48,623 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 29/100 (estimated time remaining: 26 hours, 9 minutes, 52 seconds)
2025-04-30 19:27:32,222 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 19:27:32,225 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 19:35:10,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3695.40186 ± 1721.737
2025-04-30 19:35:10,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5227.988, 4330.918, 4211.9043, 3530.3853, 1800.8882, 1720.5348, 5465.412, 302.0488, 5040.149, 5323.7944]
2025-04-30 19:35:10,244 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 19:35:10,247 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 30/100 (estimated time remaining: 25 hours, 18 minutes, 38 seconds)
2025-04-30 19:48:38,101 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 19:48:38,104 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 19:56:03,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4656.08838 ± 1137.834
2025-04-30 19:56:03,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4867.829, 5422.213, 3499.3396, 4626.259, 1686.2671, 5402.1, 5560.666, 5122.8594, 5200.339, 5173.011]
2025-04-30 19:56:03,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 19:56:03,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (4656.09) for latency ExtremeClogL1U23
2025-04-30 19:56:03,806 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-04-30 19:56:03,816 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-04-30 19:56:03,844 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 31/100 (estimated time remaining: 24 hours, 52 minutes, 11 seconds)
2025-04-30 20:09:47,838 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 20:09:47,841 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 20:17:09,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3370.07739 ± 1671.772
2025-04-30 20:17:09,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3392.8027, 5394.3994, 5240.78, 2308.1106, 1450.1497, 496.71432, 4435.86, 5086.0405, 1755.607, 4140.3066]
2025-04-30 20:17:09,526 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 20:17:09,529 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 32/100 (estimated time remaining: 24 hours, 27 minutes, 51 seconds)
2025-04-30 20:30:56,396 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 20:30:56,399 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 20:38:08,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3590.86768 ± 1782.872
2025-04-30 20:38:08,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [513.2334, 2461.9265, 5268.3315, 4147.7793, 1909.3002, 5413.4756, 5396.071, 4040.5598, 5405.773, 1352.2279]
2025-04-30 20:38:08,920 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 20:38:08,923 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 33/100 (estimated time remaining: 23 hours, 53 minutes, 10 seconds)
2025-04-30 20:51:56,240 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 20:51:56,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 20:59:09,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3901.29639 ± 1663.652
2025-04-30 20:59:09,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4901.1553, 3862.8188, 825.51025, 5536.2744, 3782.4243, 4998.032, 824.31525, 5762.739, 4520.342, 3999.3499]
2025-04-30 20:59:09,313 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 20:59:09,316 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 34/100 (estimated time remaining: 23 hours, 31 minutes, 37 seconds)
2025-04-30 21:12:52,368 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 21:12:52,371 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 21:20:04,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3806.20557 ± 1852.569
2025-04-30 21:20:04,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5387.3267, 3496.2876, 4828.8296, 383.12067, 5445.9014, 5395.6123, 2309.649, 5020.2173, 4944.657, 850.4524]
2025-04-30 21:20:04,221 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 21:20:04,224 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 35/100 (estimated time remaining: 23 hours, 4 minutes, 40 seconds)
2025-04-30 21:33:55,912 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 21:33:55,915 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 21:41:20,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3588.08350 ± 1940.506
2025-04-30 21:41:20,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [244.4441, 5072.998, 4971.7476, 2381.1963, 4906.6064, 1280.7126, 4838.317, 5278.571, 5618.749, 1287.4902]
2025-04-30 21:41:20,123 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 21:41:20,126 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 36/100 (estimated time remaining: 22 hours, 48 minutes, 31 seconds)
2025-04-30 21:55:11,552 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 21:55:11,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 22:02:45,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3760.33057 ± 2291.256
2025-04-30 22:02:45,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3129.1814, 5638.355, 883.84357, 5466.688, 5482.061, 5673.706, 501.29074, 5540.372, 5334.5654, -46.759262]
2025-04-30 22:02:45,088 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 22:02:45,091 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 37/100 (estimated time remaining: 22 hours, 31 minutes, 35 seconds)
2025-04-30 22:16:34,178 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 22:16:34,181 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 22:24:11,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 2679.66992 ± 2066.240
2025-04-30 22:24:11,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [959.92114, 378.3468, 4309.823, 450.2148, 353.0513, 5189.891, 5504.055, 2862.3352, 5135.9453, 1653.1162]
2025-04-30 22:24:11,263 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 22:24:11,266 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 38/100 (estimated time remaining: 22 hours, 16 minutes, 5 seconds)
2025-04-30 22:37:37,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 22:37:37,144 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 22:45:10,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4399.44727 ± 1382.942
2025-04-30 22:45:10,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5452.745, 5241.0244, 5470.709, 2024.0356, 5448.8804, 1662.5311, 3930.648, 5136.541, 4173.7026, 5453.652]
2025-04-30 22:45:10,686 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 22:45:10,689 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 39/100 (estimated time remaining: 21 hours, 54 minutes, 41 seconds)
2025-04-30 22:59:16,867 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 22:59:16,871 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 23:06:43,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3124.42993 ± 1892.719
2025-04-30 23:06:43,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5404.0156, 1931.3273, 3532.879, 282.22824, 5574.8105, 1923.4742, 5767.743, 3920.022, 1059.6793, 1848.1188]
2025-04-30 23:06:43,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 23:06:43,331 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 40/100 (estimated time remaining: 21 hours, 41 minutes, 9 seconds)
2025-04-30 23:20:42,432 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 23:20:42,435 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 23:27:58,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3882.98120 ± 1443.927
2025-04-30 23:27:58,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3938.9648, 2262.653, 4835.4287, 3183.182, 5274.504, 1920.8881, 5791.2793, 3176.2405, 2397.8423, 6048.83]
2025-04-30 23:27:58,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 23:27:58,366 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 41/100 (estimated time remaining: 21 hours, 19 minutes, 38 seconds)
2025-04-30 23:41:55,487 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-04-30 23:41:55,491 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-04-30 23:49:05,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4438.97998 ± 1413.790
2025-04-30 23:49:05,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5507.7427, 4472.5034, 6026.7354, 1552.2739, 2740.4578, 5840.9883, 4011.4504, 3509.3098, 4927.7207, 5800.6196]
2025-04-30 23:49:05,964 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-04-30 23:49:05,967 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 42/100 (estimated time remaining: 20 hours, 54 minutes, 54 seconds)
2025-05-01 00:03:04,305 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 00:03:04,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 00:10:16,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3501.08789 ± 1299.310
2025-05-01 00:10:16,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3414.513, 5047.828, 2567.1348, 4941.6187, 5758.058, 3391.9211, 3469.968, 2506.8381, 2534.3933, 1378.6062]
2025-05-01 00:10:16,018 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 00:10:16,022 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 43/100 (estimated time remaining: 20 hours, 30 minutes, 31 seconds)
2025-05-01 00:24:14,527 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 00:24:14,530 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 00:31:25,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3709.12036 ± 1490.026
2025-05-01 00:31:25,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1995.8208, 3443.021, 4229.729, 5702.2793, 4697.7944, 5297.5444, 5372.176, 1783.8683, 1606.3346, 2962.6365]
2025-05-01 00:31:25,497 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 00:31:25,500 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 44/100 (estimated time remaining: 20 hours, 11 minutes, 12 seconds)
2025-05-01 00:45:11,227 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 00:45:11,231 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 00:52:25,137 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3749.01880 ± 2113.122
2025-05-01 00:52:25,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5680.687, 5261.737, 1995.1215, 5616.2656, 2759.4773, 5806.258, 1080.3064, 59.573036, 6135.8726, 3094.89]
2025-05-01 00:52:25,138 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 00:52:25,141 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 45/100 (estimated time remaining: 19 hours, 43 minutes, 48 seconds)
2025-05-01 01:06:03,929 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 01:06:03,932 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 01:13:15,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4744.35254 ± 1883.638
2025-05-01 01:13:15,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5895.743, 5069.834, 5558.8853, 5656.301, 5254.6587, 1038.8636, 5839.0986, 5985.233, 6129.6187, 1015.2929]
2025-05-01 01:13:15,997 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 01:13:15,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (4744.35) for latency ExtremeClogL1U23
2025-05-01 01:13:15,998 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-01 01:13:16,008 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 01:13:16,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 46/100 (estimated time remaining: 19 hours, 18 minutes, 14 seconds)
2025-05-01 01:27:08,266 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 01:27:08,269 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 01:34:30,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3215.54175 ± 2551.765
2025-05-01 01:34:30,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5542.622, -18.681671, 2823.6746, 5450.7705, 791.3573, 9.199283, 5710.5576, 295.18436, 5583.7773, 5966.957]
2025-05-01 01:34:30,083 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 01:34:30,086 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 47/100 (estimated time remaining: 18 hours, 58 minutes, 20 seconds)
2025-05-01 01:47:52,855 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 01:47:52,858 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 01:55:27,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3700.48877 ± 2407.119
2025-05-01 01:55:27,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6169.773, 6276.9507, 1584.9945, 752.73956, 6220.5303, 382.40338, 1480.7931, 5879.3413, 5651.953, 2605.407]
2025-05-01 01:55:27,907 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 01:55:27,911 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 48/100 (estimated time remaining: 18 hours, 35 minutes, 6 seconds)
2025-05-01 02:09:04,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 02:09:04,059 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 02:16:36,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4346.83203 ± 1742.849
2025-05-01 02:16:36,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3916.9148, 5686.0815, 5975.56, 5343.7075, 5933.0527, 5427.374, 857.4589, 1523.5773, 5078.3145, 3726.2812]
2025-05-01 02:16:36,683 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 02:16:36,687 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 49/100 (estimated time remaining: 18 hours, 13 minutes, 56 seconds)
2025-05-01 02:30:42,677 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 02:30:42,680 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 02:38:02,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3149.62256 ± 1883.617
2025-05-01 02:38:02,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1799.5209, 1265.8406, 577.93506, 2571.9402, 6283.128, 4588.375, 1406.1853, 3033.0898, 4124.459, 5845.753]
2025-05-01 02:38:02,645 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 02:38:02,648 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 50/100 (estimated time remaining: 17 hours, 57 minutes, 22 seconds)
2025-05-01 02:51:47,469 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 02:51:47,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 02:59:01,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3533.55273 ± 1645.093
2025-05-01 02:59:01,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4772.666, 1701.7941, 1520.5618, 6360.2607, 5078.5005, 1551.5168, 2373.2717, 3331.76, 5091.5796, 3553.6152]
2025-05-01 02:59:01,668 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 02:59:01,672 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 51/100 (estimated time remaining: 17 hours, 37 minutes, 36 seconds)
2025-05-01 03:12:41,923 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 03:12:41,927 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 03:19:52,792 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4500.61230 ± 1914.804
2025-05-01 03:19:52,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1516.5398, 234.04646, 4170.1475, 6053.7163, 4911.973, 5057.297, 6023.157, 5496.2114, 5681.295, 5861.7373]
2025-05-01 03:19:52,793 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 03:19:52,796 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 52/100 (estimated time remaining: 17 hours, 12 minutes, 42 seconds)
2025-05-01 03:33:49,772 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 03:33:49,775 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 03:41:04,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3062.67700 ± 1517.015
2025-05-01 03:41:04,691 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2742.438, 772.6737, 3037.487, 1115.8413, 2665.4746, 4838.3706, 2777.0532, 6159.25, 2619.7527, 3898.4287]
2025-05-01 03:41:04,692 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 03:41:04,695 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 53/100 (estimated time remaining: 16 hours, 53 minutes, 53 seconds)
2025-05-01 03:55:00,684 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 03:55:00,688 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 04:02:11,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3983.64502 ± 1934.534
2025-05-01 04:02:11,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2657.5217, 183.27757, 5291.6016, 6105.8906, 2584.005, 2430.5916, 5562.864, 5711.3843, 6113.7046, 3195.6094]
2025-05-01 04:02:11,042 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 04:02:11,046 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 54/100 (estimated time remaining: 16 hours, 32 minutes, 22 seconds)
2025-05-01 04:15:56,778 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 04:15:56,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 04:23:12,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4447.38916 ± 2113.345
2025-05-01 04:23:12,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3243.824, 6211.9917, 5442.4507, 36.04799, 6013.7617, 6372.257, 1231.6829, 4520.066, 5736.3145, 5665.492]
2025-05-01 04:23:12,900 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 04:23:12,904 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 55/100 (estimated time remaining: 16 hours, 7 minutes, 34 seconds)
2025-05-01 04:37:02,139 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 04:37:02,142 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 04:44:15,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4219.80518 ± 2441.683
2025-05-01 04:44:15,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [606.97516, 6302.3696, 6391.5483, 961.0302, 5895.252, 5452.072, 50.01488, 5784.791, 5215.224, 5538.7725]
2025-05-01 04:44:15,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 04:44:15,347 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 56/100 (estimated time remaining: 15 hours, 47 minutes, 3 seconds)
2025-05-01 04:57:49,987 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 04:57:49,990 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 05:05:19,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5469.05762 ± 848.072
2025-05-01 05:05:19,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5367.363, 6010.7485, 4998.285, 6138.3315, 4494.662, 6524.7046, 3625.4858, 6316.2295, 5593.6245, 5621.1377]
2025-05-01 05:05:19,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 05:05:19,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (5469.06) for latency ExtremeClogL1U23
2025-05-01 05:05:19,602 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-01 05:05:19,610 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 05:05:19,639 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 57/100 (estimated time remaining: 15 hours, 27 minutes, 56 seconds)
2025-05-01 05:19:02,501 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 05:19:02,504 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 05:26:30,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4013.07031 ± 1948.303
2025-05-01 05:26:30,905 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [170.29402, 5581.115, 1911.1023, 6022.1133, 5751.1714, 5892.992, 4791.5513, 3804.376, 4410.176, 1795.8099]
2025-05-01 05:26:30,906 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 05:26:30,910 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 58/100 (estimated time remaining: 15 hours, 6 minutes, 45 seconds)
2025-05-01 05:40:09,575 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 05:40:09,578 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 05:47:43,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4407.10938 ± 2024.025
2025-05-01 05:47:43,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6243.1763, 5520.42, 4252.248, 377.9411, 5370.6685, 5563.9297, 5353.39, 558.9276, 5633.0127, 5197.379]
2025-05-01 05:47:43,001 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 05:47:43,006 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 59/100 (estimated time remaining: 14 hours, 46 minutes, 28 seconds)
2025-05-01 06:01:22,172 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 06:01:22,175 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 06:08:49,729 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5070.34668 ± 1837.779
2025-05-01 06:08:49,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6258.47, 6207.542, 5117.543, 5846.595, 1127.7443, 5392.6343, 6465.6445, 5843.149, 6549.461, 1894.6853]
2025-05-01 06:08:49,730 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 06:08:49,734 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 60/100 (estimated time remaining: 14 hours, 26 minutes, 2 seconds)
2025-05-01 06:22:36,159 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 06:22:36,163 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 06:29:50,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5111.56689 ± 1479.531
2025-05-01 06:29:50,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5968.2856, 5977.1685, 3876.0383, 5489.603, 5727.044, 5322.3037, 6068.1807, 5885.641, 1061.2769, 5740.126]
2025-05-01 06:29:50,284 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 06:29:50,288 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 61/100 (estimated time remaining: 14 hours, 4 minutes, 39 seconds)
2025-05-01 06:44:01,780 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 06:44:01,784 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 06:51:17,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4028.67773 ± 2568.183
2025-05-01 06:51:17,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6389.352, 724.41644, 6865.837, 6303.0674, 6088.0293, 740.61884, 564.3971, 4473.4785, 1948.9027, 6188.678]
2025-05-01 06:51:17,555 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 06:51:17,560 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 62/100 (estimated time remaining: 13 hours, 46 minutes, 31 seconds)
2025-05-01 07:05:10,276 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 07:05:10,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 07:12:19,804 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3731.39844 ± 2316.116
2025-05-01 07:12:19,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5919.9233, 5998.1685, 2394.4797, 4778.2173, 5524.377, 716.3801, 512.7333, 5171.117, 5925.7563, 372.8325]
2025-05-01 07:12:19,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 07:12:19,809 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 63/100 (estimated time remaining: 13 hours, 24 minutes, 11 seconds)
2025-05-01 07:26:08,288 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 07:26:08,291 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 07:33:21,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4122.68945 ± 2116.177
2025-05-01 07:33:21,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [210.08275, 6300.507, 5325.161, 6356.3286, 5984.3306, 911.4871, 3376.6064, 5866.605, 3516.8093, 3378.9768]
2025-05-01 07:33:21,033 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 07:33:21,038 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 64/100 (estimated time remaining: 13 hours, 1 minute, 41 seconds)
2025-05-01 07:47:20,990 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 07:47:20,993 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 07:54:34,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3668.85596 ± 2445.290
2025-05-01 07:54:34,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5188.6274, 6381.3745, 1285.4689, 6564.26, 103.259865, 5383.6123, 2742.338, 203.84068, 6209.8643, 2625.9146]
2025-05-01 07:54:34,260 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 07:54:34,265 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 65/100 (estimated time remaining: 12 hours, 41 minutes, 20 seconds)
2025-05-01 08:08:08,565 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 08:08:08,569 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 08:15:18,295 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3682.67041 ± 1960.926
2025-05-01 08:15:18,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4651.543, 5925.608, 2813.0835, 708.6827, 6024.301, 2819.4646, 5833.5503, 3112.0605, 376.73993, 4561.668]
2025-05-01 08:15:18,296 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 08:15:18,300 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 66/100 (estimated time remaining: 12 hours, 18 minutes, 16 seconds)
2025-05-01 08:29:04,161 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 08:29:04,164 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 08:36:22,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3348.79224 ± 2345.060
2025-05-01 08:36:22,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [1526.2179, 6260.739, 2058.2583, 6222.2295, 3486.9592, 1673.305, 575.21515, 6885.6963, 4453.816, 345.48285]
2025-05-01 08:36:22,879 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 08:36:22,884 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 67/100 (estimated time remaining: 11 hours, 54 minutes, 36 seconds)
2025-05-01 08:50:14,661 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 08:50:14,664 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 08:57:53,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5017.62939 ± 1776.590
2025-05-01 08:57:53,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4488.2065, 637.9698, 2949.5864, 6076.424, 6103.1963, 5822.6978, 5461.631, 6727.529, 5946.493, 5962.5586]
2025-05-01 08:57:53,883 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 08:57:53,887 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 68/100 (estimated time remaining: 11 hours, 36 minutes, 44 seconds)
2025-05-01 09:11:37,373 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 09:11:37,376 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 09:19:25,393 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4222.47559 ± 1756.407
2025-05-01 09:19:25,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3251.3005, 6130.909, 2043.4805, 3558.5913, 5605.2544, 6275.213, 4525.8257, 4769.2783, 611.0409, 5453.8677]
2025-05-01 09:19:25,394 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 09:19:25,398 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 69/100 (estimated time remaining: 11 hours, 18 minutes, 51 seconds)
2025-05-01 09:32:51,027 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 09:32:51,031 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 09:40:32,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4705.92676 ± 2460.775
2025-05-01 09:40:32,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6005.4673, -12.679768, 516.07886, 5969.0933, 6811.391, 3327.631, 6672.4185, 6104.5625, 7006.26, 4659.045]
2025-05-01 09:40:32,327 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 09:40:32,332 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 70/100 (estimated time remaining: 10 hours, 57 minutes)
2025-05-01 09:54:15,802 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 09:54:15,805 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 10:01:45,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3741.75977 ± 2283.296
2025-05-01 10:01:45,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5928.5234, 2738.8232, 909.7934, 6712.8643, 843.26715, 6501.9014, 5525.8525, 4255.8857, 3408.7253, 591.9621]
2025-05-01 10:01:45,807 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 10:01:45,812 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 71/100 (estimated time remaining: 10 hours, 38 minutes, 45 seconds)
2025-05-01 10:15:19,800 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 10:15:19,803 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 10:22:45,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5623.82812 ± 1194.209
2025-05-01 10:22:45,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3484.41, 6224.923, 6555.655, 5977.2725, 5762.9175, 5915.16, 6900.9155, 3166.681, 6289.9, 5960.447]
2025-05-01 10:22:45,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 10:22:45,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (5623.83) for latency ExtremeClogL1U23
2025-05-01 10:22:45,241 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-01 10:22:45,251 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 10:22:45,280 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 72/100 (estimated time remaining: 10 hours, 16 minutes, 57 seconds)
2025-05-01 10:36:20,462 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 10:36:20,465 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 10:43:33,057 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4426.91016 ± 2127.320
2025-05-01 10:43:33,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5917.954, 6093.0493, 6238.0034, 6304.497, 2611.1946, 3546.6218, 7154.3735, 2385.0278, 3752.6736, 265.7051]
2025-05-01 10:43:33,058 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 10:43:33,063 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 73/100 (estimated time remaining: 9 hours, 51 minutes, 39 seconds)
2025-05-01 10:57:24,247 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 10:57:24,250 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 11:04:41,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5027.56104 ± 1984.773
2025-05-01 11:04:41,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3201.203, 6405.477, 5365.3286, 6221.463, 53.113213, 5030.0215, 6649.9434, 6675.8164, 4256.235, 6417.007]
2025-05-01 11:04:41,082 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 11:04:41,087 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 74/100 (estimated time remaining: 9 hours, 28 minutes, 24 seconds)
2025-05-01 11:18:39,561 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 11:18:39,564 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 11:25:51,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5630.58154 ± 1592.874
2025-05-01 11:25:51,362 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2087.0686, 6576.1826, 5847.9365, 6299.807, 6596.9434, 6659.373, 6686.4595, 6281.342, 6355.104, 2915.596]
2025-05-01 11:25:51,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 11:25:51,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (5630.58) for latency ExtremeClogL1U23
2025-05-01 11:25:51,363 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-01 11:25:51,373 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 11:25:51,404 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 75/100 (estimated time remaining: 9 hours, 7 minutes, 39 seconds)
2025-05-01 11:39:40,056 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 11:39:40,060 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 11:46:48,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4690.91260 ± 1894.094
2025-05-01 11:46:48,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6623.243, 3493.5183, 5599.991, 5552.59, 378.642, 3915.4746, 5773.394, 2955.995, 6822.8325, 5793.4434]
2025-05-01 11:46:48,782 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 11:46:48,788 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 76/100 (estimated time remaining: 8 hours, 45 minutes, 14 seconds)
2025-05-01 12:00:40,697 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 12:00:40,700 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 12:07:53,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4356.39746 ± 2224.459
2025-05-01 12:07:53,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6477.5815, 5981.345, 6596.8203, 5573.678, 113.36757, 411.85715, 4325.3203, 3841.147, 5770.024, 4472.8325]
2025-05-01 12:07:53,237 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 12:07:53,243 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 77/100 (estimated time remaining: 8 hours, 24 minutes, 38 seconds)
2025-05-01 12:21:41,598 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 12:21:41,601 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 12:29:02,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3064.57031 ± 1888.385
2025-05-01 12:29:02,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2551.5625, 5930.095, 5672.574, 128.04391, 3676.0884, 5130.8164, 1553.021, 2913.732, 1853.8732, 1235.8956]
2025-05-01 12:29:02,128 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 12:29:02,133 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 78/100 (estimated time remaining: 8 hours, 5 minutes, 13 seconds)
2025-05-01 12:42:42,540 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 12:42:42,543 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 12:50:25,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3998.27100 ± 2380.924
2025-05-01 12:50:25,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2382.068, 5980.503, 6581.57, 995.60046, 5665.575, 1391.2864, 5618.202, 272.50342, 4210.931, 6884.474]
2025-05-01 12:50:25,987 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 12:50:25,992 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 79/100 (estimated time remaining: 7 hours, 45 minutes, 17 seconds)
2025-05-01 13:04:03,999 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 13:04:04,002 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 13:11:34,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4370.12500 ± 2095.984
2025-05-01 13:11:34,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [7028.618, 2444.6106, 4798.2827, 3271.9473, 6148.9805, 2083.3892, 422.0471, 6308.92, 6309.8687, 4884.586]
2025-05-01 13:11:34,916 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 13:11:34,921 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 80/100 (estimated time remaining: 7 hours, 24 minutes, 2 seconds)
2025-05-01 13:25:13,992 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 13:25:13,995 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 13:32:43,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3818.65381 ± 2274.405
2025-05-01 13:32:43,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2597.977, 1486.0262, 3701.7598, 7080.0845, 6555.0986, 5890.14, 3599.7478, 211.9663, 1393.2246, 5670.5146]
2025-05-01 13:32:43,152 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 13:32:43,158 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 81/100 (estimated time remaining: 7 hours, 3 minutes, 37 seconds)
2025-05-01 13:46:41,472 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 13:46:41,476 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 13:54:04,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3905.25732 ± 2349.053
2025-05-01 13:54:04,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5765.239, 6350.396, 5960.4062, 4123.868, 1142.7041, 395.8801, 4697.407, 6611.1, 3782.613, 222.96077]
2025-05-01 13:54:04,999 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 13:54:05,005 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 82/100 (estimated time remaining: 6 hours, 43 minutes, 32 seconds)
2025-05-01 14:08:03,393 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 14:08:03,397 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 14:15:58,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4464.08691 ± 2765.208
2025-05-01 14:15:58,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [7102.2837, 6921.3906, 680.48346, 5963.9043, 5909.0205, 146.78036, 5727.062, 5977.4546, 56.623093, 6155.8613]
2025-05-01 14:15:58,598 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 14:15:58,604 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 83/100 (estimated time remaining: 6 hours, 24 minutes, 59 seconds)
2025-05-01 14:30:19,363 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 14:30:19,370 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 14:37:52,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4501.44629 ± 2648.171
2025-05-01 14:37:52,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6743.596, 3365.585, 5885.347, 7209.722, 5986.827, 941.7144, 1062.3721, 303.76566, 6920.506, 6595.029]
2025-05-01 14:37:52,712 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 14:37:52,718 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 84/100 (estimated time remaining: 6 hours, 5 minutes, 18 seconds)
2025-05-01 14:54:11,580 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 14:54:11,586 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 15:01:53,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4331.04590 ± 2468.111
2025-05-01 15:01:53,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6653.1274, 21.692013, 5725.04, 1656.0435, 5015.176, 6885.4155, 1883.4243, 6468.7173, 2237.8113, 6764.011]
2025-05-01 15:01:53,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 15:01:53,112 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 85/100 (estimated time remaining: 5 hours, 52 minutes, 58 seconds)
2025-05-01 15:18:16,341 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 15:18:16,349 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 15:25:31,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5044.98096 ± 2516.122
2025-05-01 15:25:31,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6839.529, 6610.7026, 6928.0938, 753.7475, 3484.2744, 6903.7993, 6139.7334, 6242.33, 6504.799, 42.80127]
2025-05-01 15:25:31,186 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 15:25:31,193 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 86/100 (estimated time remaining: 5 hours, 38 minutes, 24 seconds)
2025-05-01 15:42:10,141 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 15:42:10,149 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 15:49:40,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4375.50439 ± 2253.117
2025-05-01 15:49:40,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [261.468, 3077.499, 2808.5146, 5433.8706, 6618.436, 1328.2822, 6601.328, 6559.413, 4568.8916, 6497.3403]
2025-05-01 15:49:40,650 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 15:49:40,658 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 87/100 (estimated time remaining: 5 hours, 23 minutes, 39 seconds)
2025-05-01 16:06:11,225 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 16:06:11,230 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 16:13:44,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4145.36426 ± 2711.327
2025-05-01 16:13:44,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3698.266, 4218.5835, 7181.9688, 402.87048, 6764.5137, 6578.772, 408.50104, 6385.396, 5668.905, 145.86697]
2025-05-01 16:13:44,177 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 16:13:44,184 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 88/100 (estimated time remaining: 5 hours, 6 minutes, 10 seconds)
2025-05-01 16:29:51,848 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 16:29:51,856 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 16:37:41,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3803.25049 ± 2634.257
2025-05-01 16:37:41,301 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6583.6743, 6245.644, 112.47472, 6933.5586, 353.43304, 2706.8054, 6792.3335, 1874.3094, 4921.857, 1508.4128]
2025-05-01 16:37:41,302 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 16:37:41,308 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 89/100 (estimated time remaining: 4 hours, 47 minutes, 32 seconds)
2025-05-01 16:54:00,974 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 16:54:00,982 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 17:01:42,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3958.27686 ± 2617.289
2025-05-01 17:01:42,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6087.134, 3606.7566, 867.9925, 6777.9434, 6995.3677, 5448.844, 1452.4902, 108.22566, 1497.2928, 6740.724]
2025-05-01 17:01:42,165 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 17:01:42,172 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 90/100 (estimated time remaining: 4 hours, 23 minutes, 35 seconds)
2025-05-01 17:18:46,960 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 17:18:46,969 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 17:28:29,633 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 3538.83521 ± 2039.825
2025-05-01 17:28:29,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5557.322, 6101.1807, 1288.3982, 1786.0947, 6011.0044, 1885.9939, 1633.4978, 1794.3688, 3119.6377, 6210.8574]
2025-05-01 17:28:29,634 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 17:28:29,647 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 91/100 (estimated time remaining: 4 hours, 5 minutes, 56 seconds)
2025-05-01 17:47:49,937 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 17:47:49,943 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 17:57:07,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5118.32471 ± 1915.130
2025-05-01 17:57:07,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [5780.285, 3311.12, 6219.8804, 6114.7646, 211.20403, 6945.0312, 4674.4478, 6011.9272, 5269.262, 6645.3247]
2025-05-01 17:57:07,473 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 17:57:07,485 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 92/100 (estimated time remaining: 3 hours, 49 minutes, 24 seconds)
2025-05-01 18:15:56,664 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 18:15:56,674 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 18:25:06,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4537.01367 ± 2249.953
2025-05-01 18:25:06,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [424.52057, 5631.89, 5298.7676, 1053.5923, 2235.778, 6305.145, 6015.5786, 6250.2754, 5208.7593, 6945.833]
2025-05-01 18:25:06,848 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 18:25:06,861 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 93/100 (estimated time remaining: 3 hours, 30 minutes, 12 seconds)
2025-05-01 18:44:32,155 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 18:44:32,161 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 18:54:49,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5389.66943 ± 2094.235
2025-05-01 18:54:49,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [2539.195, 3320.4563, 3906.128, 7093.5073, 6815.535, 7401.776, 6483.3564, 7227.945, 1880.9227, 7227.8755]
2025-05-01 18:54:49,765 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 18:54:49,777 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 94/100 (estimated time remaining: 3 hours, 11 minutes, 59 seconds)
2025-05-01 19:14:09,282 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 19:14:09,289 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 19:24:02,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5233.76172 ± 2187.529
2025-05-01 19:24:02,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [3309.1265, 2239.7617, 648.46906, 5758.212, 6896.1895, 6556.0938, 6370.9004, 6496.7285, 7139.6587, 6922.476]
2025-05-01 19:24:02,434 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 19:24:02,447 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 95/100 (estimated time remaining: 2 hours, 50 minutes, 48 seconds)
2025-05-01 19:43:12,337 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 19:43:12,343 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 19:52:15,226 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5883.28906 ± 1318.626
2025-05-01 19:52:15,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [4692.208, 6805.613, 6646.0005, 5912.6504, 7299.596, 6206.4546, 7151.874, 3342.188, 6778.9453, 3997.359]
2025-05-01 19:52:15,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 19:52:15,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (5883.29) for latency ExtremeClogL1U23
2025-05-01 19:52:15,227 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-01 19:52:15,245 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 19:52:15,294 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 96/100 (estimated time remaining: 2 hours, 23 minutes, 45 seconds)
2025-05-01 20:11:19,931 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 20:11:19,942 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 20:21:11,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4874.79590 ± 2371.135
2025-05-01 20:21:11,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6902.217, 4174.9595, 1447.6228, 6442.882, 6731.8374, 6420.1445, 7460.5967, 4007.2458, 5189.099, -28.641405]
2025-05-01 20:21:11,758 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 20:21:11,772 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 97/100 (estimated time remaining: 1 hour, 55 minutes, 15 seconds)
2025-05-01 20:41:37,151 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 20:41:37,157 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 20:51:18,478 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5155.60840 ± 2235.013
2025-05-01 20:51:18,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [7548.7944, 6290.176, 4076.6965, 6367.132, 6773.99, 6038.437, 1861.4004, 6076.355, 6264.329, 258.7688]
2025-05-01 20:51:18,479 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 20:51:18,492 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 98/100 (estimated time remaining: 1 hour, 27 minutes, 42 seconds)
2025-05-01 21:11:21,230 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 21:11:21,240 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 21:20:27,950 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4419.71582 ± 2058.755
2025-05-01 21:20:27,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [6529.3213, 7167.01, 4128.359, 3275.993, 2033.4751, 7070.848, 5736.5796, 4609.9307, 2534.5105, 1111.1344]
2025-05-01 21:20:27,951 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 21:20:27,963 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 99/100 (estimated time remaining: 58 minutes, 15 seconds)
2025-05-01 21:41:27,514 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 21:41:27,524 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 21:51:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 5984.25488 ± 1885.781
2025-05-01 21:51:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [406.12848, 6525.1943, 6528.5728, 6815.4634, 7045.288, 5772.6216, 6638.8286, 6723.785, 6607.151, 6779.5107]
2025-05-01 21:51:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 21:51:33,885 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1124 [INFO]: New best (5984.25) for latency ExtremeClogL1U23
2025-05-01 21:51:33,886 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1127 [INFO]: saving network
2025-05-01 21:51:33,904 latency_env.training.utils:544 [DEBUG]: Saving evalcopy of MBPAC to _logs/benchmark-v3-tc4/noisy-halfcheetah/ExtremeClogL1U23-mbpac_memdelay/checkpoints/best_ExtremeClogL1U23.pkl
2025-05-01 21:51:33,960 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1097 [INFO]: Iteration 100/100 (estimated time remaining: 29 minutes, 30 seconds)
2025-05-01 22:10:44,438 latency_env.training.mbpac:635 [DEBUG]: train() done
2025-05-01 22:10:44,448 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1112 [DEBUG]: Evaluating for latency ExtremeClogL1U23...
2025-05-01 22:21:02,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1119 [DEBUG]: Total Reward: 4754.79004 ± 2145.544
2025-05-01 22:21:02,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1120 [DEBUG]: All rewards: [900.1552, 6294.807, 6348.75, 5789.8125, 4489.7437, 2458.0112, 6869.8613, 6339.6914, 6458.1367, 1598.9285]
2025-05-01 22:21:02,090 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1121 [DEBUG]: All trajectory lengths: [1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0, 1000.0]
2025-05-01 22:21:02,105 latency_env.delayed_mdp:training_loop(baseline-mbpac-noisy-halfcheetah):1149 [DEBUG]: Training session finished
